
Global and United States Hydraulic Jack Market – Revolutionary Trends 2026 – The Think Curiouser

Mish Boyka

The Global and United States Hydraulic Jack market report 2020 mainly focuses on the market trend, market share, size, and forecast. It is a brief, professional analysis of the current scenario of the Global and United States Hydraulic Jack market.

The report on the Global and United States Hydraulic Jack market is a comprehensive study of global market trends and insights. It focuses on emerging trends in the global and regional spaces across all significant components, such as market capacity, cost, price, demand and supply, production, profit, and the competitive landscape. The report analyzes past trends and future prospects, which makes it highly comprehensible for readers following the market. Moreover, the latest trends, product portfolio, demographics, geographical segmentation, and regulatory framework of the Global and United States Hydraulic Jack market have also been included in the study.

Get PDF Sample Copy of this Report to understand the structure of the complete report: (Including Full TOC, List of Tables & Figures, Chart) @ https://www.marketresearchhub.com/enquiry.php?type=S&repid=2788278&source=atm

What does the Global and United States Hydraulic Jack market research report consist of?

The report takes a look at recent developments and innovations in the Global and United States Hydraulic Jack market.

The report presents a basic overview of the industry, including definitions, manufacturing processes, and applications.

The report covers the recent market factors that are crucial to watch when analyzing the market performance, profitability, and productivity of the industry.

The report focuses on estimated 2020-2026 market development trends of the Global and United States Hydraulic Jack market.

Furthermore, an analysis of raw materials, demand, and production value has been laid out.

Market segmentation:

Research analysts have studied and analyzed these three segments, covering market share, revenues, and growth rate, along with the other factors that drive growth in the Global and United States Hydraulic Jack market. This study helps identify high-growth areas as well as the growth factors leading these segments.
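For context, the growth rate such studies quote is usually a compound annual growth rate (CAGR). A minimal sketch of the calculation, with purely hypothetical figures that do not come from this report:

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two period endpoints."""
    return (end_value / start_value) ** (1 / years) - 1

# Hypothetical: a segment growing from $1.0B (2015) to $1.5B (2026).
rate = cagr(start_value=1.0, end_value=1.5, years=11)
print(f"{rate:.2%}")  # roughly 3.75%
```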

Segment by Type, the Hydraulic Jack market is segmented into
Hydraulic Bottle Jack
Pancake Hydraulic Jack
Hydraulic Toe Jack
Hydraulic Floor Jack
Hydraulic Scissor Jack
Other types of Hydraulic Jack

Segment by Application, the Hydraulic Jack market is segmented into
Shipyards
Bridge building
Plant construction sites
Automotive
Others

Regional and Country-level Analysis
The Hydraulic Jack market is analyzed and market size information is provided by region (country).
The key regions covered in the Hydraulic Jack market report are North America, Europe, Asia Pacific, Latin America, and the Middle East and Africa. It also covers key countries: the U.S., Canada, Germany, France, the U.K., Italy, Russia, China, Japan, South Korea, India, Australia, Taiwan, Indonesia, Thailand, Malaysia, Philippines, Vietnam, Mexico, Brazil, Turkey, Saudi Arabia, the U.A.E., etc.
The report includes country-wise and region-wise market size for the period 2015-2026. It also includes market size and forecast by Type and by Application segment in terms of sales and revenue for the period 2015-2026.

Do You Have Any Query or Specific Requirement? Ask Our Industry [email protected] https://www.marketresearchhub.com/enquiry.php?type=E&repid=2788278&source=atm

This research is a comprehensive way to understand the current landscape of the market, especially in 2020. Both top-down and bottom-up approaches are employed to estimate the complete market size. This will help market stakeholders better understand the direction in which the market is headed and the future forecast.
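As an illustration of those two approaches (the figures and the simple averaging reconciliation below are hypothetical assumptions, not the report's actual method): a top-down estimate scales a parent-market figure down by the segment's share, while a bottom-up estimate sums revenues attributed to individual vendors.

```python
def top_down(parent_market, segment_share):
    """Top-down: scale a parent-market figure by the segment's estimated share."""
    return parent_market * segment_share

def bottom_up(vendor_revenues):
    """Bottom-up: sum revenues attributed to individual vendors."""
    return sum(vendor_revenues)

# Hypothetical figures, in $M:
td = top_down(parent_market=50_000, segment_share=0.004)   # 200.0
bu = bottom_up([55.0, 48.5, 40.0, 32.5, 30.0])             # 206.0

# One simple way to reconcile the two estimates is to average them.
estimate = (td + bu) / 2
print(estimate)  # 203.0
```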

Competitive Landscape and Hydraulic Jack Market Share Analysis
The Hydraulic Jack market competitive landscape provides details and data by player. The report offers comprehensive analysis and accurate statistics on revenue by player for the period 2015-2020, at both the global and regional level, supported by reliable statistics. Details included are company description, major business, total company revenue, sales and revenue generated in the Hydraulic Jack business, the date of entry into the Hydraulic Jack market, Hydraulic Jack product introduction, recent developments, etc.
The major vendors covered:
ENERPAC
SPX
JET Tools
TORIN
STANLEY
Zinko
KANWAR ENGG
Omega
U.S. Jack Company
Craftsman
Techvos India
KIET
Shanghai Baoshan Jack Factory
Taizhou Hailing Hydraulic Machinery
AC Hydraulic
Halfords
TRACTEL
SIP

You can Buy This Report from Here @ https://www.marketresearchhub.com/checkout?rep_id=2788278&licType=S&source=atm 

Reasons to purchase this report:

It provides the market dynamics scenario along with growth opportunities in the forecast period.

It identifies upcoming opportunities, threats, and obstacles that can affect the industry.

This report will help in making accurate and time-bound business plans, keeping the economic shift in mind.

It interprets the market competitive advantages of the industry as well as internal competitors.

It supports the creation of long-term business plans.

Regional and country-level analysis.

Segment-wise market value and volume.

SWOT and PEST analysis, along with the strategies adopted by major players.

Table of Content

1 Market Overview

1.1 Global and United States Hydraulic Jack Introduction

1.2 Market Analysis by Type

1.2.1 Overview: Global and United States Hydraulic Jack Revenue by Type: 2015 VS 2019 VS 2025

1.2.2 Hydraulic Bottle Jack

1.2.3 Pancake Hydraulic Jack

1.2.4 Hydraulic Toe Jack

1.2.5 Hydraulic Floor Jack

1.2.6 Hydraulic Scissor Jack

1.2.7 Others

1.3 Market Analysis by Application

1.3.1 Overview: Global and United States Hydraulic Jack Revenue by Application: 2015 VS 2019 VS 2025

1.3.2 Shipyards

1.3.3 Bridge Building

1.3.4 Plant Construction Sites

1.3.5 Automotive

1.3.6 Others

1.4 Overview of Global and United States Hydraulic Jack Market

1.4.1 Global and United States Hydraulic Jack Market Status and Outlook (2015-2025)

1.4.2 North America (United States, Canada and Mexico)

1.4.3 Europe (Germany, France, United Kingdom, Russia and Italy)

1.4.4 Asia-Pacific (China, Japan, Korea, India and Southeast Asia)

1.4.5 South America, Middle East & Africa

1.5 Market Dynamics

1.5.1 Market Opportunities

1.5.2 Market Risk

1.5.3 Market Driving Force

2 Manufacturers Profiles

3.3 Market Concentration Rate

3.3.1 Top 3 Hydraulic Jack Manufacturer Market Share in 2019

3.3.2 Top 6 Hydraulic Jack Manufacturer Market Share in 2019

3.4 Market Competition Trend

4 Global Market Analysis by Regions

4.1 Global and United States Hydraulic Jack Sales, Revenue and Market Share by Regions

4.1.1 Global and United States Hydraulic Jack Sales and Market Share by Regions (2015-2020)

4.1.2 Global and United States Hydraulic Jack Revenue and Market Share by Regions (2015-2020)

4.2 North America Hydraulic Jack Sales and Growth Rate (2015-2020)

4.3 Europe Hydraulic Jack Sales and Growth Rate (2015-2020)

4.4 Asia-Pacific Hydraulic Jack Sales and Growth Rate (2015-2020)

4.5 South America Hydraulic Jack Sales and Growth Rate (2015-2020)

4.6 Middle East and Africa Hydraulic Jack Sales and Growth Rate (2015-2020)

5 North America by Country

5.1 North America Hydraulic Jack Sales, Revenue and Market Share by Country

5.1.1 North America Hydraulic Jack Sales and Market Share by Country (2015-2020)

5.1.2 North America Hydraulic Jack Revenue and Market Share by Country (2015-2020)

5.2 United States Hydraulic Jack Sales and Growth Rate (2015-2020)

5.3 Canada Hydraulic Jack Sales and Growth Rate (2015-2020)

5.4 Mexico Hydraulic Jack Sales and Growth Rate (2015-2020)

6 Europe by Country

6.1 Europe Hydraulic Jack Sales, Revenue and Market Share by Country

6.1.1 Europe Hydraulic Jack Sales and Market Share by Country (2015-2020)

6.1.2 Europe Hydraulic Jack Revenue and Market Share by Country (2015-2020)

6.2 Germany Hydraulic Jack Sales and Growth Rate (2015-2020)

6.3 UK Hydraulic Jack Sales and Growth Rate (2015-2020)

6.4 France Hydraulic Jack Sales and Growth Rate (2015-2020)

6.5 Russia Hydraulic Jack Sales and Growth Rate (2015-2020)

6.6 Italy Hydraulic Jack Sales and Growth Rate (2015-2020)

7 Asia-Pacific by Regions

7.1 Asia-Pacific Hydraulic Jack Sales, Revenue and Market Share by Regions

7.1.1 Asia-Pacific Hydraulic Jack Sales and Market Share by Regions (2015-2020)

7.1.2 Asia-Pacific Hydraulic Jack Revenue and Market Share by Regions (2015-2020)

7.2 China Hydraulic Jack Sales and Growth Rate (2015-2020)

7.3 Japan Hydraulic Jack Sales and Growth Rate (2015-2020)

7.4 Korea Hydraulic Jack Sales and Growth Rate (2015-2020)

7.5 India Hydraulic Jack Sales and Growth Rate (2015-2020)

7.6 Southeast Asia Hydraulic Jack Sales and Growth Rate (2015-2020)

7.7 Australia Hydraulic Jack Sales and Growth Rate (2015-2020)

8 South America by Country

8.1 South America Hydraulic Jack Sales, Revenue and Market Share by Country

8.1.1 South America Hydraulic Jack Sales and Market Share by Country (2015-2020)

8.1.2 South America Hydraulic Jack Revenue and Market Share by Country (2015-2020)

8.2 Brazil Hydraulic Jack Sales and Growth Rate (2015-2020)

8.3 Argentina Hydraulic Jack Sales and Growth Rate (2015-2020)

9 Middle East & Africa by Countries

9.1 Middle East & Africa Hydraulic Jack Sales, Revenue and Market Share by Country

9.1.1 Middle East & Africa Hydraulic Jack Sales and Market Share by Country (2015-2020)

9.1.2 Middle East & Africa Hydraulic Jack Revenue and Market Share by Country (2015-2020)

9.2 Saudi Arabia Hydraulic Jack Sales and Growth Rate (2015-2020)

9.3 Turkey Hydraulic Jack Sales and Growth Rate (2015-2020)

9.4 Egypt Hydraulic Jack Sales and Growth Rate (2015-2020)

9.5 South Africa Hydraulic Jack Sales and Growth Rate (2015-2020)

10 Market Segment by Type

10.1 Global and United States Hydraulic Jack Sales and Market Share by Type (2015-2020)

10.2 Global and United States Hydraulic Jack Revenue and Market Share by Type (2015-2020)

10.3 Global and United States Hydraulic Jack Price by Type (2015-2020)

11 Global and United States Hydraulic Jack Market Segment by Application

11.1 Global and United States Hydraulic Jack Sales Market Share by Application (2015-2020)

11.2 Global and United States Hydraulic Jack Revenue Market Share by Application (2015-2020)

11.3 Global and United States Hydraulic Jack Price by Application (2015-2020)

12 Market Forecast

12.1 Global and United States Hydraulic Jack Sales, Revenue and Growth Rate (2021-2025)

12.2 Global and United States Hydraulic Jack Market Forecast by Regions (2021-2025)

12.2.1 North America Hydraulic Jack Market Forecast (2021-2025)

12.2.2 Europe Hydraulic Jack Market Forecast (2021-2025)

12.2.3 Asia-Pacific Hydraulic Jack Market Forecast (2021-2025)

12.2.4 South America Hydraulic Jack Market Forecast (2021-2025)

12.2.5 Middle East & Africa Hydraulic Jack Market Forecast (2021-2025)

12.3 Global and United States Hydraulic Jack Market Forecast by Type (2021-2025)

12.3.1 Global and United States Hydraulic Jack Sales Forecast by Type (2021-2025)

12.3.2 Global and United States Hydraulic Jack Market Share Forecast by Type (2021-2025)

12.4 Global and United States Hydraulic Jack Market Forecast by Application (2021-2025)

12.4.1 Global and United States Hydraulic Jack Sales Forecast by Application (2021-2025)

12.4.2 Global and United States Hydraulic Jack Market Share Forecast by Application (2021-2025)

13 Sales Channel, Distributors, Traders and Dealers

13.1 Sales Channel

13.1.1 Direct Marketing

13.1.2 Indirect Marketing

13.2 Distributors, Traders and Dealers

14 Research Findings and Conclusion

15 Appendix

15.1 Methodology

15.2 Data Source

15.3 Disclaimer

15.4 About Us

Contact Us:

marketresearchhub

Tel: +1-518-621-2074

USA-Canada Toll Free: 866-997-4948

Email: [email protected]

About marketresearchhub

marketresearchhub is the one-stop online destination to find and buy market research reports & industry analysis. We fulfill all your research needs across industry verticals with our huge collection of market research reports. We provide our services to organizations of all sizes across all industry verticals and markets. Our research coordinators have in-depth knowledge of reports and publishers and will assist you in making an informed decision by giving you unbiased, deep insights on which reports will satisfy your needs at the best price.

WSJ News Exclusive | Justice Department to File Long-Awaited Antitrust Suit Against Google


The Justice Department will file an antitrust lawsuit Tuesday alleging that Google engaged in anticompetitive conduct to preserve monopolies in search and search-advertising that form the cornerstones of its vast conglomerate, according to senior Justice officials.

The long-anticipated case, expected to be filed in a Washington, D.C., federal court, will mark the most aggressive U.S. legal challenge to a company’s dominance in the tech sector in more than two decades, with the potential to shake up Silicon Valley and beyond. Once a public darling, Google attracted considerable scrutiny over the past decade as it gained power but has avoided a true showdown with the government until now.

The department will allege that Google, a unit of Alphabet Inc., is maintaining its status as gatekeeper to the internet through an unlawful web of exclusionary and interlocking business agreements that shut out competitors, officials said. The government will allege that Google uses billions of dollars collected from advertisements on its platform to pay mobile-phone manufacturers, carriers and browsers, like Apple Inc.’s Safari, to maintain Google as their preset, default search engine.

The upshot is that Google has pole position in search on hundreds of millions of American devices, with little opportunity for any competitor to make inroads, the government will allege.

Justice officials said the lawsuit will also take aim at arrangements in which Google’s search application is preloaded, and can’t be deleted, on mobile phones running its popular Android operating system. The government will allege Google unlawfully prohibits competitors’ search applications from being preloaded on phones under revenue-sharing arrangements, they said.

Google owns or controls search distribution channels accounting for about 80% of search queries in the U.S., the officials said. That means Google’s competitors can’t get a meaningful number of search queries and build a scale needed to compete, leaving consumers with less choice and less innovation, and advertisers with less competitive prices, the lawsuit will allege.

Google didn’t immediately respond to a request for comment, but the company has said its competitive edge comes from offering a product that billions of people choose to use each day.

The Mountain View, Calif., company, sitting on a $120 billion cash hoard, is unlikely to shrink from a legal fight. The company has argued that it faces vigorous competition across its different operations and that its products and platforms help businesses small and large reach new customers.

Google’s defense against critics of all stripes has long been rooted in the fact that its services are largely offered to consumers at little or no cost, undercutting the traditional antitrust argument around potential price harms to those who use a product.

The lawsuit follows a Justice Department investigation that has stretched more than a year, and comes amid a broader examination of the handful of technology companies that play an outsize role in the U.S. economy and the daily lives of most Americans.

A loss for Google could mean court-ordered changes to how it operates parts of its business, potentially creating new openings for rival companies. The Justice Department’s lawsuit won’t specify particular remedies; that is usually addressed later in a case. One Justice official said nothing is off the table, including possibly seeking structural changes to Google’s business.

A victory for Google could deal a huge blow to Washington’s overall scrutiny of big tech companies, potentially hobbling other investigations and enshrining Google’s business model after lawmakers and others challenged its market power. Such an outcome, however, might spur Congress to take legislative action against the company.

The case could take years to resolve, and the responsibility for managing the suit will fall to the appointees of whichever candidate wins the Nov. 3 presidential election.

The challenge marks a new chapter in the history of Google, a company formed in 1998 in a garage in a San Francisco suburb—the same year Microsoft Corp. was hit with a blockbuster government antitrust case accusing the software giant of unlawful monopolization. That case, which eventually resulted in a settlement, was the last similar government antitrust case against a major U.S. tech firm.

Google’s billionaire co-founders Sergey Brin, left, and Larry Page, shown in 2008, gave up their management roles but remain in effective control of the company. Photo: Paul Sakuma/Associated Press

Google started as a simple search engine with a large and amorphous mission “to organize the world’s information.” But over the past decade or so it has developed into a conglomerate that does far more than that. Its flagship search engine handles more than 90% of global search requests, some billions a day, providing fodder for what has become a vast brokerage of digital advertising. Its YouTube unit is the world’s largest video platform, used by nearly three-quarters of U.S. adults.

Google has been bruised but never visibly hurt by various controversies surrounding privacy and allegedly anticompetitive behavior, and its growth has continued almost entirely unchecked. In 2012, the last time Google faced close antitrust scrutiny in the U.S., the search giant was already one of the largest publicly traded companies in the nation. Since then, its market value has roughly tripled to almost $1 trillion.

The company takes on this legal showdown under a new generation of leadership. Co-founders Larry Page and Sergey Brin, both billionaires, gave up their management roles last year, handing the reins solely to Sundar Pichai, a soft-spoken, India-born engineer who earlier in his career helped present Google’s antitrust complaints about Microsoft to regulators.

The chief executive has in his corner Messrs. Page and Brin, who remain on Alphabet’s board and in effective control of the company thanks to shares that give them, along with former Chief Executive Eric Schmidt, disproportionate voting power.


Executives inside Google are quick to portray their divisions as mere startups in areas—like hardware, social networking, cloud computing and health—where other Silicon Valley giants are further ahead. Still, that Google has such breadth at all points to its omnipresence.

European Union regulators have targeted the company with three antitrust complaints and fined it about $9 billion, though the cases haven’t left a big imprint on Google’s businesses there, and critics say the remedies imposed on it have proved underwhelming.

In the U.S., nearly all state attorneys general are separately investigating Google, while three other tech giants—Facebook Inc., Apple and Amazon.com Inc.—likewise face close antitrust scrutiny. And in Washington, a bipartisan belief is emerging that the government should do more to police the behavior of top digital platforms that control widely used tools of communication and commerce.

More than 10 state attorneys general are expected to join the Justice Department’s case, officials said. Other states are still considering their own cases related to Google’s search practices, and a large group of states is considering a case challenging Google’s power in the digital advertising market, The Wall Street Journal has reported. In the ad-technology market, Google owns industry-leading tools at every link in the complex chain between online publishers and advertisers.

The Justice Department also continues to investigate Google’s ad-tech practices.

Democrats on a House antitrust subcommittee released a report this month following a 16-month inquiry, saying all four tech giants wield monopoly power and recommending congressional action. The companies’ chief executives testified before the panel in July.

Google CEO Sundar Pichai testified before Congress in July, in hearings where lawmakers pressed tech companies’ leaders on their business practices. Photo: Graeme Jennings/Press Pool

Big Tech Under Fire

The Justice Department isn’t alone in scrutinizing tech giants’ market power. These are the other inquiries now under way:

  • Federal Trade Commission: The agency has been examining Facebook’s acquisition strategy, including whether it bought platforms like WhatsApp and Instagram to stifle competition. People following the case believe the FTC is likely to file suit by the end of the year.
  • State attorneys general: A group of state AGs led by Texas is investigating Google’s online advertising business and expected to file a separate antitrust case. Another group of AGs is reviewing Google’s search business. Still another, led by New York, is probing Facebook over antitrust concerns.
  • Congress: After a lengthy investigation, House Democrats found that Amazon holds monopoly powers over its third-party sellers and that Apple exerts monopoly power through its App Store. Those findings and others targeting Facebook and Google could trigger legislation. Senate Republicans are separately moving to limit Section 230 of the Communications Decency Act, which gives online platforms a liability shield, saying the companies censor conservative views.
  • Federal Communications Commission: The agency is reviewing a Trump administration request to reinterpret key parts of Section 230, for the same reasons cited by GOP senators. Tech companies are expected to challenge possible action on free-speech grounds.

“It’s Google’s business model that is the problem,” Rep. David Cicilline (D., R.I.), the subcommittee chairman, told Mr. Pichai. “Google evolved from a turnstile to the rest of the web to a walled garden that increasingly keeps users within its sights.”

“We see vigorous competition,” Mr. Pichai responded, pointing to travel search sites and product searches on Amazon’s online marketplace. “We are working hard, focused on the users, to innovate.”

Amid the criticism, Google and other tech giants remain broadly popular and have only gained in might and stature since the start of the coronavirus pandemic, buoying the U.S. economy—and stock market—during a period of deep uncertainty.

At the same time, Google’s growth across a range of business lines over the years has expanded its pool of critics, with companies that compete with the search giant, as well as some Google customers, complaining about its tactics.

Specialized search providers like Yelp Inc. and Tripadvisor Inc. have long voiced such concerns to U.S. antitrust authorities, and newer upstarts like search-engine provider DuckDuckGo have spent time talking to the Justice Department.

News Corp, owner of The Wall Street Journal, has complained to antitrust authorities at home and abroad about both Google’s search practices and its dominance in digital advertising.

Some Big Tech detractors have called to break up Google and other dominant companies. Courts have indicated such broad action should be a last resort available only if the government clears high legal hurdles, including by showing that lesser remedies are inadequate.

The outcome could have a considerable impact on the direction of U.S. antitrust law. The Sherman Act that prohibits restraints of trade and attempted monopolization is broadly worded, leaving courts wide latitude to interpret its parameters. Because litigated antitrust cases are rare, any one ruling could affect governing precedent for future cases.

Google’s growth across a range of business lines has expanded its pool of critics. The company exhibited at the CES 2020 electronics show in Las Vegas on Jan. 8. Photo: Mario Tama/Getty Images

The tech sector has been a particular challenge for antitrust enforcers and the courts because the industry evolves rapidly and many products and services are offered free to consumers, who in a sense pay with the valuable personal data companies such as Google collect.

The search company famously outmaneuvered the Federal Trade Commission nearly a decade ago.

The FTC, which shares antitrust authority with the Justice Department, spent more than a year investigating Google but decided in early 2013 not to bring a case in response to complaints that the company engaged in “search bias” by favoring its own services and demoting rivals. Competition staff at the agency deemed the matter a close call, but said a case challenging Google’s search practices could be tough to win because of what they described as mixed motives within the company: a desire to both hobble rivals and advance quality products and services for consumers.

The Justice Department’s case won’t focus on a search-bias theory, Justice officials said.

Google made a handful of voluntary commitments to address other FTC concerns, a resolution that was widely panned by advocates of stronger antitrust enforcement and continues to be cited as a top failure. Google’s supporters say the FTC’s light touch was appropriate and didn’t burden the company as it continued to grow.

The Department of Justice is investigating the U.S.’s largest tech firms for allegedly monopolistic behavior. Roughly 20 years ago, a similar case threatened to destabilize Microsoft. WSJ explains. (Originally published Sept. 5, 2019)

The Justice Department’s current antitrust chief, Makan Delrahim, spent months negotiating with the FTC last year for jurisdiction to investigate Google this time around. He later recused himself in the case—Google was briefly a client years before while he was in private practice—as the department’s top brass moved to take charge.

The Justice Department lawsuit comes after internal tensions, with some staffers skeptical of Attorney General William Barr’s push to bring a case as quickly as possible, the Journal has reported. The reluctant staffers worried the department hadn’t yet built an airtight case and feared rushing to litigation could lead to a loss in court. They also worried Mr. Barr was driven by an interest in filing a case before the election. Others were more comfortable moving ahead.

Mr. Barr has pushed the department to move forward under the belief that antitrust enforcers have been too slow and hesitant to take action, according to a person familiar with his thinking. He has taken an unusually hands-on role in several areas of the department’s work and repeatedly voiced interest in investigating tech-company dominance.

Attorney General William Barr has pushed to bring an antitrust case quickly against Google, in some cases taking an unusually hands-on role in preparations. Photo: Matt McClain/Press Pool

If the Microsoft case from 20 years ago is any guide, Mr. Barr’s concern with speed could run up against the often slow pace of litigation.

After a circuitous route through the court system, including one initial trial-court ruling that ordered a breakup, Microsoft reached a 2002 settlement with the government and changed some aspects of its commercial behavior but stayed intact. It remained under court supervision and subject to terms of its consent decree with the government until 2011.

Antitrust experts have long debated whether the settlement was tough enough on Microsoft, though most observers believe the agreement opened up space for a new generation of competitors.

Write to Brent Kendall at brent.kendall@wsj.com and Rob Copeland at rob.copeland@wsj.com

Copyright ©2020 Dow Jones & Company, Inc. All Rights Reserved.


You Reap What You Code


2020/10/20


This is a loose transcript of my talk at Deserted Island DevOps Summer Send-Off, an online conference in COVID-19 times. One really special thing about it is that the whole conference takes place over the Animal Crossing video game, with quite an interesting setup.

It was the last such session of the season, and I was invited to present with few demands. I decided to make a compressed version of a talk I had been mulling over for close to a year, which I had lined up for at least one in-person conference that got cancelled or postponed in April, and which I had given in its full hour-long length internally at work. The final result is a condensed 30 minutes that touches all kinds of topics, some of which have been borrowed from previous talks and blog posts of mine.

If I really wanted to, I could probably make one shorter blog post out of every one or two slides in there, but I decided to go for coverage rather than depth. Here goes nothing.

'You Reap What You Code': shows my character in-game sitting at a computer with a bunch of broken parts around, dug from holes in the ground

So today I wanted to give a talk on this tendency we have as software developers and engineers to write code and deploy things that end up being a huge pain to live with, to an extent we hadn’t planned for.

In software, a pleasant surprise is writing for an hour without compiling once and then having it work; a nasty surprise is software that seems to work, and after six months you find out it has poisoned your life.

This presentation is going to be a high-level thing, and I want to warn you that I’m going to go through some philosophical concerns at first, follow that up with research that has taken place in human factors and cognitive science, and tie that up with broad advice that I think could be useful to everyone when it comes to systems thinking and designing things. A lot of this may feel a bit out there, but I hope that by the end it’ll feel useful to you.

'Energy and Equity; Ivan Illich' shows a screenshot of the game with a little village-style view

This is the really philosophical stuff we’re starting with. Ivan Illich was a wild ass philosopher who hated things like modern medicine and mandatory education. He wrote an essay called “Energy and Equity” (to which I was introduced by a Stephen Krell presentation), where he decides to also dislike all sorts of motorized transportation.

Illich introduces the concept of an “oppressive” monopoly: if we look at societies that developed around foot traffic and cycling, you can generally use any means of transportation whatsoever and effectively manage to live and thrive there. Whether you live in a tent or a mansion, you can get around the same.

He pointed out that cycling is innately fair because it does not require more energy than the baseline required to operate: if you can walk, you can cycle, and cycling, for the same energy as walking, is incredibly more efficient. Cars don’t have that property; they are rather expensive, and require disproportionate amounts of energy compared to what a person’s body can produce.

His suggestion was that all non-freight transport, whether cars or buses and trains, be capped to a fixed percentage above the average speed of a cyclist, which is based on the power a normal human body can produce on its own. He suggested we do this to prevent…

Aerial stock photo of an American suburb

that!

We initially conceived of cars as a way to make existing burdens easier: they created freedoms and widened our access to goods and people. The car was a better horse, and a less exhausting bicycle. And so society developed to embrace cars in its infrastructure.

Rather than having a merchant bring goods to the town square, the milkman drop milk on the porch, and smaller markets distributed closer to where they’d be convenient, it is now everyone’s job to drive for each of these things while stores go to where land is cheap rather than where people are. And when society develops with the car in mind, you now need a car to be functional.

In short, the cost of participating in society has gone up, and that’s what an oppressive monopoly is.

'The Software Society': Van Bentum's painting The Explosion in the Alchemist's Laboratory

To me, the key thing Illich did was twist the question another way: what effects would cars have on society once a majority of people had them, and what effect would that have on the rest of us?

The question I now want to ask is whether we have an equivalent in the software world. What are the things we do that we perceive as increasing our ability to do things, but that actually end up costing us a lot more just to participate?

We kind of see it with our ability to use all the bandwidth a user may have; trying to use old dial-up connections is flat-out unworkable these days. But do we have the same thing with cognitive costs? The tooling, the documentation, the procedures?

'Ecosystems; we share a feedback loop': a picture of an in-game aquarium within the game's museum

I don’t have a clear answer to any of this, but it’s a question I ask myself a lot when designing tools and software.

The key point is that the software and practices we choose are not just something we adopt in a vacuum; they are part of an ecosystem. Whatever we add to it changes and shifts expectations in ways that are out of our control, and impacts us back again. The software isn’t trapped with us, we’re trapped with the software.

Are we not ultimately just making our lives worse for it? I want to focus on the part where we make our own lives, as developers, worse: when we write or adopt software to help ourselves but end up harming ourselves in the process, because that speaks to our own sustainability.

'Ironies of Automation' (Bainbridge, 1983): a still from Fantasia's broom scene

Now we’re entering the cognitive science and human factors bit.

Rather than just being philosophical here, I want to ground things in the real world with practical effects, because this is something that researchers have covered. The ironies of automation come from cognitive science research (Bainbridge, 1983) that looked into people automating tasks and found that the effects weren’t as good as expected.

Mainly, it’s attention and practice clashing. There are tons of examples over the years, but let’s take a look at a modern one with self-driving cars.

Self-driving cars are a fantastic case of clumsy automation. What most established players in the car industry are doing is lane tracking, blind spot detection, and handling parallel parking.

But high-tech companies (Tesla, Waymo, Uber) are working towards full self-driving, with Tesla’s autopilot being the most ambitious one released to the public at large. All of these currently operate in ways Bainbridge fully predicted in 1983:

  • the driver is no longer actively involved and is shifted to the role of monitoring
  • the driver, despite no longer driving the car, must nonetheless be fully aware of everything the car is doing
  • when the car gets in a weird situation, it is expected that the driver takes control again
  • so the car handles all the easy cases, but all the hard cases are left to the driver

Part of the risk there is twofold: people have limited attention for tasks they are not involved in—if you’re not actively driving it’s going to be hard to be attentive for extended periods of time—and if you’re only driving rarely with only the worst cases, you risk being out of practice to handle the worst cases.

Aviation has similar automation, but airlines make up for it with simulator hours and by still manually handling planned difficult phases like takeoff and landing. Even then, a number of airline incidents show that this hand-off is complex and often goes poorly.

Clearly, when we ignore the human component and its responsibilities in things, we might make software worse than what it would have been.

'HABA-MABA problems': a chart illustrating the Fitts model using in-game images

In general, most of these errors come from the following point of view. This is called the “Fitts” model, also known as “HABA-MABA” for “Humans are better at, machines are better at” (the original version was referred to as MABA-MABA, using “Men” rather than “Humans”). This model frames humans as slow, perceptive beings capable of judgement, and machines as fast, undiscerning, indefatigable things.

We still hear this a whole lot today. These views are, to be polite, a beginner’s approach to automation design. The model is based on scientifically outdated concepts and intuitive-but-wrong sentiments; it is comforting in letting you think that only the predicted results will happen, and it totally ignores emergent behaviour. It operates on what we think we see now, not on stronger underlying principles, and often has strong limitations when applied in practice.

It is disconnected from the reality of human-machine interactions, and frames choices as binary when they aren’t, usually with the intent of pushing the human out of the equation when you shouldn’t. This is, in short, a significant factor behind the ironies of automation.

'Joint Cognitive Systems': a chart illustrating the re-framing of computers as teammates

Here’s a patched version established by cognitive experts. They reframe the human-computer relationship as a “joint cognitive system”: instead of thinking of humans and machines as unrelated things that must be used in distinct contexts for specific tasks, we should frame humans and computers as teammates working together. This, in a nutshell, shifts the discourse from how one is limited to how one can complement the other.

Teammates do things like being predictable to each other, sharing a context and language, being able to notice when their actions may impact others and adjust accordingly, communicate to establish common ground, and have an idea of everyone’s personal and shared objectives to be able to help or prioritize properly.

Of course, we must acknowledge that with the current state of the art we’re nowhere close to computers being teammates. And since computers currently need us to keep realigning them all the time, we have to admit that the system is not just the code and the computers; it’s the code, the computers, and all the people who interact with them and with each other. If we want our software to help us, we need to be able to help it, and that means building the software knowing it will be full of limitations, and working to make it easier to diagnose issues and to form and improve mental models.

So the question is: what makes a good model? How can we help people work with what we create?

'How People Form Models': a detailed road map of the city of London, UK

note: this slide and the next one are taken from my talk on operable software

This is a map of the city of London, UK. It is not the city of London, just a representation of it. It’s very accurate: it has streets with their names, traffic directions, building names, rivers, train stations, metro stations, footbridges, piers, parks, gives details regarding scale, distance, and so on. But it is not the city of London itself: it does not show traffic nor roadwork, it does not show people living there, and it won’t tell you where the good restaurants are. It is a limited model, and probably an outdated one.

But even if it’s really limited, it is very detailed. Detailed enough that pretty much nobody can fit it all in their head. Most people will have detailed knowledge of some parts of it, like the zoomed-in square in the image, but pretty much nobody will just know the whole of it in all dimensions.

In short, pretty much everyone in your system only works from partial, incomplete, and often inaccurate and outdated data, which itself is only an abstract representation of what goes on in the system. In fact, what we work with might be more similar to this:

A cartoony tourist map of London's main attractions

That’s more like it. This is still not the city of London, but this tourist map of London is closer to what we work with. Take a look at your architecture diagrams (if you have them), and chances are they look more like this map than the very detailed map of London. This map has most stuff a tourist would want to look at: important buildings, main arteries to get there, and some path that suggests how to navigate them. The map has no well-defined scale, and I’m pretty sure that the two giant people on Borough road won’t fit inside Big Ben. There are also lots of undefined areas, but you will probably supplement them with other sources.

But that’s alright, because mental models are only as good as their predictive power; if they let you make a decision or accomplish a task correctly, they’re useful. And our minds are kind of clever in that they only build models as complex as they need to be. If I’m a tourist looking for my way between the main attractions, this map is probably far more useful than the other one.

There’s a fun saying about this: “Something does not exist until it is broken.” Subjectively, you can be entirely content operating a system for a long time without ever knowing about entire aspects of it. It’s when things start breaking, or your predictions about the system no longer work, that you have to go back and re-tune your mental models. And since this is all very subjective, everyone has different models.

This is a vague answer to what makes a good model; the follow-up question is how we can create and maintain them.

'Syncing Models': a still from the video game in the feature where you back up your island by uploading it online

One simple step, outside of all technical components, is to challenge and help each other to sync and build better mental models. We can’t easily transfer our own models to each other, and in fact it’s pretty much impossible to control them. What we can do is challenge them to make sure they haven’t eroded too much, and try things to make sure they’re still accurate, because things change with time.

So in an organisation, things like training, documentation, and incident investigations all help surface aspects of our systems, and changes to them, for everyone. Game days and chaos engineering are also excellent ways to discover how our models might be broken, in a controlled setting.

They’re definitely things we should do and care about, particularly at an organisational level. That being said, I want to focus a bit more on the technical stuff we can do as individuals.

'Layering Observability': a drawing of abstraction layers and observation probes' locations

note: this slide is explored more in depth in my talk on operable software

We can’t just open a so-called single pane of glass and see everything at once. That’s too much noise, too much information, too little structure. Seeing everything is only useful to the person who knows what to filter in and filter out. You can’t easily form a mental model of everything at once. To aid model formation, we should structure observability to tell a story.

Most applications and components that are easy to operate do not expose their internals to you; they mainly aim to provide visibility into your interactions with them. There has to be a connection between the things users are doing and the impact those things have on the system, and you will want to establish that connection. This means:

  • Provide visibility into interactions between components, not their internals.
  • Log at the layer below the one you want to debug; this saves time and reduces the number of observability probes you need to insert in your code base. We have a tendency to stick everything at the app level, but that’s misguided.
  • This means the logs around a given endpoint should be about the user interactions with that endpoint, and require no knowledge of its implementation details.
  • For developer logs, you can have one log statement shared by all the controllers by inserting it a layer below the endpoints, within the framework, rather than having to insert one for each endpoint.
  • These interactions will let people form a mental picture of what should be going on and spot where expectations are broken more easily. By layering views, you make it possible to skip between layers according to which expectations are broken and how much knowledge people have.
  • Where a layer provides no easy observability, people must cope through inference from the layers above and below it. It becomes a sort of obstacle.
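As a sketch of the "one log statement a layer below the endpoints" idea, here is a hypothetical WSGI-style middleware in Python's standard library. The names (`logging_middleware`, `hello_app`) are invented for illustration; the point is that every endpoint gets interaction-level logging from a single place in the framework layer:

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("access")

def logging_middleware(app):
    """Wrap a WSGI-style app so every endpoint gets request/response
    logging from one place, a layer below the endpoints themselves."""
    def wrapped(environ, start_response):
        start = time.monotonic()
        status_holder = {}

        def capturing_start_response(status, headers):
            status_holder["status"] = status
            return start_response(status, headers)

        body = app(environ, capturing_start_response)
        # One log statement covers every endpoint: user-visible
        # interaction data only, no implementation details.
        log.info(json.dumps({
            "method": environ.get("REQUEST_METHOD"),
            "path": environ.get("PATH_INFO"),
            "status": status_holder.get("status"),
            "duration_ms": round((time.monotonic() - start) * 1000, 2),
        }))
        return body
    return wrapped

# A trivial app standing in for "all the controllers".
def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

app = logging_middleware(hello_app)
```

No controller needs its own log line; adding a new endpoint automatically inherits the same interaction log.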

Often we are stuck with observability only at the highest level (the app) or the lowest level (the operating system), with nearly nothing useful in between. We have a black-box sandwich where we can only look at some parts, and that can be a consequence of the tools we choose. You’ll want to pick runtimes, languages, frameworks, and infrastructure that let you tell that observability story and properly layer it.

'Logging Practices': a game character chopping down trees

Another thing that helps with model formation is keeping the relationship between humans and machines running smoothly. It is a trust relationship, and providing information that is misleading or unhelpful erodes that trust. There are a few things you can do with logs to avoid ruining your marriage to the computer.

The main one is to log facts, not interpretations. A single log line often doesn’t have all the context, just a tiny part of it. If you start trying to be helpful by suggesting things to people, you turn a fact-gathering expedition into a murder-mystery investigation where bits of the system can’t be trusted or you have to read between the lines. That’s not helpful. A log line that says TLS validation error: SEC_ERROR_UNKNOWN_ISSUER is much better than one that says ERROR: you are being hacked, regardless of how much experience you have.

A thing that helps with this is structured logging, which is better than plain text. It makes it easier for people to use scripts or programs to parse, aggregate, route, and transform logs, and it saves you from needing full-text search to figure out what happened. If you really want to provide human-readable text or interpretations, add them as a field within the structured log.

Finally, adopting consistent naming mechanisms and units is always going to prove useful.
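A minimal sketch of these two practices combined, using Python's standard `logging` module (the `JSONFormatter` name and the field names are made up for the example): facts get their own structured fields, and the interpretation is confined to a `hint` field rather than mixed into the message.

```python
import json
import logging

class JSONFormatter(logging.Formatter):
    """Render each record as one JSON object: factual fields only,
    with any human-readable hint confined to its own field."""
    def format(self, record):
        payload = {
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Structured fields attached via logging's 'extra' mechanism.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)

handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
log = logging.getLogger("tls")
log.addHandler(handler)
log.setLevel(logging.ERROR)

# A fact-bearing log line; 'hint' carries the interpretation separately.
log.error("tls_validation_error", extra={"fields": {
    "code": "SEC_ERROR_UNKNOWN_ISSUER",
    "handshake_duration_ms": 34,
    "hint": "issuer certificate not in local trust store",
}})
```

A script downstream can now aggregate on `code` or `event` without full-text search, and the `hint` can be ignored or displayed as needed.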

'Hitting Limits': the game's museum's owl being surprised while woken up

There is a principle called the Law of Requisite Variety, which says that only complexity can control complexity. If an agent can’t represent all the possible states and circumstances of a thing it tries to control, it won’t be able to fully control it. Think of an airplane’s flight stabilizers: they can cope with only a limited range of adjustments, usually at a higher rate than we humans could manage. Unfortunately, once they reach the limit of what they can do and perceive, they stop working well.

That’s when control is either ineffective, or passed on to the next best thing. For the software we run and operate, that’s us; we’re the next best thing. And here we fall into the old idea that if you are as clever as you can be when writing something, you’re in trouble, because you need to be twice as clever to debug it.

That’s because to debug a system that is misbehaving under automation, you need to understand the system, and then understand the automation, then understand what the automation thinks of the system, and then take action.

That’s always kind of problematic, but essentially, brittle automation forces you to know more than if you had no automation in order to make things work in difficult times. Things can then become worse than if you had no automation in the first place.

'Handle Hand-Offs First': this in-game owl/museum curator accepting a bug he despises for his collection

When you start creating a solution, do it knowing that it is possibly going to be brittle and will require handing control over to a human being. Focus on the path where the automation fails and on how the hand-off will take place. How are you going to communicate it, and which clues or actions will an operator need in order to take over?

When we accept and assume that automation will reach its limits, and that the thing it does then is ask a human for help, we shift our approach to automation. Make that hand-off path work easily. Make it friendly, and make it possible for the human to understand what the state of the automation was at a given point in time, so they can figure out what it was doing and how to work with it. Make it possible to guide the automation into doing the right thing.

Once you’ve found your way around that, you can progressively automate things, grow the solution, and stay in line with these requirements. It’s a backstop for bad experiences, similar to “let it crash” for your code, so doing it well is key.
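As a toy illustration of designing the hand-off first, here is a hypothetical `Remediation` helper in Python; every name here is invented for the sketch. The point is that the automation journals each step it takes, so when it gives up and escalates, the operator inherits the automation's state rather than a silent failure:

```python
from dataclasses import dataclass, field

@dataclass
class Remediation:
    """Automation that assumes it will hit its limits: every attempt is
    journaled so a human taking over can see what it was doing and why."""
    max_attempts: int = 3
    journal: list = field(default_factory=list)

    def run(self, check, fix):
        for attempt in range(1, self.max_attempts + 1):
            if check():
                self.journal.append((attempt, "healthy"))
                return "resolved"
            self.journal.append((attempt, "fix attempted"))
            fix()
        # Out of ideas: hand off with the full journal instead of
        # retrying forever or failing silently.
        self.journal.append(("hand-off", "operator attention required"))
        return "escalated"
```

A run that succeeds returns `"resolved"`; one that exhausts its attempts returns `"escalated"` with a journal the operator can read to reconstruct what the automation tried.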

'Curb Cut Effect': a sidewalk with the classic curb cut in it

Another thing that I think is interesting is the curb cut effect. The curb cut effect was noticed as a result of the various American accessibility laws that started in the 60s. The idea is that to make sidewalks and streets accessible to people in wheelchairs, you cut away part of the curb to create a ramp from sidewalk to street.

The thing people noticed is that even though the curb was cut for handicapped people, getting around became easier for people carrying luggage, pushing strollers, riding skateboards or bicycles, and so on. Some studies observed that people without handicaps would even deviate from their course to use the curb cuts.

Similar effects are found with subtitles, which were put in place for people with hearing problems. If you look at the raw number of users today, there are probably more students using them to learn a second or third language than people with actual hearing disabilities. Automatic doors that open when you step in front of them are also very useful for people carrying loads of any kind, and are a common example of providing accessibility without “dumbing things down.”

I’m mentioning all of this because I think that keeping accessibility in mind when building things is one of the ways we can turn nasty negative surprises into pleasant emerging behaviour. And generally, accessibility is easier to build in than to retrofit. In the case of the web, accessibility also lines up with better performance.

If you think about diversity in broader terms: how would you rethink your dashboards, monitoring, and on-call experience if you had to run it 100% from a smartphone? What would that let people on regular computers do that they cannot do today? Ask the same question for user bases with drastically different levels of expertise.

I worked with an engineer who used to work in a power station, and the thing they had set up was that during the night, when they ran a short shift, they’d generate an audio file containing all the monitoring metrics. They turned it into a sort of song, and engineers coming in in the morning would listen to it on fast-forward to spot anomalies.

Looking at these things can be useful. If you prepare your dashboards for colorblind users, would customizable colors be useful? And could that open up new everyday use cases, like annotating metrics that tend to look weird and that you want to keep an eye on?

And so software shouldn’t be about doing more with less; it should be about requiring less to do more, as in letting other people do more with less.

'Complexity Has To Live Somewhere': in-game's 'The Thinker' sitting at a desk, looking like it's pondering at papers

note: this slide is a short version of my post on Complexity Has to Live Somewhere

A thing we try to do, especially as software engineers, is to try to keep the code and the system—the technical part of the system—as simple as possible. We tend to do that by finding underlying concepts, creating abstractions, and moving things outside of the code. Often that means we rely on some sort of convention.

When that happens, the complexity of how you chose to solve the problem still lingers. Someone has to handle it; if you don’t, your users have to. And if it’s not in the code, it’s in your operators or in the people who understand the code, because if the code is to remain simple, the difficult concepts you abstracted away still need to be understood and present in the world that surrounds the code.

I find it important to keep that in mind. There’s a kind of fixed amount of complexity that moves around the organization, both in the code and in the knowledge your people have.

Think of how people interact with the features day to day. What do they do, how does it impact them? What about the network of people around them? How do they react to that? Would you approach software differently if you think that it’s still going to be around in 5, 10, or 20 years when you and everyone who wrote it has left? If so, would that approach help people who join in just a few months?

One of the things I like to think about is that instead of using military analogies of fights and battles, it’s interesting to frame things in terms of gardens or agriculture. When we frame the discussion in terms of an ecosystem and the people working collectively within it, the way we approach solving problems can change drastically.

'Replacing, Adding, or Diffusing?': the trolley problem re-enacted with in-game items

Finally, one of the things I want to mention briefly is this little thought framework I like when we’re adopting new technology.

When we first adopt a new piece of technology, the thing we tend to do is start with the easy systems first. Then we say, “oh, that’s great! That’s going to replace everything we have.” Eventually we try to migrate everything, but it doesn’t always work.

So an approach that makes sense is to start with the easy stuff, to prove it’s workable for the basic cases. But also try something really, really hard, because that’s the endgame: migrating the hardest thing you’ve got.

If you’re not able to replace everything, consider framing the new technology as something you add to your stack rather than a replacement. This framing changes your approach to teaching, maintenance, and pretty much everything else you have to care about, so you avoid the common trap of deprecating a piece of critical technology with nothing to replace it. If you can replace a piece of technology, do it; but if you can’t, don’t fool yourself. Assume the cost of keeping things going.

The third option is diffusing. I think diffusing is something we do implicitly when we do DevOps. We took the Ops responsibilities and the Dev responsibilities, and instead of keeping them in different areas with separate specialists in development and operations, we made it everybody’s responsibility to be aware of all aspects.

That creates diffusion, which in this case can be positive: you want everyone to be able to handle a task. But if you look at the way some organisations handle containerization, it can be a group of operations people who no longer have to care about that aspect of their job, while all of the development teams now have to know and understand how containers work, how to deploy them, and adapt their workflows accordingly.

In such a case we haven’t necessarily replaced or removed any of the deployment needs; we’ve just taken them out of the bottleneck, diffused them, and sent them to everyone else.

I think having an easy way, early in the process, to figure out whether what we’re doing is replacing, adding, or diffusing things will drastically influence how we approach change at an organisational level. I think it can be helpful.

'Thanks': title slide again

This is all I have for today. Hopefully it was practical.

Thanks!

 


The Surprising Impact of Medium-Size Texts on PostgreSQL Performance



Any database schema is likely to have plenty of text fields. In this article, I divide text fields into three categories:

  1. Small texts: names, slugs, usernames, emails, etc. These are text fields that usually have some low size limit, maybe even using varchar(n) and not text.
  2. Large texts: blog post content, articles, HTML content etc. These are large pieces of free, unrestricted text that is stored in the database.
  3. Medium texts: descriptions, comments, product reviews, stack traces, etc. These are any text fields that fall between the small and the large. These types of texts would normally be unrestricted, but naturally smaller than the large texts.

In this article I demonstrate the surprising impact of medium-size texts on query performance in PostgreSQL.

Sliced bread… it gets better
Photo by Louise Lyshøj

When talking about large chunks of text, or any other field that may contain large amounts of data, we first need to understand how the database handles the data. Intuitively, you might think that the database is storing large pieces of data inline like it does smaller pieces of data, but in fact, it does not:

PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly.

As the documentation explains, PostgreSQL can’t store rows (tuples) in multiple pages. So how does the database store large chunks of data?

[…] large field values are compressed and/or broken up into multiple physical rows. […] The technique is affectionately known as TOAST (or “the best thing since sliced bread”).

OK, so how is this TOAST working exactly?

If any of the columns of a table are TOAST-able, the table will have an associated TOAST table

So TOAST is a separate table associated with our table. It is used to store large pieces of data of TOAST-able columns (the text datatype for example, is TOAST-able).

What constitutes a large value?

The TOAST management code is triggered only when a row value to be stored in a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code will compress and/or move field values out-of-line until the row value is shorter than TOAST_TUPLE_TARGET bytes (also normally 2 kB, adjustable) or no more gains can be had

PostgreSQL will try to compress the large values in the row, and if the row still can’t fit within the limit, the values will be stored out-of-line in the TOAST table.

Finding the TOAST

Now that we have some understanding of what TOAST is, let’s see it in action. First, create a table with a text field:

db=# CREATE TABLE toast_test (id SERIAL, value TEXT);
CREATE TABLE

The table contains an id column, and a value field of type TEXT. Notice that we did not change any of the default storage parameters.

The text field we added supports TOAST, or is TOAST-able, so PostgreSQL should create a TOAST table. Let’s try to locate the TOAST table associated with the table toast_test in pg_class:

db=# SELECT relname, reltoastrelid FROM pg_class WHERE relname = 'toast_test';
  relname   │ reltoastrelid
────────────┼───────────────
 toast_test │        340488

db=# SELECT relname FROM pg_class WHERE oid = 340488;
     relname
─────────────────
 pg_toast_340484

As promised, PostgreSQL created a TOAST table called pg_toast_340484.

TOAST in Action

Let’s see what the TOAST table looks like:

db=# \d pg_toast.pg_toast_340484
TOAST table "pg_toast.pg_toast_340484"
   Column   │  Type
────────────┼─────────
 chunk_id   │ oid
 chunk_seq  │ integer
 chunk_data │ bytea

The TOAST table contains three columns:

  • chunk_id: A reference to a toasted value.
  • chunk_seq: A sequence within the chunk.
  • chunk_data: The actual chunk data.

Similar to “regular” tables, the TOAST table also has the same restrictions on inline values. To overcome this restriction, large values are split into chunks that can fit within the limit.

At this point the table is empty:

db=# SELECT * FROM pg_toast.pg_toast_340484;
 chunk_id │ chunk_seq │ chunk_data
──────────┼───────────┼────────────
(0 rows)

This makes sense because we did not insert any data yet. So next, insert a small value into the table:

db=# INSERT INTO toast_test (value) VALUES ('small value');
INSERT 0 1

db=# SELECT * FROM pg_toast.pg_toast_340484;
 chunk_id │ chunk_seq │ chunk_data
──────────┼───────────┼────────────
(0 rows)

After inserting the small value into the table, the TOAST table remained empty. This means the small value was small enough to be stored inline, and there was no need to move it out-of-line to the TOAST table.

Small text stored inline

Let’s insert a large value and see what happens:

db=# INSERT INTO toast_test (value) VALUES ('n0cfPGZOCwzbHSMRaX8 ... WVIlRkylYishNyXf');
INSERT 0 1

I shortened the value for brevity, but that’s a random string with 4096 characters. Let’s see what the TOAST table stores now:

db=# SELECT * FROM pg_toast.pg_toast_340484;
 chunk_id │ chunk_seq │ chunk_data
──────────┼───────────┼──────────────────────
   995899 │         0 │ x30636650475a4f43...
   995899 │         1 │ x50714c3756303567...
   995899 │         2 │ x6c78426358574534...
(3 rows)

The large value is stored out-of-line in the TOAST table. Because the value was too large to fit inline in a single row, PostgreSQL split it into three chunks. The x3063... notation is how psql displays binary data.

Large text stored out-of-line, in the associated TOAST table

Finally, execute the following query to summarize the data in the TOAST table:

db=# SELECT chunk_id, COUNT(*) as chunks, pg_size_pretty(sum(octet_length(chunk_data)::bigint))
FROM pg_toast.pg_toast_340484 GROUP BY 1 ORDER BY 1;
 chunk_id │ chunks │ pg_size_pretty
──────────┼────────┼────────────────
   995899 │      3 │ 4096 bytes
(1 row)

As we’ve already seen, the text is stored in three chunks.

Size of Database Objects

There are several ways to get the size of database objects in PostgreSQL:

  • pg_table_size: Get the size of the table including TOAST, but excluding indexes
  • pg_relation_size: Get the size of just the table
  • pg_total_relation_size: Get the size of the table, including indexes and TOAST

Another useful function is pg_size_pretty: used to display sizes in a friendly format.
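To see the three size functions side by side, you can combine them with pg_size_pretty in one query. This uses the toast_test table from earlier in the article; the resulting numbers will vary per database:

```sql
SELECT pg_size_pretty(pg_table_size('toast_test'))          AS table_and_toast,
       pg_size_pretty(pg_relation_size('toast_test'))       AS table_only,
       pg_size_pretty(pg_total_relation_size('toast_test')) AS total_with_indexes;
```

Comparing table_and_toast with table_only gives a quick sense of how much of a table's footprint lives out-of-line in TOAST.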

TOAST Compression

So far I have refrained from categorizing texts by their size. The reason is that the size of the text itself does not matter; what matters is its size after compression.

To create long strings for testing, we’ll implement a function to generate random strings at a given length:

CREATE OR REPLACE FUNCTION generate_random_string(
  length INTEGER,
  characters TEXT DEFAULT '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
) RETURNS TEXT AS
$$
DECLARE
  -- Accumulates the generated string
  result TEXT := '';
BEGIN
  IF length < 1 THEN
      RAISE EXCEPTION 'Invalid length';
  END IF;
  FOR i IN 1..length LOOP
    -- Append one character picked at random from the allowed set
    result := result || substr(characters, floor(random() * length(characters))::int + 1, 1);
  END LOOP;
  RETURN result;
END;
$$ LANGUAGE plpgsql;

Generate a string made out of 10 random characters:

db=# SELECT generate_random_string(10);
 generate_random_string
────────────────────────
 o0QsrMYRvp

We can also provide a set of characters to generate the random string from. For example, generate a string made of 10 random digits:

db=# SELECT generate_random_string(10, '1234567890');
 generate_random_string
────────────────────────
 4519991669

PostgreSQL TOAST uses the LZ family of compression techniques. Compression algorithms generally work by identifying and eliminating repetition in the value, so a long string drawn from only a few distinct characters should compress much better than a string made of many different characters.
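One handy way to observe this (an aside, not part of the original walkthrough) is pg_column_size: for a value read from a table it reports the size of the datum as stored, after compression, while octet_length reports the uncompressed length:

db=# SELECT id, octet_length(value) AS raw_size, pg_column_size(value) AS stored_size
FROM toast_test;

A large gap between the two columns indicates a value that compressed well.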

To illustrate how TOAST uses compression, we’ll clean out the toast_test table, and insert a random string made of many possible characters:

db=# TRUNCATE toast_test;
TRUNCATE TABLE

db=# INSERT INTO toast_test (value) VALUES (generate_random_string(1024 * 10));
INSERT 0 1

We inserted a 10kb value made of random characters. Let’s check the TOAST table:

db=# SELECT chunk_id, COUNT(*) as chunks, pg_size_pretty(sum(octet_length(chunk_data)::bigint))
FROM pg_toast.pg_toast_340484 GROUP BY 1 ORDER BY 1;

 chunk_id │ chunks │ pg_size_pretty
──────────┼────────┼────────────────
  1495960 │      6 │ 10 kB

The value is stored out-of-line in the TOAST table, and we can see it is not compressed.

Next, insert a value with a similar length, but made out of fewer possible characters:

db=# INSERT INTO toast_test (value) VALUES (generate_random_string(1024 * 10, '123'));
INSERT 0 1

db=# SELECT chunk_id, COUNT(*) as chunks, pg_size_pretty(sum(octet_length(chunk_data)::bigint))
FROM pg_toast.pg_toast_340484 GROUP BY 1 ORDER BY 1;

 chunk_id │ chunks │ pg_size_pretty
──────────┼────────┼────────────────
  1495960 │      6 │ 10 kB
  1495961 │      2 │ 3067 bytes

We inserted a 10K value, but this time it only contained 3 possible digits: 1, 2 and 3. This text is more likely to contain repeating binary patterns, and should compress better than the previous value. Looking at the TOAST table, we can see PostgreSQL compressed the value to ~3kB, a third of the size of the uncompressed value. Not a bad compression rate!

Finally, insert a 10K long string made of a single digit:

db=# INSERT INTO toast_test (value) VALUES (generate_random_string(1024 * 10, '0'));
INSERT 0 1

db=# SELECT chunk_id, COUNT(*) as chunks, pg_size_pretty(sum(octet_length(chunk_data)::bigint))
FROM pg_toast.pg_toast_340484 GROUP BY 1 ORDER BY 1;

 chunk_id │ chunks │ pg_size_pretty
──────────┼────────┼────────────────
  1495960 │      6 │ 10 kB
  1495961 │      2 │ 3067 bytes

The string compressed so well that the database was able to store it inline.

Configuring TOAST

If you are interested in configuring TOAST for a table, you can do that by setting storage parameters at CREATE TABLE or with ALTER TABLE. The relevant settings are:

  • toast_tuple_target: The minimum tuple length after which PostgreSQL tries to compress and move long values to TOAST.
  • The per-column storage strategy, set with ALTER TABLE ... ALTER COLUMN ... SET STORAGE. PostgreSQL supports 4 strategies: PLAIN, MAIN, EXTERNAL and EXTENDED. The default is EXTENDED, which means PostgreSQL will first try to compress the value, and move it out-of-line if the row is still too long.

I personally never had to change the default TOAST storage parameters.
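As one example of when you might: if a column holds data that is already compressed (images, gzipped payloads), compressing it again is wasted effort. The EXTERNAL strategy keeps out-of-line storage but skips compression:

db=# ALTER TABLE toast_test ALTER COLUMN value SET STORAGE EXTERNAL;
ALTER TABLE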


To understand the effect of different text sizes and out-of-line storage on performance, we’ll create three tables, one for each type of text:

db=# CREATE TABLE toast_test_small (id SERIAL, value TEXT);
CREATE TABLE

db=# CREATE TABLE toast_test_medium (id SERIAL, value TEXT);
CREATE TABLE

db=# CREATE TABLE toast_test_large (id SERIAL, value TEXT);
CREATE TABLE

Like in the previous section, for each table PostgreSQL created a TOAST table:

SELECT
    c1.relname,
    c2.relname AS toast_relname
FROM
    pg_class c1
    JOIN pg_class c2 ON c1.reltoastrelid = c2.oid
WHERE
    c1.relname LIKE 'toast_test%'
    AND c1.relkind = 'r';

      relname      │  toast_relname
───────────────────┼─────────────────
 toast_test_small  │ pg_toast_471571
 toast_test_medium │ pg_toast_471580
 toast_test_large  │ pg_toast_471589

Set Up Test Data

First, let’s populate toast_test_small with 500K rows containing a small text that can be stored inline:

db=# INSERT INTO toast_test_small (value)
SELECT 'small value' FROM generate_series(1, 500000);
INSERT 0 500000

Next, populate the toast_test_medium with 500K rows containing texts that are at the border of being stored out-of-line, but still small enough to be stored inline:

db=# WITH str AS (SELECT generate_random_string(1800) AS value)
INSERT INTO toast_test_medium (value)
SELECT value
FROM generate_series(1, 500000), str;
INSERT 0 500000

I experimented with different values until I got a value just large enough to be stored out-of-line. The trick is to find a string which is roughly 2K that compresses very poorly.

Next, insert 500K rows with large texts to toast_test_large:

db=# WITH str AS (SELECT generate_random_string(4096) AS value)
INSERT INTO toast_test_large (value)
SELECT value
FROM generate_series(1, 500000), str;
INSERT 0 500000

We are now ready for the next step.

Comparing Performance

We usually expect queries on large tables to be slower than queries on smaller tables. In this case, it's not unreasonable to expect the query on the small table to run faster than on the medium table, and a query on the medium table to be faster than the same query on the large table.

To compare performance, we are going to execute a simple query to fetch one row from the table. Since we don’t have an index, the database is going to perform a full table scan. We’ll also disable parallel query execution to get a clean, simple timing, and execute the query multiple times to account for caching.

db=# SET max_parallel_workers_per_gather = 0;
SET

Starting with the small table:

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_small WHERE id = 6000;
                                    QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────
 Gather  (cost=1000.00..7379.57 rows=1 width=16)
   ->  Parallel Seq Scan on toast_test_small  (cost=0.00..6379.47 rows=1 width=16)
        Filter: (id = 6000)
        Rows Removed by Filter: 250000
 Execution Time: 31.323 ms
(8 rows)

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_small WHERE id = 6000;
Execution Time: 25.865 ms

I ran the query multiple times and trimmed the output for brevity. As expected, the database performed a full table scan, and the timing eventually settled at ~25ms.

Next, execute the same query on the medium table:

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_medium WHERE id = 6000;
Execution Time: 321.965 ms

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_medium WHERE id = 6000;
Execution Time: 173.058 ms

Running the exact same query on the medium table took significantly more time, 173ms, which is roughly 6x slower than on the smaller table. This makes sense.

To complete the test, run the query again on the large table:

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_large WHERE id = 6000;
Execution Time: 49.867 ms

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_large WHERE id = 6000;
Execution Time: 37.291 ms

Well, this is surprising! The timing of the query on the large table is similar to the timing of the small table, and 6 times faster than the medium table.

       Table        │   Timing
────────────────────┼────────────
 toast_test_small   │  31.323 ms
 toast_test_medium  │ 173.058 ms
 toast_test_large   │  37.291 ms

Large tables are supposed to be slower, so what is going on?

Making Sense of the Results

To make sense of the results, have a look at the size of each table, and the size of its associated TOAST table:

SELECT
    c1.relname,
    pg_size_pretty(pg_relation_size(c1.relname::regclass)) AS size,
    c2.relname AS toast_relname,
    pg_size_pretty(pg_relation_size(('pg_toast.' || c2.relname)::regclass)) AS toast_size
FROM
    pg_class c1
    JOIN pg_class c2 ON c1.reltoastrelid = c2.oid
WHERE
    c1.relname LIKE 'toast_test_%'
    AND c1.relkind = 'r';
      relname       │  size  │  toast_relname  │ toast_size
────────────────────┼────────┼─────────────────┼────────────
 toast_test_small   │ 21 MB  │ pg_toast_471571 │ 0 bytes
 toast_test_medium  │ 977 MB │ pg_toast_471580 │ 0 bytes
 toast_test_large   │ 25 MB  │ pg_toast_471589 │ 1953 MB

Let’s break it down:

  • toast_test_small: The size of the table is 21MB, and there is no TOAST. This makes sense because the texts we inserted to that table were small enough to be stored inline.
(Figure: Small texts stored inline)
  • toast_test_medium: The table is significantly larger, 977MB. We inserted text values that were just small enough to be stored inline. As a result, the table got very big, and the TOAST was not used at all.
(Figure: Medium texts stored inline)
  • toast_test_large: The size of the table is roughly similar to the size of the small table. This is because we inserted large texts into the table, and PostgreSQL stored them out-of-line in the TOAST table. This is why the TOAST table is so big for the large table, but the table itself remained small.
(Figure: Large texts stored out-of-line in TOAST)

When we executed our query, the database did a full table scan. To scan the small and large tables, the database only had to read 21MB and 25MB and the query was pretty fast. However, when we executed the query against the medium table, where all the texts are stored inline, the database had to read 977MB from disk, and the query took a lot longer.

TAKEAWAY

TOAST is a great way of keeping tables compact by storing large values out-of-line!

Using the Text Values

In the previous comparison we executed a query that only used the ID, not the text value. What will happen when we actually need to access the text value itself?

db=# \timing
Timing is on.

db=# SELECT * FROM toast_test_large WHERE value LIKE 'foo%';
Time: 7509.900 ms (00:07.510)

db=# SELECT * FROM toast_test_large WHERE value LIKE 'foo%';
Time: 7290.925 ms (00:07.291)

db=# SELECT * FROM toast_test_medium WHERE value LIKE 'foo%';
Time: 5869.631 ms (00:05.870)

db=# SELECT * FROM toast_test_medium WHERE value LIKE 'foo%';
Time: 259.970 ms

db=# SELECT * FROM toast_test_small WHERE value LIKE 'foo%';
Time: 78.897 ms

db=# SELECT * FROM toast_test_small WHERE value LIKE 'foo%';
Time: 50.035 ms

We executed a query against all three tables to search for a string within the text value. The query is not expected to return any results, and is forced to scan the entire table. This time, the results are more consistent with what we would expect:

       Table        │ Cold cache  │ Warm cache
────────────────────┼─────────────┼─────────────
 toast_test_small   │   78.897 ms │   50.035 ms
 toast_test_medium  │ 5869.631 ms │  259.970 ms
 toast_test_large   │ 7509.900 ms │ 7290.925 ms

The larger the table, the longer it took the query to complete. This makes sense because to satisfy the query, the database was forced to read the texts as well. In the case of the large table, this means accessing the TOAST table as well.
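One way to see this extra IO directly (an addition, not part of the original runs) is to repeat the query with EXPLAIN (ANALYZE, BUFFERS), which reports how many buffers each plan node touched:

db=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM toast_test_large WHERE value LIKE 'foo%';

For the large table, the buffer counts should include reads of the TOAST relation in addition to the main heap, while the medium table's reads all come from its (much bigger) main heap.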

What About Indexes?

Indexes help the database minimize the number of pages it needs to fetch to satisfy a query. For example, let’s take the first example when we searched for a single row by ID, but this time we’ll have an index on the field:

db=# CREATE INDEX toast_test_small_id_ix ON toast_test_small(id);
CREATE INDEX

db=# CREATE INDEX toast_test_medium_id_ix ON toast_test_medium(id);
CREATE INDEX

db=# CREATE INDEX toast_test_large_id_ix ON toast_test_large(id);
CREATE INDEX

Executing the exact same query as before with indexes on the tables:

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_small WHERE id = 6000;
                                QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────
Index Scan using toast_test_small_id_ix on toast_test_small(cost=0.42..8.44 rows=1 width=16)
  Index Cond: (id = 6000)
Time: 0.772 ms

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_medium WHERE id = 6000;
                                QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────
Index Scan using toast_test_medium_id_ix on toast_test_medium(cost=0.42..8.44 rows=1 width=1808)
  Index Cond: (id = 6000)
Time: 0.831 ms

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_large WHERE id = 6000;
                                QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────
Index Scan using toast_test_large_id_ix on toast_test_large(cost=0.42..8.44 rows=1 width=22)
  Index Cond: (id = 6000)
Time: 0.618 ms

In all three cases the index was used, and we see that the performance in all three cases is almost identical.

By now, we know that the trouble begins when the database has to do a lot of IO. So next, let’s craft a query that the database will choose to use the index for, but will still have to read a lot of data:

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_small WHERE id BETWEEN 0 AND 250000;
                                QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────
Index Scan using toast_test_small_id_ix on toast_test_small(cost=0.4..9086 rows=249513 width=16)
  Index Cond: ((id >= 0) AND (id <= 250000))
Time: 60.766 ms
db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_small WHERE id BETWEEN 0 AND 250000;
Time: 59.705 ms

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_medium WHERE id BETWEEN 0 AND 250000;
Time: 3198.539 ms (00:03.199)
db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_medium WHERE id BETWEEN 0 AND 250000;
Time: 284.339 ms

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_large WHERE id BETWEEN 0 AND 250000;
Time: 85.747 ms
db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_large WHERE id BETWEEN 0 AND 250000;
Time: 70.364 ms

We executed a query that fetches half the data in the table. This was a small enough portion of the table for PostgreSQL to decide to use the index, but still large enough to require lots of IO.

We ran each query twice on each table. In all cases the database used the index to access the table. Keep in mind that the index only helps reduce the number of pages the database has to access, but in this case, the database still had to read half the table.

       Table        │ Cold cache  │ Warm cache
────────────────────┼─────────────┼─────────────
 toast_test_small   │   60.766 ms │   59.705 ms
 toast_test_medium  │ 3198.539 ms │  284.339 ms
 toast_test_large   │   85.747 ms │   70.364 ms

The results here are similar to the first test we ran. When the database had to read a large portion of the table, the medium table, where the texts are stored inline, was the slowest.

If, after reading this far, you are convinced that medium-size texts are causing your performance issues, there are things you can do.

Adjusting toast_tuple_target

toast_tuple_target is a storage parameter that controls the minimum tuple length after which PostgreSQL tries to move long values to TOAST. The default is 2K, but it can be decreased to a minimum of 128 bytes. The lower the target, the more likely it is that a medium-size string will be moved out-of-line to the TOAST table.

To demonstrate, create a table with the default storage params, and another with toast_tuple_target = 128:

db=# CREATE TABLE toast_test_default_threshold (id SERIAL, value TEXT);
CREATE TABLE

db=# CREATE TABLE toast_test_128_threshold (id SERIAL, value TEXT) WITH (toast_tuple_target=128);
CREATE TABLE

db=# SELECT c1.relname, c2.relname AS toast_relname
FROM pg_class c1 JOIN pg_class c2 ON c1.reltoastrelid = c2.oid
WHERE c1.relname LIKE 'toast%threshold' AND c1.relkind = 'r';

           relname            │  toast_relname
──────────────────────────────┼──────────────────
 toast_test_default_threshold │ pg_toast_3250167
 toast_test_128_threshold     │ pg_toast_3250176

Next, generate a value larger than 2KB that compresses to less than 128 bytes, insert it into both tables, and check whether it was stored out-of-line:

db=# INSERT INTO toast_test_default_threshold (value) VALUES (generate_random_string(2100, '123'));
INSERT 0 1

db=# SELECT * FROM pg_toast.pg_toast_3250167;
 chunk_id │ chunk_seq │ chunk_data
──────────┼───────────┼────────────
(0 rows)

db=# INSERT INTO toast_test_128_threshold (value) VALUES (generate_random_string(2100, '123'));
INSERT 0 1

db=# SELECT * FROM pg_toast.pg_toast_3250176;
─[ RECORD 1 ]─────────────
chunk_id   │ 3250185
chunk_seq  │ 0
chunk_data │ x3408.......

The (roughly) similar medium-size text was stored inline with the default params, and out-of-line with a lower toast_tuple_target.

Create a Separate Table

If you have a critical table that stores medium-size text fields, and you notice that most texts are being stored inline and perhaps slowing down queries, you can move the column with the medium text field into its own table:

CREATE TABLE toast_test_value (fk INT, value TEXT);
CREATE TABLE toast_test (id SERIAL, value_id INT);

In my previous article I demonstrated how we use SQL to find anomalies. In one of those use cases, we actually had a table of errors that contained a python traceback. The error messages were medium texts, many of them stored in-line, and as a result the table got big very quickly! So big in fact, that we noticed queries are getting slower and slower. Eventually we moved the errors into a separate table, and things got much faster!
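With a layout like this, queries that only touch the main table stay narrow and fast, and the text is fetched with a join only when it's actually needed. A sketch, assuming fk in toast_test_value references toast_test.id:

db=# SELECT t.id, v.value
FROM toast_test t
JOIN toast_test_value v ON v.fk = t.id
WHERE t.id = 6000;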


The main problem with medium-size texts is that they make rows very wide. This is a problem because PostgreSQL, like other OLTP-oriented databases, stores values in rows. When we ask the database to execute a query with only a few columns, the values of those columns are most likely spread across many blocks. If the rows are wide, that translates into a lot of IO, which affects query performance and resource usage.

To overcome this challenge, some non-OLTP-oriented databases use a different type of storage: columnar storage. Using columnar storage, data is stored on disk by columns, not by rows. This way, when the database has to scan a specific column, the values are stored in consecutive blocks, which usually translates into less IO. Additionally, values of a specific column are more likely to have repeating patterns and values, so they compress better.

(Figure: Row vs Column Storage)

For non-OLTP workloads such as data warehouse systems, this makes sense: the tables are usually very wide, and queries often use a small subset of the columns and read a lot of rows. OLTP workloads usually read one or very few rows, so storing data in rows makes more sense.

There has been chatter about pluggable storage in PostgreSQL, so this is something to look out for!
