
United States and Republic of Korea Advance Economic Cooperation at 5th Senior Economic Dialogue


“The Republic of Korea is a global economic powerhouse, which shares our same values and democratic principles. The U.S.-ROK economic partnership, which includes safeguarding a clean 5G infrastructure from untrusted, high-risk vendors, will serve as a linchpin for securing a peaceful and prosperous economic future for our two countries and the world.” -Under Secretary Keith Krach, October 13, 2020

“We recognize the importance of continuing to bolster our economic partnership amidst the COVID-19 pandemic and global economic uncertainties. Building on our close cooperation in the fight against this pandemic, we will continue our joint efforts in the months to come to prevent this pandemic from disrupting our economic interactions.” -Vice Foreign Minister Lee Tae-ho, October 13, 2020

The U.S. Department of State hosted the fifth United States (U.S.)-Republic of Korea (ROK) Senior Economic Dialogue (SED) in a virtual format on October 13. Under Secretary of State for Economic Growth, Energy, and the Environment Keith Krach and Ministry of Foreign Affairs 2nd Vice Foreign Minister Lee Tae-ho led the delegations.

The 5th SED provided an opportunity to take stock of progress and explore avenues to further cooperation between the ROK and the United States, particularly in line with the New Southern Policy of the ROK and the Indo-Pacific Strategy of the United States.

Both sides intend to continue their close coordination through the SED to further strengthen bilateral economic ties and enhance their partnership on regional and global issues.

In recognition of the ROK’s ascendance as a leading global economy, Under Secretary Krach highlighted the ways in which the U.S.-ROK economic partnership has expanded its scope of cooperation from bilateral and regional issues to increasing engagement on global issues – including in the areas of economic security, women’s economic empowerment, and protecting sensitive and emerging technologies.

The discussion showcased the strength of the U.S.-ROK economic partnership during a time of unprecedented global economic and public health challenges, and focused on opportunities to expand bilateral cooperation to combat the COVID-19 pandemic, accelerate global economic recovery and supply chain and manufacturing diversification, and advance prosperity across the Indo-Pacific region.

Bilateral Issues: Economic Relationship

  • The United States is the ROK’s only treaty ally and its second largest trading partner. It is also the ROK’s second largest source of foreign investment after Japan. The relationship is about protecting our countries by defending the values and rules that underpin our prosperity, democracy, and way of life.
  • The U.S.-ROK Alliance is built not only on security goals, but also on our commitment to the core principles that have enabled the success of both of our nations – and that includes a strong and enduring economic partnership.
  • The United States is a significant foreign investor in the ROK, a substantial energy supplier, and an enormous export market for Korean goods and services.
  • More than 1,200 American companies operate in the ROK, including flagship enterprises such as Apple, Microsoft, and General Motors.
  • In 2019, two-way trade of goods and services between the ROK and the United States totaled more than $170 billion. The ROK is now the United States’ 6th-largest goods trading partner.
  • During the SED, both sides stressed the importance of investment and acknowledged that increasing investment by Korean companies in the United States has contributed to a strong and constructive economic relationship. Major recent investments by Korean firms include:
    • Doosan Bobcat’s $26 million investment in Litchfield, Minnesota
    • Zinus USA’s $108 million investment in McDonough, Georgia
    • SK Innovation’s $830 million investment in Commerce, Georgia
  • The ROK and the United States affirmed the strength of their bilateral economic relationship, a core pillar of U.S.-ROK ties. Building on this firm foundation, both sides committed to continue expanding bilateral economic cooperation.

5G and the Clean Network

  • Both sides noted the importance of our companies having the freedom to innovate and grow over the long term.
  • The United States believes it is critical for the ROK to join the Clean Network for the sake of its own national security. The Clean Network brings together countries with shared values that underpin their prosperity and their freedom.
  • The United States explained that the Clean Network is a comprehensive approach to addressing the long-term threats to data privacy, security, human rights, and trusted collaboration posed by the Chinese Communist Party (CCP). The Clean Network, which is rooted in internationally accepted Digital Trust Standards, represents the execution of a multi-year, enduring strategy built on a coalition of trusted partners.
  • Many Clean Countries and Clean Telcos, including KT and SK Telecom, and leading Clean Companies like Oracle, HP, Reliance Jio, NEC, Fujitsu, Cisco, NTT, SoftBank and VMware, have joined the Clean Network.
  • Momentum for the Clean Network initiative is accelerating. Like-minded partners in government and industry around the world have joined the growing tide to secure our data from the CCP’s surveillance state. By building a coalition, the United States and its partners are working to enhance the protection of our citizens’ data and our freedoms. We are united in standing up to the CCP’s retaliation.
  • The United States emphasized to the ROK the importance of ensuring that 5G networks are completely free from untrusted vendors. Secretary of State Michael R. Pompeo launched the State Department’s 5G Clean Path initiative on April 29, 2020, so that untrusted IT vendors will have no access to digital cellular telecommunications systems and networks that service U.S. diplomatic communications at home and abroad. 5G Clean Path is designed to help us build trust across 5G standalone networks to safeguard national and economic security as well as data privacy.
  • The United States is asking all allies and partners – including the ROK – to join it in using a 5G Clean Path for their own diplomatic facilities, in addition to ensuring their domestic networks are secure.

Focus on Building a Coalition of Trusted Partners

  • The United States also discussed Clean Infrastructure and Clean Financing with its Blue Dot Network partners – and noted that the ROK could benefit from becoming more involved.
  • Along with several countries, the United States has formed the Energy Resource Governance Initiative (ERGI) in the area of critical minerals. The U.S. intends to integrate the ERGI into a similar initiative with the EU and welcomes closer cooperation with the ROK as well. The potential exists to combine the ERGI with a similar initiative with the United States’ Quad partners and to form a Clean Minerals Network.
  • The United States is also focused on encouraging the private sector to adopt secure and diversified Supply Chains that adhere to Clean Labor Practices and are free from the forced labor of Xinjiang.

Science, Technology, and Environmental Cooperation

  • The United States and the ROK have worked together for over 60 years to protect the environment and are leaders in integrating environmental concerns into our economic growth and development practices.
  • Both sides continue to work together to promote natural resource protection, including by addressing marine debris, promoting clean air, and combatting illegal, unreported, and unregulated fishing.
  • The United States and the ROK intend to finalize the extension of the U.S.-ROK Science and Technology Agreement (STA). The STA provides opportunities for the development of joint contacts and initiatives between U.S. and ROK governmental agencies, universities, research centers, and other institutions and firms.

Energy Cooperation

  • The U.S. Department of State hosted the seventh U.S.-ROK Energy Security Dialogue in a virtual format on August 12. The discussion reinforced the role energy cooperation plays in strengthening the U.S.-ROK partnership and focused on energy markets post-COVID-19, national energy policies, bilateral cooperation, and areas of mutual cooperation in multilateral fora devoted to energy. Both sides decided to continue collaboration to enhance energy security in the physical and cyber domains, and to strengthen their energy industries.

COVID-19 Cooperation

  • U.S. and ROK cooperation in responding to COVID-19 underscores the close partnership between our countries and peoples. In the early months of the pandemic, the United States and the ROK collaborated to establish robust travel screening measures to preserve air linkages between our countries, which also assisted with repatriating each other’s citizens. The ROK donated essential medical supplies to the United States, including 2.5 million protective masks, and facilitated the export of COVID-19 test kits to the United States. U.S. and ROK experts and policymakers continue to share best practices on fighting COVID-19. This kind of close-knit economic and health security cooperation is only possible during a pandemic when there is already a deep and well-established relationship.

Regional Cooperation: New Southern Policy and Indo-Pacific Strategy

  • This last summer, the United States and the ROK held a dialogue focused specifically on the U.S. Indo-Pacific Strategy and the ROK New Southern Policy. Both of these strategies recognize that an open and free Indo-Pacific is vital to the future of the global economy. For the region to flourish economically, it is crucial that there are open investment environments, good governance, and paths for sustainable growth.
  • During the SED, the two countries examined ways to strengthen their economic coordination in the Indo-Pacific Region in the areas of development, energy, and infrastructure cooperation. The United States values its partnership with the ROK on these initiatives and welcomes the ROK’s commitment of $233 million in Official Development Assistance to ASEAN countries in 2020, which complements U.S. development activity in the Indo-Pacific region. The discussions at the SED on the Indo-Pacific Strategy and the New Southern Policy further advanced the substance of our partnership.

Global Cooperation: Cross-Cutting Issues

  • Reaffirming that cooperation on global issues is an increasingly important and expanding pillar of the ROK-U.S. Alliance, both sides held in-depth discussions on various ways to expand cooperation on global issues including women’s economic empowerment, Information, Communication and Technology (ICT) and emerging technologies, and economic security.

Women’s Economic Empowerment

  • The United States and the ROK are participating, alongside Japan, in the Trilateral Summit on Women’s Leadership in Science, Technology, Engineering and Math (STEM), which will be led by U.S. Ambassador-at-Large for Global Women’s Issues Kelley Currie on October 19-22 in a virtual format.
  • The United States and the ROK also plan to work together to engage relevant U.S. and ROK government agencies and the private sector to identify and highlight best practices for supporting women’s empowerment in the workplace as part of their continued cooperation under the U.S.-ROK Action Plan on Women’s Economic Empowerment.

ICT and Emerging Technologies

  • The United States and the ROK share a commitment to promoting an open, interoperable, secure, and reliable Internet. We are exploring opportunities for collaboration on ICT capacity country assessments, joint training, and related cybersecurity capacity-building.
  • The United States and the ROK discussed 5G network security and recognize the importance of ensuring a secure, resilient, and trustworthy 5G ecosystem. The two countries continue to work together to promote international collaboration for enhancing 5G security, including in venues such as the Prague 5G Security Conference which was held in September 2020.
  • During the SED session on emerging technologies, the United States and the ROK also noted potential opportunities for cooperation on artificial intelligence and quantum computing.

Economic Security

  • The United States highlighted the work it is doing to assist the private sector in strengthening, diversifying, and insulating supply chains against future crises, and encouraged further cooperation with the ROK to support companies seeking to relocate, diversify, or make new investments in ways that enhance our supply chain security while helping our businesses and workers to thrive.

Future Work

  • Noting that the Senior Economic Dialogue remains the highest economic platform to discuss bilateral, regional, and global cooperation, both sides committed to continue efforts to make progress in the aforementioned areas through this mechanism.
  • The United States and the ROK expect to participate in the 4th Public-Private Economic Forum, which will be hosted by the Atlantic Council in a virtual format. This event provides a platform for multi-stakeholder dialogue and public exchange on the U.S.-ROK economic relationship, and is held annually alongside the SED. The ROK has proposed to host the 6th SED in Seoul in 2021.

Public release: this material comes from the originating organization and may be of a point-in-time nature, edited for clarity, style, and length.


WSJ News Exclusive | Justice Department to File Long-Awaited Antitrust Suit Against Google


The Justice Department will file an antitrust lawsuit Tuesday alleging that Google engaged in anticompetitive conduct to preserve monopolies in search and search-advertising that form the cornerstones of its vast conglomerate, according to senior Justice officials.

The long-anticipated case, expected to be filed in a Washington, D.C., federal court, will mark the most aggressive U.S. legal challenge to a company’s dominance in the tech sector in more than two decades, with the potential to shake up Silicon Valley and beyond. Once a public darling, Google attracted considerable scrutiny over the past decade as it gained power but has avoided a true showdown with the government until now.

The department will allege that Google, a unit of Alphabet Inc., is maintaining its status as gatekeeper to the internet through an unlawful web of exclusionary and interlocking business agreements that shut out competitors, officials said. The government will allege that Google uses billions of dollars collected from advertisements on its platform to pay mobile-phone manufacturers, carriers and browsers, like Apple Inc.’s Safari, to maintain Google as their preset, default search engine.

The upshot is that Google has pole position in search on hundreds of millions of American devices, with little opportunity for any competitor to make inroads, the government will allege.

Justice officials said the lawsuit will also take aim at arrangements in which Google’s search application is preloaded, and can’t be deleted, on mobile phones running its popular Android operating system. The government will allege Google unlawfully prohibits competitors’ search applications from being preloaded on phones under revenue-sharing arrangements, they said.

Google owns or controls search distribution channels accounting for about 80% of search queries in the U.S., the officials said. That means Google’s competitors can’t get a meaningful number of search queries and build a scale needed to compete, leaving consumers with less choice and less innovation, and advertisers with less competitive prices, the lawsuit will allege.

Google didn’t immediately respond to a request for comment, but the company has said its competitive edge comes from offering a product that billions of people choose to use each day.

The Mountain View, Calif., company, sitting on a $120 billion cash hoard, is unlikely to shrink from a legal fight. The company has argued that it faces vigorous competition across its different operations and that its products and platforms help businesses small and large reach new customers.

Google’s defense against critics of all stripes has long been rooted in the fact that its services are largely offered to consumers at little or no cost, undercutting the traditional antitrust argument around potential price harms to those who use a product.

The lawsuit follows a Justice Department investigation that has stretched more than a year, and comes amid a broader examination of the handful of technology companies that play an outsize role in the U.S. economy and the daily lives of most Americans.

A loss for Google could mean court-ordered changes to how it operates parts of its business, potentially creating new openings for rival companies. The Justice Department’s lawsuit won’t specify particular remedies; that is usually addressed later in a case. One Justice official said nothing is off the table, including possibly seeking structural changes to Google’s business.

A victory for Google could deal a huge blow to Washington’s overall scrutiny of big tech companies, potentially hobbling other investigations and enshrining Google’s business model after lawmakers and others challenged its market power. Such an outcome, however, might spur Congress to take legislative action against the company.

The case could take years to resolve, and the responsibility for managing the suit will fall to the appointees of whichever candidate wins the Nov. 3 presidential election.

The challenge marks a new chapter in the history of Google, a company formed in 1998 in a garage in a San Francisco suburb—the same year Microsoft Corp. was hit with a blockbuster government antitrust case accusing the software giant of unlawful monopolization. That case, which eventually resulted in a settlement, was the last similar government antitrust case against a major U.S. tech firm.

Google’s billionaire co-founders Sergey Brin, left, and Larry Page, shown in 2008, gave up their management roles but remain in effective control of the company. (Photo: Paul Sakuma/Associated Press)

Google started as a simple search engine with a large and amorphous mission “to organize the world’s information.” But over the past decade or so it has developed into a conglomerate that does far more than that. Its flagship search engine handles more than 90% of global search requests, some billions a day, providing fodder for what has become a vast brokerage of digital advertising. Its YouTube unit is the world’s largest video platform, used by nearly three-quarters of U.S. adults.

Google has been bruised but never visibly hurt by various controversies surrounding privacy and allegedly anticompetitive behavior, and its growth has continued almost entirely unchecked. In 2012, the last time Google faced close antitrust scrutiny in the U.S., the search giant was already one of the largest publicly traded companies in the nation. Since then, its market value has roughly tripled to almost $1 trillion.

The company takes on this legal showdown under a new generation of leadership. Co-founders Larry Page and Sergey Brin, both billionaires, gave up their management roles last year, handing the reins solely to Sundar Pichai, a soft-spoken, India-born engineer who earlier in his career helped present Google’s antitrust complaints about Microsoft to regulators.

The chief executive has in his corner Messrs. Page and Brin, who remain on Alphabet’s board and in effective control of the company thanks to shares that give them, along with former Chief Executive Eric Schmidt, disproportionate voting power.


Executives inside Google are quick to portray their divisions as mere startups in areas—like hardware, social networking, cloud computing and health—where other Silicon Valley giants are further ahead. Still, that Google has such breadth at all points to its omnipresence.

European Union regulators have targeted the company with three antitrust complaints and fined it about $9 billion, though the cases haven’t left a big imprint on Google’s businesses there, and critics say the remedies imposed on it have proved underwhelming.

In the U.S., nearly all state attorneys general are separately investigating Google, while three other tech giants—Facebook Inc., Apple and Amazon.com Inc.—likewise face close antitrust scrutiny. And in Washington, a bipartisan belief is emerging that the government should do more to police the behavior of top digital platforms that control widely used tools of communication and commerce.

More than 10 state attorneys general are expected to join the Justice Department’s case, officials said. Other states are still considering their own cases related to Google’s search practices, and a large group of states is considering a case challenging Google’s power in the digital advertising market, The Wall Street Journal has reported. In the ad-technology market, Google owns industry-leading tools at every link in the complex chain between online publishers and advertisers.

The Justice Department also continues to investigate Google’s ad-tech practices.

Democrats on a House antitrust subcommittee released a report this month following a 16-month inquiry, saying all four tech giants wield monopoly power and recommending congressional action. The companies’ chief executives testified before the panel in July.

Google CEO Sundar Pichai testified before Congress in July, in hearings where lawmakers pressed tech companies’ leaders on their business practices. (Photo: Graeme Jennings/Press Pool)

Big Tech Under Fire

The Justice Department isn’t alone in scrutinizing tech giants’ market power. These are the other inquiries now under way:

  • Federal Trade Commission: The agency has been examining Facebook’s acquisition strategy, including whether it bought platforms like WhatsApp and Instagram to stifle competition. People following the case believe the FTC is likely to file suit by the end of the year.
  • State attorneys general: A group of state AGs led by Texas is investigating Google’s online advertising business and expected to file a separate antitrust case. Another group of AGs is reviewing Google’s search business. Still another, led by New York, is probing Facebook over antitrust concerns.
  • Congress: After a lengthy investigation, House Democrats found that Amazon holds monopoly powers over its third-party sellers and that Apple exerts monopoly power through its App Store. Those findings and others targeting Facebook and Google could trigger legislation. Senate Republicans are separately moving to limit Section 230 of the Communications Decency Act, which gives online platforms a liability shield, saying the companies censor conservative views.
  • Federal Communications Commission: The agency is reviewing a Trump administration request to reinterpret key parts of Section 230, for the same reasons cited by GOP senators. Tech companies are expected to challenge possible action on free-speech grounds.

“It’s Google’s business model that is the problem,” Rep. David Cicilline (D., R.I.), the subcommittee chairman, told Mr. Pichai. “Google evolved from a turnstile to the rest of the web to a walled garden that increasingly keeps users within its sights.”

“We see vigorous competition,” Mr. Pichai responded, pointing to travel search sites and product searches on Amazon’s online marketplace. “We are working hard, focused on the users, to innovate.”

Amid the criticism, Google and other tech giants remain broadly popular and have only gained in might and stature since the start of the coronavirus pandemic, buoying the U.S. economy—and stock market—during a period of deep uncertainty.

At the same time, Google’s growth across a range of business lines over the years has expanded its pool of critics, with companies that compete with the search giant, as well as some Google customers, complaining about its tactics.

Specialized search providers like Yelp Inc. and Tripadvisor Inc. have long voiced such concerns to U.S. antitrust authorities, and newer upstarts like search-engine provider DuckDuckGo have spent time talking to the Justice Department.

News Corp, owner of The Wall Street Journal, has complained to antitrust authorities at home and abroad about both Google’s search practices and its dominance in digital advertising.

Some Big Tech detractors have called to break up Google and other dominant companies. Courts have indicated such broad action should be a last resort available only if the government clears high legal hurdles, including by showing that lesser remedies are inadequate.

The outcome could have a considerable impact on the direction of U.S. antitrust law. The Sherman Act that prohibits restraints of trade and attempted monopolization is broadly worded, leaving courts wide latitude to interpret its parameters. Because litigated antitrust cases are rare, any one ruling could affect governing precedent for future cases.

Google’s growth across a range of business lines has expanded its pool of critics. The company exhibited at the CES 2020 electronics show in Las Vegas on Jan. 8. (Photo: Mario Tama/Getty Images)

The tech sector has been a particular challenge for antitrust enforcers and the courts because the industry evolves rapidly and many products and services are offered free to consumers, who in a sense pay with the valuable personal data companies such as Google collect.

The search company famously outmaneuvered the Federal Trade Commission nearly a decade ago.

The FTC, which shares antitrust authority with the Justice Department, spent more than a year investigating Google but decided in early 2013 not to bring a case in response to complaints that the company engaged in “search bias” by favoring its own services and demoting rivals. Competition staff at the agency deemed the matter a close call, but said a case challenging Google’s search practices could be tough to win because of what they described as mixed motives within the company: a desire to both hobble rivals and advance quality products and services for consumers.

The Justice Department’s case won’t focus on a search-bias theory, Justice officials said.

Google made a handful of voluntary commitments to address other FTC concerns, a resolution that was widely panned by advocates of stronger antitrust enforcement and continues to be cited as a top failure. Google’s supporters say the FTC’s light touch was appropriate and didn’t burden the company as it continued to grow.


The Justice Department’s current antitrust chief, Makan Delrahim, spent months negotiating with the FTC last year for jurisdiction to investigate Google this time around. He later recused himself in the case—Google was briefly a client years before while he was in private practice—as the department’s top brass moved to take charge.

The Justice Department lawsuit comes after internal tensions, with some staffers skeptical of Attorney General William Barr’s push to bring a case as quickly as possible, the Journal has reported. The reluctant staffers worried the department hadn’t yet built an airtight case and feared rushing to litigation could lead to a loss in court. They also worried Mr. Barr was driven by an interest in filing a case before the election. Others were more comfortable moving ahead.

Mr. Barr has pushed the department to move forward under the belief that antitrust enforcers have been too slow and hesitant to take action, according to a person familiar with his thinking. He has taken an unusually hands-on role in several areas of the department’s work and repeatedly voiced interest in investigating tech-company dominance.

Attorney General William Barr has pushed to bring an antitrust case quickly against Google, in some cases taking an unusually hands-on role in preparations. (Photo: Matt McClain/Press Pool)

If the Microsoft case from 20 years ago is any guide, Mr. Barr’s concern with speed could run up against the often slow pace of litigation.

After a circuitous route through the court system, including one initial trial-court ruling that ordered a breakup, Microsoft reached a 2002 settlement with the government and changed some aspects of its commercial behavior but stayed intact. It remained under court supervision and subject to terms of its consent decree with the government until 2011.

Antitrust experts have long debated whether the settlement was tough enough on Microsoft, though most observers believe the agreement opened up space for a new generation of competitors.

Write to Brent Kendall at brent.kendall@wsj.com and Rob Copeland at rob.copeland@wsj.com

Copyright ©2020 Dow Jones & Company, Inc. All Rights Reserved.


You Reap What You Code


 

2020/10/20


 

This is a loose transcript of my talk at Deserted Island DevOps Summer Send-Off, an online conference in COVID-19 times. One really special thing about it is that the whole conference takes place over the Animal Crossing video game, with quite an interesting setup.

It was the last such session of the season, and I was invited to present with few demands. I decided to make a compressed version of a talk I had been mulling over for close to a year, which I had lined up for at least one in-person conference that got cancelled or postponed in April, and had given in its full hour-long length internally at work. The final result is a condensed 30 minutes that touches all kinds of topics, some of which have been borrowed from previous talks and blog posts of mine.

If I really wanted to, I could probably make one shorter blog post out of every one or two slides in there, but I decided to go for coverage rather than depth. Here goes nothing.

'You Reap What You Code': shows my character in-game sitting at a computer with a bunch of broken parts around, dug from holes in the ground

So today I wanted to give a talk on this tendency we have as software developers and engineers to write code and deploy things that end up being a huge pain to live with, to an extent we hadn’t planned for.

In software, a pleasant surprise is writing for an hour without compiling once and then it works; a nasty surprise is software that seems to work and after 6 months you find out it poisoned your life.

This presentation is going to be a high-level thing, and I want to warn you that I’m going to go through some philosophical concerns at first, follow that up with research that has taken place in human factors and cognitive science, and tie that up with broad advice that I think could be useful to everyone when it comes to systems thinking and designing things. A lot of this may feel a bit out there, but I hope that by the end it’ll feel useful to you.

'Energy and Equity; Ivan Illich' shows a screenshot of the game with a little village-style view

This is the really philosophical stuff we’re starting with. Ivan Illich was a wild-ass philosopher who hated things like modern medicine and mandatory education. He wrote an essay called “Energy and Equity” (to which I was introduced by a Stephen Krell presentation), where he decides to also dislike all sorts of motorized transportation.

Illich introduces the concept of an “oppressive” monopoly: in societies that developed around foot traffic and cycling, you can generally use any means of transportation whatsoever and effectively manage to live and thrive there. Whether you live in a tent or a mansion, you can get around the same.

He pointed out that cycling was innately fair because it does not require more energy than what is required as a baseline to operate: if you can walk, you can cycle, and cycling, for the same energy as walking, is incredibly more efficient. Cars don’t have that; they are rather expensive, and require disproportionate amounts of energy compared to what a basic person has.

His suggestion was that all non-freight transport, whether cars or buses and trains, be capped to a fixed percentage above the average speed of a cyclist, which is based on the power a normal human body can produce on its own. He suggested we do this to prevent…

Aerial stock photo of an American suburb

that!

We easily conceived of cars as a way to make existing burdens lighter: they created freedoms and widened our access to goods and people. The car was a better horse, and a less exhausting bicycle. And so society developed to embrace cars in its infrastructure.

Rather than having a merchant bring goods to the town square, the milkman drop milk on the porch, and smaller markets distributed where they’d be convenient, it is now everyone’s job to drive for each of these things, while stores go to where land is cheap rather than where people are. And when society develops with a car in mind, you now need a car to be functional.

In short the cost of participating in society has gone up, and that’s what an oppressive monopoly is.

'The Software Society': Van Bentum's painting The Explosion in the Alchemist's Laboratory

To me, the key thing Illich did was twist the question another way: what effects would cars have on society if a majority of people had them, and what effect would that have on the rest of us?

The question I now want to ask is whether we have the equivalent in the software world. What are the things we do that we perceive increase our ability to do things, but turn out to actually end up costing us a lot more to just participate?

We kind of see it with our ability to use all the bandwidth a user may have; trying to use old dial-up connections is flat out unworkable these days. But do we have the same with our cognitive cost? The tooling, the documentation, the procedures?

'Ecosystems; we share a feedback loop': a picture of an in-game aquarium within the game's museum

I don’t have a clear answer to any of this, but it’s a question I ask myself a lot when designing tools and software.

The key point is that the software and practices that we choose to use is not just something we do in a vacuum, but part of an ecosystem; whatever we add to it changes and shifts expectations in ways that are out of our control, and impacts us back again. The software isn’t trapped with us, we’re trapped with the software.

Are we not ultimately just making our life worse for it? I want to focus on this part where we make our own life, as developers, worse. When we write or adopt software to help ourselves but end up harming ourselves in the process, because that speaks to our own sustainability.

'Ironies of Automation' (Bainbridge, 1983): A still from Fantasia's broom scene

Now we’re entering the cognitive science and human factors bit.

Rather than just being philosophical here, I want to ground things in the real world with practical effects, because this is something that researchers have covered. The “ironies of automation” come from cognitive research (Bainbridge, 1983) that looked into people automating tasks and found that the effects weren’t as good as expected.

Mainly, it’s attention and practice clashing. There are tons of examples over the years, but let’s take a look at a modern one with self-driving cars.

Self-driving cars are a fantastic case of clumsy automation. What most established players in the car industry are doing is lane tracking, blind spot detection, and handling parallel parking.

But high-tech companies (Tesla, Waymo, Uber) are working towards full self-driving, with Tesla’s Autopilot being the most ambitious one released to the public at large. All of these currently operate in ways Bainbridge fully predicted in 1983:

  • the driver is no longer actively involved and is shifted to the role of monitoring
  • the driver, despite no longer driving the car, regardless must be fully aware of everything the car is doing
  • when the car gets in a weird situation, it is expected that the driver takes control again
  • so the car handles all the easy cases, but all the hard cases are left to the driver

Part of the risk there is twofold: people have limited attention for tasks they are not involved in—if you’re not actively driving it’s going to be hard to be attentive for extended periods of time—and if you’re only driving rarely with only the worst cases, you risk being out of practice to handle the worst cases.

Similar automation is used by airlines, which otherwise make up for it with simulator hours and still manually handle planned difficult phases like takeoff and landing. Even so, a number of airline incidents show that this hand-off is complex and often does not go well.

Clearly, when we ignore the human component and its responsibilities in things, we might make software worse than what it would have been.

'HABA-MABA problems': a chart illustrating the Fitts model using in-game images

In general, most of these errors come from the following point of view. This is called the “Fitts” model, also “HABA-MABA”, for “Humans are better at, machines are better at” (the original version was referred to as MABA-MABA, using “Men” rather than “Humans”). This model frames humans as slow, perceptive beings capable of judgement, and machines as fast, undiscerning, indefatigable things.

We hear this a whole lot even today. This is, to be polite, a beginner’s approach to automation design. It’s based on scientifically outdated concepts and intuitive-but-wrong sentiments; it’s comforting in letting you think that only the predicted results will happen, and it totally ignores any emergent behaviour. It operates on what we think we see now, not on stronger underlying principles, and often has strong limitations when it comes to being applied in practice.

It is disconnected from the reality of human-machine interactions, and frames choices as binary when they aren’t, usually with the intent of pushing the human out of the equation when you shouldn’t. This is, in short, a significant factor behind the ironies of automation.

'Joint Cognitive Systems': a chart illustrating the re-framing of computers as teammates

Here’s a patched version established by cognitive experts. They instead reframe the human-computer relationship as a “joint cognitive system”, meaning that instead of thinking of humans and machines as unrelated things that must be used in distinct contexts for specific tasks, we should frame humans and computers as teammates working together. This, in a nutshell, shifts the discourse from how one is limited to how one can complement the other.

Teammates do things like being predictable to each other, sharing a context and language, being able to notice when their actions may impact others and adjust accordingly, communicate to establish common ground, and have an idea of everyone’s personal and shared objectives to be able to help or prioritize properly.

Of course we must acknowledge that, at today’s state of the art, we’re nowhere close to computers being teammates. And since computers currently need us to keep realigning them all the time, we have to admit that the system is not just the code and the computers; it’s the code, the computers, and all the people who interact with them and with each other. If we want our software to help us, we need to be able to help it, and that means building the software knowing it will be full of limitations, and working to make it easier to diagnose issues and to form and improve mental models.

So the question is: what makes a good model? How can we help people work with what we create?

'How People Form Models': a detailed road map of the city of London, UK

note: this slide and the next one are taken from my talk on operable software

This is a map of the city of London, UK. It is not the city of London, just a representation of it. It’s very accurate: it has streets with their names, traffic directions, building names, rivers, train stations, metro stations, footbridges, piers, parks, gives details regarding scale, distance, and so on. But it is not the city of London itself: it does not show traffic nor roadwork, it does not show people living there, and it won’t tell you where the good restaurants are. It is a limited model, and probably an outdated one.

But even if it’s really limited, it is very detailed. Detailed enough that pretty much nobody out there can fit it all in their head. Most people will have some detailed knowledge of some parts of it, like the zoomed-in square in the image, but pretty much nobody will just know the whole of it in all dimensions.

In short, pretty much everyone in your system only works from partial, incomplete, and often inaccurate and outdated data, which itself is only an abstract representation of what goes on in the system. In fact, what we work with might be more similar to this:

A cartoony tourist map of London's main attractions

That’s more like it. This is still not the city of London, but this tourist map of London is closer to what we work with. Take a look at your architecture diagrams (if you have them), and chances are they look more like this map than the very detailed map of London. This map has most stuff a tourist would want to look at: important buildings, main arteries to get there, and some path that suggests how to navigate them. The map has no well-defined scale, and I’m pretty sure that the two giant people on Borough road won’t fit inside Big Ben. There are also lots of undefined areas, but you will probably supplement them with other sources.

But that’s alright, because mental models are as good as their predictive power; if they let you make a decision or accomplish a task correctly, they’re useful. And our minds are kind of clever in that they only build models as complex as they need to be. If I’m a tourist looking for my way between main attractions, this map is probably far more useful than the other one.

There’s a fun saying about this: “Something does not exist until it is broken.” Subjectively, you can be entirely content operating a system for a long time without ever knowing about entire aspects of it. It’s when they start breaking, or when your predictions about the system no longer work, that you have to go back and re-tune your mental models. And since this is all very subjective, everyone has different models.

This is a vague answer to what a good model is, and the follow-up question is how we can create and maintain them.

'Syncing Models': a still from the video game in the feature where you back up your island by uploading it online

One simple step, outside of all technical components, is to challenge and help each other to sync and build better mental models. We can’t easily transfer our own models to each other, and in fact it’s pretty much impossible to control them. What we can do is challenge them to make sure they haven’t eroded too much, and try things to make sure they’re still accurate, because things change with time.

So in a corporation, things we might do include training, documentation, and incident investigations, which all help surface aspects of and changes to our systems to everyone. Game days and chaos engineering are also excellent ways to discover how our models might be broken, in a controlled setting.

They’re definitely things we should do and care about, particularly at an organisational level. That being said, I want to focus a bit more on the technical stuff we can do as individuals.

'Layering Observability': a drawing of abstraction layers and observation probes' locations

note: this slide is explored more in depth in my talk on operable software

We can’t just open a so-called glass pane and see everything at once. That’s too much noise, too much information, too little structure. Seeing everything is only useful to the person who knows what to filter in and filter out. You can’t easily form a mental model of everything at once. To aid model formation, we should structure observability to tell a story.

Most applications and components you use that are easy to operate do not expose their internals to you; they mainly aim to provide visibility into your interactions with them. There has to be a connection between the things users are doing and the impact those actions have on the system, and you will want to establish that. This means:

  • Provide visibility into interactions between components, not their internals.
  • Log at the layer below the one you want to debug; this saves time and reduces how many observability probes you need to insert in your code base. We have a tendency to stick everything at the app level, but that’s misguided.
  • This means the logs around a given endpoint have to be about the user’s interactions with that endpoint, and should require no knowledge of its implementation details.
  • For developer logs, you can have one log statement shared by all the controllers by inserting it a layer below the endpoints, within the framework, rather than having to insert one for each endpoint (see the sketch below).
  • These interactions will let people build a mental picture of what should be going on, and spot more easily where expectations are broken. By layering views, you make it possible to skip between layers according to which expectations are broken and how much knowledge the reader has.
  • Where a layer provides no easy observability, people must cope through inferences in the layers above and below it. It becomes a sort of obstacle.

Often we are stuck with only observability at the highest level (the app) or the lowest level (the operating system), with nearly nothing useful in-between. We have a blackbox sandwich where we can only look at some parts, and that can be a consequence of the tools we choose. You’ll want to actually pick runtimes and languages and frameworks and infra that let you tell that observability story and properly layer it.
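To make the “one log statement below the endpoints” bullet concrete, here is a minimal sketch in Python, assuming a WSGI-style service; the class name and the field set are hypothetical, not something the talk prescribes:

import json
import logging
import time

logger = logging.getLogger("http.interactions")

class InteractionLogMiddleware:
    """Wraps a WSGI app and logs one line per request, a layer below all endpoints."""
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        seen = {}

        def recording_start_response(status, headers, exc_info=None):
            seen["status"] = status  # capture what the endpoint returned
            return start_response(status, headers, exc_info)

        started = time.monotonic()
        try:
            return self.app(environ, recording_start_response)
        finally:
            # Facts about the user's interaction only; no knowledge of any
            # endpoint's implementation details is required here.
            logger.info(json.dumps({
                "method": environ.get("REQUEST_METHOD"),
                "path": environ.get("PATH_INFO"),
                "status": seen.get("status"),
                "duration_ms": round((time.monotonic() - started) * 1000, 1),
            }))

Wrapping the application once (app = InteractionLogMiddleware(app)) covers every controller, so no endpoint has to remember to log its own traffic.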

'Logging Practices': a game character chopping down trees

Another thing that helps with model formation is keeping the relationship between humans and machines going smoothly. This is a trust relationship, and providing information that is considered misleading or unhelpful erodes that trust. There are a few things you can do with logs that can help not ruin your marriage to the computer.

The main one is to log facts, not interpretations. You often do not have all the context from within a single log line, just a tiny part of it. If you start trying to be helpful and suggest things to people, you change what should be a fact-gathering expedition into a murder-mystery investigation where bits of the system can’t be trusted or you have to read between the lines. That’s not helpful. A log line that says TLS validation error: SEC_ERROR_UNKNOWN_ISSUER is much better than one that says ERROR: you are being hacked, regardless of how much experience you have.

A thing that helps with that is structured logging, which is better than regular text. It makes it easier for people to use scripts or programs to parse, aggregate, route, and transform logs. It prevents you from needing full-text search to figure out what happened. If you really want to provide human readable text or interpretations, add it to a field within structured logging.

Finally, adopting consistent naming mechanisms and units is always going to prove useful.
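As a quick sketch of these practices combined, in Python (the field names are invented for the example): the fact-based TLS line from above, emitted as structured logging, with the human-readable reading kept in its own field:

import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("tls")

# Log the fact (the raw error code), not an interpretation of it; any
# human-readable reading goes in its own clearly-labelled field.
logger.error(json.dumps({
    "event": "tls_validation_error",
    "code": "SEC_ERROR_UNKNOWN_ISSUER",
    "peer": "api.example.com",  # hypothetical peer name
    "hint": "issuer certificate not found in the local trust store",
}))

Because the record is structured, scripts can aggregate or route it on the "event" and "code" fields without any full-text search.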

'Hitting Limits': the game's museum's owl being surprised while woken up

There is another thing called the Law of Requisite Variety, which says that only complexity can control complexity. If an agent can’t represent all the possible states and circumstances around a thing it tries to control, it won’t be able to control it all. Think of an airplane’s flight stabilizers; they’re able to cope only with a limited amount of adjustment, and usually at a higher rate than we humans could. Unfortunately, once it reaches a certain limit in its actions and things it can perceive, it stops working well.

That’s when control is either ineffective or passed on to the next best thing. In the case of software we run and operate, that’s us; we’re the next best thing. And here we fall into the old idea that if you write something as cleverly as you can, you’re in trouble, because you need to be twice as clever to debug it.

That’s because to debug a system that is misbehaving under automation, you need to understand the system, and then understand the automation, then understand what the automation thinks of the system, and then take action.

That’s always kind of problematic, but essentially, brittle automation forces you to know more than if you had no automation in order to make things work in difficult times. Things can then become worse than if you had no automation in the first place.

'Handle Hand-Offs First': this in-game owl/museum curator accepting a bug he despises for his collection

When you start creating a solution, do it while being aware that it is possibly going to be brittle and will require handing control over to a human being. Focus on the path where the automation fails and how the hand-off will take place. How are you going to communicate that, and which clues or actions will an operator have to take over things?

When we accept and assume that automation will reach its limits, and the thing that it does is ask a human for help, we shift our approach to automation. Make that hand-off path work easily. Make it friendly, and make it possible for the human to understand what the state of automation was at a given point in time so you can figure out what it was doing and how to work around it. Make it possible to guide the automation into doing the right thing.

Once you’ve found your way around that, you can then progressively automate things, grow the solution, and stay in line with these requirements. It’s a backstop for bad experiences, similar to “let it crash” for your code, so doing it well is key.

'Curb Cut Effect': a sidewalk with the classic curb cut in it

Another thing that I think is interesting is the curb cut effect, which was noticed as a result of the various American accessibility laws that started in the 60s. The idea is that to make sidewalks and streets accessible to people in wheelchairs, you cut part of the curb so that it creates a ramp from sidewalk to street.

The thing people noticed is that even though the curb was cut for handicapped people, getting around was now easier for people carrying luggage, pushing strollers, on skateboards or bicycles, and so on. Some studies found that people without handicaps would even deviate from their course to use the curb cuts.

Similar effects are found when you think of something like subtitles which were put in place for people with hearing problems. When you look at the raw number of users today, there are probably more students using them to learn a second or third language than people using them with actual hearing disabilities. Automatic doors that open when you step in front of them are also very useful for people carrying loads of any kind, and are a common example of doing accessibility without “dumbing things down.”

I’m mentioning all of this because I think that keeping accessibility in mind when building things is one of the ways we can turn nasty negative surprises into pleasant emerging behaviour. And generally, accessibility is easier to build in than to retrofit. In the case of the web, accessibility also lines up with better performance.

If you think about diversity in broader terms, how would you rethink your dashboards and monitoring and on-call experience if you were to run it 100% on a smartphone? What would that let people on regular computers do that they cannot today? Ask the same question but with user bases that have drastically different levels of expertise.

I worked with an engineer who used to work in a power station and the thing they had set up was that during the night, when they were running a short shift, they’d generate an audio file that contained all the monitoring metrics. They turned it into a sort of song, and engineers coming in in the morning would listen to it on fast forward to look for anomalies.

Looking at these things can be useful. If you prepare for your dashboard users to be colorblind, would customizing colors be useful? And could that open up new regular use cases, such as annotating metrics that tend to look weird and that you want to keep an eye on?

And so software shouldn’t be about doing more with less. It should be about requiring less to do more, as in letting other people do more with less.

'Complexity Has To Live Somewhere': in-game's 'The Thinker' sitting at a desk, looking like it's pondering at papers

note: this slide is a short version of my post on Complexity Has to Live Somewhere

A thing we try to do, especially as software engineers, is to keep the code and the system—the technical part of the system—as simple as possible. We tend to do that by finding underlying concepts, creating abstractions, and moving things outside of the code. Often that means we rely on some sort of convention.

When that happens, what really goes on is that the complexity of how you chose to solve the problem still lingers around, and someone has to handle it. If you don’t, your users have to. And if it’s not in the code, it’s in your operators or the people understanding the code, because if the code is to remain simple, the difficult concepts you abstracted away still need to be understood and present in the world that surrounds the code.

I find it important to keep that in mind. There’s this kind of fixed amount of complexity that moves around the organization, both in code and in the knowledge your people have.

Think of how people interact with the features day to day. What do they do, how does it impact them? What about the network of people around them? How do they react to that? Would you approach software differently if you think that it’s still going to be around in 5, 10, or 20 years when you and everyone who wrote it has left? If so, would that approach help people who join in just a few months?

One of the things I like to think about is that instead of using military analogies of fights and battles, it’s interesting to frame it in terms of gardens or agriculture. When we frame the discussion that we have in terms of an ecosystem and the people working collectively within it, the way we approach solving problems can also change drastically.

'Replacing, Adding, or Diffusing?': the trolley problem re-enacted with in-game items

Finally, one of the things I want to mention briefly is this little thought framework I like when we’re adopting new technology.

When we first adopt a new piece of technology, the thing we try to do—or tend to do—is start with the easy systems. Then we say “oh, that’s great! That’s going to replace everything we have.” Eventually, we try to migrate everything, but it doesn’t always work.

So an approach that makes sense is to start with the easy stuff to prove that it’s workable for the basic cases. But also try something really, really hard, because that is the endpoint: the endgame is to migrate the hardest thing that you’ve got.

If you’re not able to replace everything, consider framing things as adding to your system rather than replacing: it’s something you add to your stack. This framing is going to change your approach to teaching, maintenance, and pretty much everything else you have to care about, so you avoid the common trap of deprecating a piece of critical technology with nothing to replace it. If you can replace a piece of technology, then do it; but if you can’t, don’t fool yourself. Assume the cost of keeping things going.

The third one there is diffusing. I think diffusing is something we do implicitly when we do DevOps. We took the Ops responsibilities and the Dev responsibilities, and instead of keeping them in different areas with small groups of experts in development and operations, we made it everybody’s responsibility to be aware of all aspects.

That creates diffusion, which in this case can be positive: you want everyone to be handling a task. But if you look at the way some organisations are handling containerization, it can be a bunch of operations people who no longer have to care about that aspect of their job, while all of the development teams now have to know and understand how containers work, how to deploy them, and adapt their workflow accordingly.

In such a case we haven't necessarily replaced or removed any of the need for deployment work. We've just taken it out of the bottleneck, diffused it, and sent it to everyone else.

I think having an easy way, early in the process, to figure out whether we're replacing, adding, or diffusing will drastically influence how we approach change at an organisational level, and it can be genuinely helpful.

'Thanks': title slide again

This is all I have for today. Hopefully it was practical.

Thanks!

 

The Surprising Impact of Medium-Size Texts on PostgreSQL Performance



Any database schema is likely to have plenty of text fields. In this article, I divide text fields into three categories:

  1. Small texts: names, slugs, usernames, emails, etc. These are text fields that usually have some low size limit, maybe even using varchar(n) and not text.
  2. Large texts: blog post content, articles, HTML content etc. These are large pieces of free, unrestricted text that are stored in the database.
  3. Medium texts: descriptions, comments, product reviews, stack traces etc. These are text fields that fall between the small and the large. They would normally be unrestricted in size, but naturally smaller than the large texts; the sketch just below shows one field of each kind.
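
For illustration, here is a hypothetical table (the names are mine, not from a real system) with one field of each category:

-- One field per category
CREATE TABLE post (
    slug VARCHAR(100),  -- small: enforced low size limit
    summary TEXT,       -- medium: unrestricted, but usually modest in size
    body TEXT           -- large: unrestricted free text
);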

In this article I demonstrate the surprising impact of medium-size texts on query performance in PostgreSQL.

Sliced bread… it gets better
Photo by Louise Lyshøj

When talking about large chunks of text, or any other field that may contain large amounts of data, we first need to understand how the database handles the data. Intuitively, you might think that the database is storing large pieces of data inline like it does smaller pieces of data, but in fact, it does not:

PostgreSQL uses a fixed page size (commonly 8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly.

As the documentation explains, PostgreSQL does not allow a single row (tuple) to span multiple pages. So how does the database store large chunks of data?

[…] large field values are compressed and/or broken up into multiple physical rows. […] The technique is affectionately known as TOAST (or “the best thing since sliced bread”).

OK, so how exactly does this TOAST work?

If any of the columns of a table are TOAST-able, the table will have an associated TOAST table

So TOAST is a separate table associated with our table. It is used to store large values of TOAST-able columns (the text datatype, for example, is TOAST-able).

What constitutes a large value?

The TOAST management code is triggered only when a row value to be stored in a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code will compress and/or move field values out-of-line until the row value is shorter than TOAST_TUPLE_TARGET bytes (also normally 2 kB, adjustable) or no more gains can be had

PostgreSQL will try to compress the large values in the row, and if the row still can't fit within the limit, the values will be stored out-of-line in the TOAST table.

Finding the TOAST

Now that we have some understanding of what TOAST is, let’s see it in action. First, create a table with a text field:

db=# CREATE TABLE toast_test (id SERIAL, value TEXT);
CREATE TABLE

The table contains an id column, and a value field of type TEXT. Notice that we did not change any of the default storage parameters.

The text field we added supports TOAST, or is TOAST-able, so PostgreSQL should create a TOAST table. Let’s try to locate the TOAST table associated with the table toast_test in pg_class:

db=# SELECT relname, reltoastrelid FROM pg_class WHERE relname = 'toast_test';
  relname   │ reltoastrelid
────────────┼───────────────
 toast_test │        340488

db=# SELECT relname FROM pg_class WHERE oid = 340488;
     relname
─────────────────
 pg_toast_340484

As promised, PostgreSQL created a TOAST table called pg_toast_340484.

TOAST in Action

Let’s see what the TOAST table looks like:

db=# \d pg_toast.pg_toast_340484
TOAST table "pg_toast.pg_toast_340484"
   Column   │  Type
────────────┼─────────
 chunk_id   │ oid
 chunk_seq  │ integer
 chunk_data │ bytea

The TOAST table contains three columns:

  • chunk_id: A reference to a toasted value.
  • chunk_seq: The sequence number of the chunk within its value.
  • chunk_data: The actual chunk data.

Similar to “regular” tables, the TOAST table has the same restrictions on inline values. To overcome them, large values are split into chunks that fit within the limit.

At this point the table is empty:

db=# SELECT * FROM pg_toast.pg_toast_340484;
 chunk_id │ chunk_seq │ chunk_data
──────────┼───────────┼────────────
(0 rows)

This makes sense because we did not insert any data yet. So next, insert a small value into the table:

db=# INSERT INTO toast_test (value) VALUES ('small value');
INSERT 0 1

db=# SELECT * FROM pg_toast.pg_toast_340484;
 chunk_id │ chunk_seq │ chunk_data
──────────┼───────────┼────────────
(0 rows)

After inserting the small value into the table, the TOAST table remained empty. This means the small value was small enough to be stored inline, and there was no need to move it out-of-line to the TOAST table.

Small text stored inline

Let’s insert a large value and see what happens:

db=# INSERT INTO toast_test (value) VALUES ('n0cfPGZOCwzbHSMRaX8 ... WVIlRkylYishNyXf');
INSERT 0 1

I shortened the value for brevity, but that’s a random string with 4096 characters. Let’s see what the TOAST table stores now:

db=# SELECT * FROM pg_toast.pg_toast_340484;
 chunk_id │ chunk_seq │ chunk_data
──────────┼───────────┼──────────────────────
   995899 │         0 │ x30636650475a4f43...
   995899 │         1 │ x50714c3756303567...
   995899 │         2 │ x6c78426358574534...
(3 rows)

The large value is stored out-of-line in the TOAST table. Because the value was too large to fit inline in a single row, PostgreSQL split it into three chunks. The x3063... notation is how psql displays binary data.

Large text stored out-of-line, in the associated TOAST table
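
As a side note, the original value can be reassembled manually by concatenating the chunks in order. This is a sketch, assuming the value was stored uncompressed (true here, since random text compresses poorly); compressed chunks would need decompression first:

-- Reassemble the out-of-line value from its chunks
SELECT convert_from(string_agg(chunk_data, ''::bytea ORDER BY chunk_seq), 'UTF8')
FROM pg_toast.pg_toast_340484
WHERE chunk_id = 995899;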

Finally, execute the following query to summarize the data in the TOAST table:

db=# SELECT chunk_id, COUNT(*) as chunks, pg_size_pretty(sum(octet_length(chunk_data)::bigint))
FROM pg_toast.pg_toast_340484 GROUP BY 1 ORDER BY 1;
 chunk_id │ chunks │ pg_size_pretty
──────────┼────────┼────────────────
   995899 │      3 │ 4096 bytes
(1 row)

As we’ve already seen, the text is stored in three chunks.

Size of Database Objects

There are several ways to get the size of database objects in PostgreSQL:

  • pg_table_size: Get the size of the table including TOAST, but excluding indexes
  • pg_relation_size: Get the size of just the table
  • pg_total_relation_size: Get the size of the table, including indexes and TOAST

Another useful function is pg_size_pretty: used to display sizes in a friendly format.
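
To see the three side by side, here is a minimal sketch on our toast_test table (output omitted, since the numbers depend on your data):

-- Compare the size functions on the same table
SELECT
    pg_size_pretty(pg_relation_size('toast_test')) AS table_only,
    pg_size_pretty(pg_table_size('toast_test')) AS table_and_toast,
    pg_size_pretty(pg_total_relation_size('toast_test')) AS everything;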

TOAST Compression

So far I have refrained from categorizing texts by size, because the raw size of the text is not what matters; what matters is its size after compression.

To create long strings for testing, we’ll implement a function to generate random strings at a given length:

CREATE OR REPLACE FUNCTION generate_random_string(
  length INTEGER,
  characters TEXT DEFAULT '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
) RETURNS TEXT AS
$$
DECLARE
  result TEXT := '';
BEGIN
  IF length < 1 THEN
      RAISE EXCEPTION 'Invalid length';
  END IF;
  -- Append one randomly picked character from the pool, length times.
  FOR __ IN 1..length LOOP
    result := result || substr(characters, floor(random() * length(characters))::int + 1, 1);
  END LOOP;
  RETURN result;
END;
$$ LANGUAGE plpgsql;

Generate a string made out of 10 random characters:

db=# SELECT generate_random_string(10);
 generate_random_string
────────────────────────
 o0QsrMYRvp

We can also provide a set of characters to generate the random string from. For example, generate a string made of 10 random digits:

db=# SELECT generate_random_string(10, '1234567890');
 generate_random_string
────────────────────────
 4519991669

PostgreSQL TOAST uses the LZ family of compression techniques. Compression algorithms usually work by identifying and eliminating repetition in the value, so a long string drawn from a small set of characters should compress much better than a string made of many different characters.

To illustrate how TOAST uses compression, we’ll clean out the toast_test table, and insert a random string made of many possible characters:

db=# TRUNCATE toast_test;
TRUNCATE TABLE

db=# INSERT INTO toast_test (value) VALUES (generate_random_string(1024 * 10));
INSERT 0 1

We inserted a 10 kB value made of random characters. Let’s check the TOAST table:

db=# SELECT chunk_id, COUNT(*) as chunks, pg_size_pretty(sum(octet_length(chunk_data)::bigint))
FROM pg_toast.pg_toast_340484 GROUP BY 1 ORDER BY 1;

 chunk_id │ chunks │ pg_size_pretty
──────────┼────────┼────────────────
  1495960 │      6 │ 10 kB

The value is stored out-of-line in the TOAST table, and we can see it is not compressed.

Next, insert a value with a similar length, but made out of fewer possible characters:

db=# INSERT INTO toast_test (value) VALUES (generate_random_string(1024 * 10, '123'));
INSERT 0 1

db=# SELECT chunk_id, COUNT(*) as chunks, pg_size_pretty(sum(octet_length(chunk_data)::bigint))
FROM pg_toast.pg_toast_340484 GROUP BY 1 ORDER BY 1;

 chunk_id │ chunks │ pg_size_pretty
──────────┼────────┼────────────────
  1495960 │      6 │ 10 kB
  1495961 │      2 │ 3067 bytes

We inserted a 10 kB value, but this time it only contained 3 possible digits: 1, 2 and 3. This text is more likely to contain repeating patterns, and should compress better than the previous value. Looking at the TOAST table, we can see PostgreSQL compressed the value to ~3 kB, which is a third of the size of the uncompressed value. Not a bad compression rate!

Finally, insert a 10 kB string made of a single digit:

db=# INSERT INTO toast_test (value) VALUES (generate_random_string(1024 * 10, '0'));
INSERT 0 1

db=# SELECT chunk_id, COUNT(*) as chunks, pg_size_pretty(sum(octet_length(chunk_data)::bigint))
FROM pg_toast.pg_toast_340484 GROUP BY 1 ORDER BY 1;

 chunk_id │ chunks │ pg_size_pretty
──────────┼────────┼────────────────
  1495960 │      6 │ 10 kB
  1495961 │      2 │ 3067 bytes

The string compressed so well that the database was able to store it inline; nothing new was added to the TOAST table.
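
Another way to verify this, without querying the TOAST table directly, is pg_column_size, which reports the number of bytes actually used to store a value, after compression. A sketch (the exact numbers will vary):

-- octet_length is the logical size; pg_column_size is the stored,
-- possibly compressed, size
SELECT id,
       octet_length(value) AS logical_size,
       pg_column_size(value) AS stored_size
FROM toast_test
ORDER BY id;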

Configuring TOAST

If you are interested in configuring TOAST for a table, you can do that by setting storage parameters at CREATE TABLE time or with ALTER TABLE. The relevant parameters are:

  • toast_tuple_target: The minimum tuple length after which PostgreSQL tries to move long values to TOAST.
  • storage: The TOAST strategy for a column. PostgreSQL supports 4 different strategies: PLAIN, EXTENDED, EXTERNAL, and MAIN. The default is EXTENDED, which means PostgreSQL will try to compress the value and store it out-of-line.

I personally never had to change the default TOAST storage parameters.
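
For reference though, here is a minimal sketch of both knobs on a hypothetical table (the toast_tuple_target storage parameter requires PostgreSQL 11 or later):

-- Lower the threshold at which values are considered for TOAST
CREATE TABLE toast_config_demo (id SERIAL, value TEXT)
WITH (toast_tuple_target = 256);

-- EXTERNAL stores values out-of-line but skips compression
ALTER TABLE toast_config_demo ALTER COLUMN value SET STORAGE EXTERNAL;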


To understand the effect of different text sizes and out-of-line storage on performance, we’ll create three tables, one for each type of text:

db=# CREATE TABLE toast_test_small (id SERIAL, value TEXT);
CREATE TABLE

db=# CREATE TABLE toast_test_medium (id SERIAL, value TEXT);
CREATE TABLE

db=# CREATE TABLE toast_test_large (id SERIAL, value TEXT);
CREATE TABLE

As in the previous section, PostgreSQL created a TOAST table for each of them:

SELECT
    c1.relname,
    c2.relname AS toast_relname
FROM
    pg_class c1
    JOIN pg_class c2 ON c1.reltoastrelid = c2.oid
WHERE
    c1.relname LIKE 'toast_test%'
    AND c1.relkind = 'r';

      relname      │  toast_relname
───────────────────┼─────────────────
 toast_test_small  │ pg_toast_471571
 toast_test_medium │ pg_toast_471580
 toast_test_large  │ pg_toast_471589

Set Up Test Data

First, let’s populate toast_test_small with 500K rows containing a small text that can be stored inline:

db=# INSERT INTO toast_test_small (value)
SELECT 'small value' FROM generate_series(1, 500000);
INSERT 0 500000

Next, populate the toast_test_medium with 500K rows containing texts that are at the border of being stored out-of-line, but still small enough to be stored inline:

db=# WITH str AS (SELECT generate_random_string(1800) AS value)
INSERT INTO toast_test_medium (value)
SELECT value
FROM generate_series(1, 500000), str;
INSERT 0 500000

I experimented with different lengths until I got a value just small enough to remain inline. The trick is to find a string of roughly 2 kB that compresses very poorly.

Next, insert 500K rows with large texts to toast_test_large:

db=# WITH str AS (SELECT generate_random_string(4096) AS value)
INSERT INTO toast_test_large (value)
SELECT value
FROM generate_series(1, 500000), str;
INSERT 0 500000

We are now ready for the next step.

Comparing Performance

We usually expect queries on large tables to be slower than queries on smaller tables. In this case, it’s not unreasonable to expect the query on the small table to run faster than on the medium table, and the query on the medium table to run faster than the same query on the large table.

To compare performance, we are going to execute a simple query to fetch one row from the table. Since we don’t have an index, the database is going to perform a full table scan. We’ll also disable parallel query execution to get a clean, simple timing, and execute the query multiple times to account for caching.

db=# SET max_parallel_workers_per_gather = 0;
SET

Starting with the small table:

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_small WHERE id = 6000;
                                    QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────
 Gather  (cost=1000.00..7379.57 rows=1 width=16)
   ->  Parallel Seq Scan on toast_test_small  (cost=0.00..6379.47 rows=1 width=16)
        Filter: (id = 6000)
        Rows Removed by Filter: 250000
 Execution Time: 31.323 ms
(8 rows)

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_small WHERE id = 6000;
Execution Time: 25.865 ms

I ran the query multiple times and trimmed the output for brevity. As expected, the database performed a full table scan, and the timing settled at ~25 ms.

Next, execute the same query on the medium table:

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_medium WHERE id = 6000;
Execution Time: 321.965 ms

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_medium WHERE id = 6000;
Execution Time: 173.058 ms

Running the exact same query on the medium table took significantly longer, 173 ms, which is roughly 6x slower than on the small table. This makes sense.

To complete the test, run the query again on the large table:

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_large WHERE id = 6000;
Execution Time: 49.867 ms

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_large WHERE id = 6000;
Execution Time: 37.291 ms

Well, this is surprising! The timing of the query on the large table is similar to the timing on the small table, and more than four times faster than on the medium table.

Table             │ Timing
──────────────────┼─────────────
toast_test_small  │ 31.323 ms
toast_test_medium │ 173.058 ms
toast_test_large  │ 37.291 ms

Large tables are supposed to be slower, so what is going on?

Making Sense of the Results

To make sense of the results, have a look at the size of each table, and the size of its associated TOAST table:

SELECT
    c1.relname,
    pg_size_pretty(pg_relation_size(c1.relname::regclass)) AS size,
    c2.relname AS toast_relname,
    pg_size_pretty(pg_relation_size(('pg_toast.' || c2.relname)::regclass)) AS toast_size
FROM
    pg_class c1
    JOIN pg_class c2 ON c1.reltoastrelid = c2.oid
WHERE
    c1.relname LIKE 'toast_test_%'
    AND c1.relkind = 'r';

      relname      │  size  │  toast_relname  │ toast_size
───────────────────┼────────┼─────────────────┼────────────
 toast_test_small  │ 21 MB  │ pg_toast_471571 │ 0 bytes
 toast_test_medium │ 977 MB │ pg_toast_471580 │ 0 bytes
 toast_test_large  │ 25 MB  │ pg_toast_471589 │ 1953 MB

Let’s break it down:

  • toast_test_small: The size of the table is 21 MB, and the TOAST table is empty. This makes sense because the texts we inserted into that table were small enough to be stored inline.
Small texts stored inline
  • toast_test_medium: The table is significantly larger, 977 MB. We inserted text values that were just small enough to be stored inline. As a result, the table got very big, and the TOAST table was not used at all.
Medium texts stored inline
  • toast_test_large: The size of the table is roughly similar to the size of the small table. This is because we inserted large texts into the table, and PostgreSQL stored them out-of-line in the TOAST table. This is why the TOAST table is so big for the large table, but the table itself remained small.
Large texts stored out-of-line in TOAST

When we executed our query, the database did a full table scan. To scan the small and large tables, the database only had to read 21 MB and 25 MB respectively, and the query was fast. However, when we executed the query against the medium table, where all the texts are stored inline, the database had to read 977 MB from disk, and the query took much longer.
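
To connect those sizes to IO, you can translate each table into the number of pages a full scan has to read. A sketch, assuming the default 8 kB block size:

-- Pages read by a full table scan (8 kB default block size)
SELECT relname, pg_relation_size(oid) / 8192 AS pages
FROM pg_class
WHERE relname LIKE 'toast_test_%' AND relkind = 'r';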

TAKEAWAY

TOAST is a great way of keeping tables compact by storing large values out-of-line!

Using the Text Values

In the previous comparison we executed a query that only used the ID, not the text value. What will happen when we actually need to access the text value itself?

db=# \timing
Timing is on.

db=# SELECT * FROM toast_test_large WHERE value LIKE 'foo%';
Time: 7509.900 ms (00:07.510)

db=# SELECT * FROM toast_test_large WHERE value LIKE 'foo%';
Time: 7290.925 ms (00:07.291)

db=# SELECT * FROM toast_test_medium WHERE value LIKE 'foo%';
Time: 5869.631 ms (00:05.870)

db=# SELECT * FROM toast_test_medium WHERE value LIKE 'foo%';
Time: 259.970 ms

db=# SELECT * FROM toast_test_small WHERE value LIKE 'foo%';
Time: 78.897 ms

db=# SELECT * FROM toast_test_small WHERE value LIKE 'foo%';
Time: 50.035 ms

We executed a query against all three tables to search for a string within the text value. The query is not expected to return any results, so the database is forced to scan the entire table. This time, the results are more consistent with what we would expect:

Table             │ Cold cache  │ Warm cache
──────────────────┼─────────────┼──────────────
toast_test_small  │ 78.897 ms   │ 50.035 ms
toast_test_medium │ 5869.631 ms │ 259.970 ms
toast_test_large  │ 7509.900 ms │ 7290.925 ms

The larger the table, the longer it took the query to complete. This makes sense because to satisfy the query, the database was forced to read the texts as well. In the case of the large table, this means accessing the TOAST table as well.

What About Indexes?

Indexes help the database minimize the number of pages it needs to fetch to satisfy a query. For example, let’s take the first example when we searched for a single row by ID, but this time we’ll have an index on the field:

db=# CREATE INDEX toast_test_small_id_ix ON toast_test_small(id);
CREATE INDEX

db=# CREATE INDEX toast_test_medium_id_ix ON toast_test_medium(id);
CREATE INDEX

db=# CREATE INDEX toast_test_large_id_ix ON toast_test_large(id);
CREATE INDEX

Executing the exact same query as before with indexes on the tables:

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_small WHERE id = 6000;
                                QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────
Index Scan using toast_test_small_id_ix on toast_test_small  (cost=0.42..8.44 rows=1 width=16)
  Index Cond: (id = 6000)
Time: 0.772 ms

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_medium WHERE id = 6000;
                                QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────
Index Scan using toast_test_medium_id_ix on toast_test_medium  (cost=0.42..8.44 rows=1 width=1808)
  Index Cond: (id = 6000)
Time: 0.831 ms

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_large WHERE id = 6000;
                                QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────
Index Scan using toast_test_large_id_ix on toast_test_large  (cost=0.42..8.44 rows=1 width=22)
  Index Cond: (id = 6000)
Time: 0.618 ms

In all three cases the index was used, and the performance was almost identical.

By now, we know that the trouble begins when the database has to do a lot of IO. So next, let’s craft a query that the database will choose to use the index for, but that will still have to read a lot of data:

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_small WHERE id BETWEEN 0 AND 250000;
                                QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────
Index Scan using toast_test_small_id_ix on toast_test_small  (cost=0.4..9086 rows=249513 width=16)
  Index Cond: ((id >= 0) AND (id <= 250000))
Time: 60.766 ms
db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_small WHERE id BETWEEN 0 AND 250000;
Time: 59.705 ms

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_medium WHERE id BETWEEN 0 AND 250000;
Time: 3198.539 ms (00:03.199)
db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_medium WHERE id BETWEEN 0 AND 250000;
Time: 284.339 ms

db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_large WHERE id BETWEEN 0 AND 250000;
Time: 85.747 ms
db=# EXPLAIN (ANALYZE, TIMING) SELECT * FROM toast_test_large WHERE id BETWEEN 0 AND 250000;
Time: 70.364 ms

We executed a query that fetches half the rows in the table. This was a low enough portion of the table for PostgreSQL to decide to use the index, but still high enough to require lots of IO.

We ran each query twice on each table. In all cases the database used the index to access the table. Keep in mind that the index only helps reduce the number of pages the database has to access, but in this case, the database still had to read half the table.

Table             │ Cold cache  │ Warm cache
──────────────────┼─────────────┼─────────────
toast_test_small  │ 60.766 ms   │ 59.705 ms
toast_test_medium │ 3198.539 ms │ 284.339 ms
toast_test_large  │ 85.747 ms   │ 70.364 ms

The results here are similar to the first test we ran. When the database had to read a large portion of the table, the medium table, where the texts are stored inline, was the slowest.
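
Rather than inferring IO from timings, you can also ask EXPLAIN to report buffer usage directly. A sketch (the counts will depend on your data and cache state):

-- BUFFERS shows how many shared pages the plan actually touched
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM toast_test_medium WHERE id BETWEEN 0 AND 250000;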

If, after reading this far, you are convinced that medium-size texts are causing your performance issues, there are a few things you can do.

Adjusting toast_tuple_target

toast_tuple_target is a storage parameter that controls the minimum tuple length after which PostgreSQL tries to move long values to TOAST. The default is 2 kB, but it can be decreased to a minimum of 128 bytes. The lower the target, the more likely a medium-size string is to be moved out-of-line to the TOAST table.

To demonstrate, create a table with the default storage params, and another with toast_tuple_target = 128:

db=# CREATE TABLE toast_test_default_threshold (id SERIAL, value TEXT);
CREATE TABLE

db=# CREATE TABLE toast_test_128_threshold (id SERIAL, value TEXT) WITH (toast_tuple_target=128);
CREATE TABLE

db=# SELECT c1.relname, c2.relname AS toast_relname
FROM pg_class c1 JOIN pg_class c2 ON c1.reltoastrelid = c2.oid
WHERE c1.relname LIKE 'toast%threshold' AND c1.relkind = 'r';

           relname            │  toast_relname
──────────────────────────────┼──────────────────
 toast_test_default_threshold │ pg_toast_3250167
 toast_test_128_threshold     │ pg_toast_3250176

Next, generate a value larger than 2 kB that compresses to well under 2 kB, insert it into both tables, and check whether it was stored out-of-line:

db=# INSERT INTO toast_test_default_threshold (value) VALUES (generate_random_string(2100, '123'));
INSERT 0 1

db=# SELECT * FROM pg_toast.pg_toast_3250167;
 chunk_id │ chunk_seq │ chunk_data
──────────┼───────────┼────────────
(0 rows)

db=# INSERT INTO toast_test_128_threshold (value) VALUES (generate_random_string(2100, '123'));
INSERT 0 1

db=# SELECT * FROM pg_toast.pg_toast_3250176;
─[ RECORD 1 ]─────────────
chunk_id   │ 3250185
chunk_seq  │ 0
chunk_data │ x3408.......

A roughly similar medium-size text was stored inline with the default parameters, and out-of-line with the lower toast_tuple_target.

Create a Separate Table

If you have a critical table that stores medium-size text fields, and you notice that most texts are stored inline and are perhaps slowing down queries, you can move the medium text column into a separate table:

CREATE TABLE toast_test (id SERIAL PRIMARY KEY);
-- The wide text lives in its own table, linked back by a foreign key
CREATE TABLE toast_test_value (fk INT REFERENCES toast_test(id), value TEXT);
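
With this split (a sketch based on the illustrative schema above), queries that only need the narrow columns never touch the wide text, and the text is read only through an explicit join:

-- Fetch the text only when it is actually needed
SELECT t.id, v.value
FROM toast_test t
JOIN toast_test_value v ON v.fk = t.id
WHERE t.id = 6000;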

In my previous article I demonstrated how we use SQL to find anomalies. In one of those use cases, we had a table of errors that contained a Python traceback. The error messages were medium texts, and many of them were stored inline, so the table got big very quickly. So big, in fact, that we noticed queries getting slower and slower. Eventually we moved the errors into a separate table, and things got much faster!


The main problem with medium-size texts is that they make rows very wide. This is a problem because PostgreSQL, like other OLTP-oriented databases, stores values in rows. When we ask the database to execute a query that touches only a few columns, the values of those columns are likely spread across many blocks. If the rows are wide, that translates into a lot of IO, which affects query performance and resource usage.

To overcome this challenge, some non-OLTP databases use a different type of storage: columnar storage. With columnar storage, data is stored on disk by column, not by row. When the database has to scan a specific column, the values sit in consecutive blocks, which usually translates into less IO. Additionally, the values within a single column are more likely to share repeating patterns and values, so they compress better.

Row vs Column Storage

For non-OLTP workloads such as data warehouse systems, this makes sense: tables are usually very wide, and queries often use a small subset of the columns while reading a lot of rows. In OLTP workloads, the system usually reads one or very few rows at a time, so storing data in rows makes more sense.

There has been chatter about pluggable storage in PostgreSQL, so this is something to look out for!
