We're reducing our bandwidth prices.

Engineering

December 12, 2024

Engineering

Fix a top cause of slow queries in PostgreSQL (no slow query log needed)

Eric Fritz

December 12, 2024

Eric Fritz

If you’ve followed our PostgreSQL series, you’ve seen that slowness in your database can sometimes be hard to diagnose. But—some issues are obvious. At Render, we enable millions of developers to effortlessly deploy and scale full-stack applications. Render’s managed PostgreSQL lets you connect any web app to a database in minutes. Given all the developers we support, we notice trends in what developers find challenging. One specific issue with PostgreSQL comes to mind. We might even call it the “top most-fixable cause of slow queries.” Can you guess what it is? Read on, and help reverse the trend.

Spot the problem: a privacy-aware blog

Before we explain the problem, let’s see if you can find it. Let's consider a scenario where we're designing a schema for a blog that respects users’ privacy. In this app, users can:

Create accounts
Write articles
Leave comments on articles

As part of the privacy guarantees, a user can delete any articles they write, and can delete their account. When a user deletes their account, all their articles are also deleted. Their comments remain, but are displayed as written by a “[deleted]” user. To store data for this app, we create the following database schema:

sql

CREATE TABLE users (
    id          SERIAL PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT UNIQUE NOT NULL,
    created_at  TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);

CREATE TABLE articles (
    id          SERIAL PRIMARY KEY,
    title       TEXT NOT NULL,
    content     TEXT NOT NULL,
    author_id   INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    created_at  TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);

CREATE TABLE comments (
    id          SERIAL PRIMARY KEY,
    content     TEXT NOT NULL,
    author_id   INTEGER REFERENCES users(id) ON DELETE SET NULL,
    article_id  INTEGER NOT NULL REFERENCES articles(id) ON DELETE CASCADE,
    created_at  TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);

This schema reflects our requirements:

The users table stores information about registered users.
The articles table stores the title and text of an article and a reference to the author. This foreign key is defined with an ON DELETE CASCADE constraint. This constraint deletes all articles associated with a user if that user is deleted.
The comments table stores the text of each comment, as well as references to the author and the article.
- The foreign key to the article has an ON DELETE CASCADE constraint. This constraint deletes all comments associated with an article if that article is deleted.
- The foreign key to the author has an ON DELETE SET NULL constraint. This constraint sets the author to null if the authoring user is deleted.

Where’s the problem?

Hint #1: PostgreSQL auto-creates some indexes

If you’re thinking about missing indexes, you’re on the right track. But which indexes? As a hint, let’s first talk about what’s NOT missing. Whenever you specify a PRIMARY KEY or UNIQUE constraint, PostgreSQL automatically creates a B-Tree index on the constrained values. These B-Trees indexes let PostgreSQL quickly look up whether a value already exists in a given column, which lets PostgreSQL enforce uniqueness. This means in our blog schema, PostgreSQL creates indexes for the primary keys of each table (users.id, articles.id, and comments.id), as well as for the UNIQUE constraint on the user's email (users.email). We can confirm this by running \d+ <table name> in a psql shell:

`users` Indexes:
    "users_pkey" PRIMARY KEY, btree (id)
    "users_email_key" UNIQUE CONSTRAINT, btree (email)

`articles` Indexes:
    "articles_pkey" PRIMARY KEY, btree (id)

`comments` Indexes:
    "comments_pkey" PRIMARY KEY, btree (id)

Even though PostgreSQL creates these indexes for its own use, it will use these B-Trees to boost queries you write, too. In other words, if you do equality and range comparisons against any “unique” column, you can be assured that PostgreSQL has created the indexes to make that query efficient.

Hint #2: Let’s run a simulation

To help us better understand this situation, we can simulate it. Let’s set up a demo of our blog scenario and observe its performance. (You can download the code and run this simulation yourself.) First, let’s create test data that includes:

1,000 users
About 25,000 articles
About 13 million comments, randomly distributed among articles

Now we’ll delete a user by its primary key. Let’s examine the query plan for this deletion:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Delete on users  (cost=0.28..8.29 rows=0 width=0) (actual time=0.190..0.191 rows=0 loops=1)
   ->  Index Scan using users_pkey on users  (cost=0.28..8.29 rows=1 width=6) (actual time=0.110..0.113 rows=1 loops=1)
         Index Cond: (id = 743)
 Planning Time: 0.198 ms
 Trigger for constraint articles_author_id_fkey on users: time=17.680 calls=1
 Trigger for constraint comments_author_id_fkey on users: time=777.397 calls=1
 Trigger for constraint comments_article_id_fkey on articles: time=18245.081 calls=28
 Execution Time: 19040.423 ms

The entire query takes a staggering 19 seconds to execute! Why? At first glance, this query plan looks optimal. It uses an index scan to find the user by its primary key, which is very efficient and takes only 0.1 milliseconds. But this query takes a lot of time to process triggers for the foreign key constraints:

Trigger Name	Execution Time
`articles_author_id_fkey`	17.7 milliseconds
`comments_author_id_fkey`	777 milliseconds
`comments_article_id_fkey`	18.2 seconds

Remember we specified ON DELETE CASCADE constraints to fulfill our privacy guarantees? Triggers carry out the downstream deletions tied to a cascading constraint. In this simulation, the triggers delete rows in the articles and comments tables that are tied to the deleted user. The large amount of time taken by these triggers indicates that PostgreSQL is likely performing full table scans to find the related rows. The bigger a table, the longer it’ll take to scan, so it makes sense that our comments table takes the longest.

Answer: The missing index on foreign keys

Based on the simulation, you might have guessed the answer. The answer is that we’re missing indexes on our foreign keys—on one side of the relationship. As we saw earlier, PostgreSQL automatically creates an index on the primary keys for each table (users.id, articles.id, and comments.id). In general, PostgreSQL also automatically creates an index on one side of every foreign key relationship: on the column in the referenced table. But PostgreSQL does NOT automatically create an index on the other side of the foreign key relationship: on the column in the referencing table. For example, author_id is referenced as a foreign key in the articles table:

CREATE TABLE articles (
    id          SERIAL PRIMARY KEY,
    title       TEXT NOT NULL,
    content     TEXT NOT NULL,
    author_id   INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    created_at  TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);

As a result of this schema declaration:

If the id column in the users table (the referenced table) did not already have an index, PostgreSQL would automatically create that index.
Within the articles table (the referencing table), PostgreSQL will NOT automatically create an index on author_id—though it seems like a sensible thing PostgreSQL could do. This missing index is what causes a performance problem.

When does the performance problem happen?

Armed with this knowledge, we can now reason about all the situations where you may hit slow queries if you forget to add these foreign key indexes.

Situation 1: Common queries

In our blog application example, let's consider a fairly common operation: fetching all articles written by a particular user. Given a user, we want to find all articles associated with that user through the author_id foreign key relationship. Consider the following query:

sql

SELECT * FROM articles WHERE author_id = 123;

Without an index on the author_id field in the articles table, this query would need to look at each row of the articles table, and check each row's author_id value against the desired user ID value. This operation becomes increasingly slow as the number of articles grows.

Situation 2: Cascading deletes

What if we never need to make that particular query in our application? In that case, it may seem savvy to skip creating the index on the foreign key. After all, maintaining an index you never use is pure overhead: it costs memory and disk to store, and it costs CPU cycles when inserting, updating, and deleting rows in the table. Well, not so fast. As we saw in our simulation, you must take care to create these indexes to support cascading delete operations. In our scenario, when we delete a user, PostgreSQL needs to find all of the articles written by the user to propagate the deletion. Similarly, PostgreSQL also needs to find all of the comments on articles written by the author.

Fixing the simulation

To apply what we learned, let’s fix the slow query in our simulation. As a reminder, we simply deleted a user. This was the resulting (very slow) query plan:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Delete on users  (cost=0.28..8.29 rows=0 width=0) (actual time=0.190..0.191 rows=0 loops=1)
   ->  Index Scan using users_pkey on users  (cost=0.28..8.29 rows=1 width=6) (actual time=0.110..0.113 rows=1 loops=1)
         Index Cond: (id = 743)
 Planning Time: 0.198 ms
 Trigger for constraint articles_author_id_fkey on users: time=17.680 calls=1
 Trigger for constraint comments_author_id_fkey on users: time=777.397 calls=1
 Trigger for constraint comments_article_id_fkey on articles: time=18245.081 calls=28
 Execution Time: 19040.423 ms

To speed up this query, let’s create indexes on the foreign key relationships:

sql

CREATE INDEX articles_author_id ON articles(author_id);
CREATE INDEX comments_author_id ON comments(author_id);
CREATE INDEX comments_article_id ON comments(article_id);

Note that we don't want to make these indexes unique, because they represent many-to-one relationships. A single user can author multiple articles, and a single article can have multiple comments. These indexes will allow PostgreSQL to efficiently:

Find all articles by a specific author (when querying articles, or when deleting a user).
Find all comments by a specific author (when querying comments, or when deleting a user).
Find all comments for a specific article (when querying comments, or when deleting an article).

With these new indexes in place, let's delete another user and inspect the query plan:

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Delete on users  (cost=0.28..8.29 rows=0 width=0) (actual time=0.142..0.143 rows=0 loops=1)
   ->  Index Scan using users_pkey on users  (cost=0.28..8.29 rows=1 width=6) (actual time=0.108..0.111 rows=1 loops=1)
         Index Cond: (id = 276)
 Planning Time: 0.183 ms
 Trigger for constraint articles_author_id_fkey on users: time=0.327 calls=1
 Trigger for constraint comments_author_id_fkey on users: time=315.721 calls=1
 Trigger for constraint comments_article_id_fkey on articles: time=26.136 calls=48
 Execution Time: 342.376 ms

The performance improvement is dramatic. The total execution time has dropped from 19 seconds to just 342 milliseconds—a 55x improvement. In the updated query plan, the repeated table scans are replaced with repeated index scans, which efficiently jump to the range of relevant rows to delete. Let's break down the improvements:

Trigger Name	Before Execution Time	After Execution Time
`articles_author_id_fkey`	17.7ms	0.327ms
`comments_author_id_fkey`	777ms	315ms
`comments_article_id_fkey`	18s	26ms

This last improvement is particularly noteworthy. We see a 99.9% reduction in runtime, even though this particular delete operation deletes significantly more records than our previous deletion. (The comments_article_id_fkey constraint trigger is invoked 48 times in this deletion, compared to 28 times in the previous deletion.)

Try it yourself

Proper indexing is critical if you want good database performance. As a rule of thumb, consider creating indexes on foreign key columns in referencing tables. Keep in mind that:

Even if you don’t query the relationship now, there’s a strong possibility you (or a teammate) might add such a query in the future.
If you add an ON DELETE propagation constraint to a foreign key relationship, PostgreSQL will query the relationship on delete. In this case, you absolutely need the index.

Next, we encourage you to:

Try running the simulation yourself. We’ve created a GitHub repo with scripts that make it easy to create the schema, generate demo data, and create the missing indexes.
Check out the other posts in our PostgreSQL series!