If you’ve followed our PostgreSQL series, you’ve seen that slowness in your database can sometimes be hard to diagnose.
But—some issues are obvious.
At Render, we enable millions of developers to effortlessly deploy and scale full-stack applications. Render’s managed PostgreSQL lets you connect any web app to a database in minutes.
Given all the developers we support, we notice trends in what developers find challenging. One specific issue with PostgreSQL comes to mind. We might even call it the “top most-fixable cause of slow queries.”
Can you guess what it is? Read on, and help reverse the trend.
This schema reflects our requirements:
Even though PostgreSQL creates these indexes for its own use, it will use these B-Trees to boost queries you write, too. In other words, if you do equality and range comparisons against any “unique” column, you can be assured that PostgreSQL has created the indexes to make that query efficient.
The entire query takes a staggering 19 seconds to execute!
Why? At first glance, this query plan looks optimal. It uses an index scan to find the user by its primary key, which is very efficient and takes only 0.1 milliseconds.
But this query takes a lot of time to process triggers for the foreign key constraints:
Remember we specified
As a result of this schema declaration:
Without an index on the
To speed up this query, let’s create indexes on the foreign key relationships:
Note that we don't want to make these indexes unique, because they represent many-to-one relationships. A single user can author multiple articles, and a single article can have multiple comments.
These indexes will allow PostgreSQL to efficiently:
The performance improvement is dramatic. The total execution time has dropped from 19 seconds to just 342 milliseconds—a 55x improvement.
In the updated query plan, the repeated table scans are replaced with repeated index scans, which efficiently jump to the range of relevant rows to delete.
Let's break down the improvements:
This last improvement is particularly noteworthy. We see a 99.9% reduction in runtime, even though this particular delete operation deletes significantly more records than our previous deletion. (The
Spot the problem: a privacy-aware blog
Before we explain the problem, let’s see if you can find it. Let's consider a scenario where we're designing a schema for a blog that respects users’ privacy. In this app, users can:- Create accounts
- Write articles
- Leave comments on articles
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
email TEXT UNIQUE NOT NULL,
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
author_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
CREATE TABLE comments (
id SERIAL PRIMARY KEY,
content TEXT NOT NULL,
author_id INTEGER REFERENCES users(id) ON DELETE SET NULL,
article_id INTEGER NOT NULL REFERENCES articles(id) ON DELETE CASCADE,
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
- The
users
table stores information about registered users. - The
articles
table stores the title and text of an article and a reference to the author. This foreign key is defined with anON DELETE CASCADE
constraint. This constraint deletes all articles associated with a user if that user is deleted. - The
comments
table stores the text of each comment, as well as references to the author and the article.- The foreign key to the article has an
ON DELETE CASCADE
constraint. This constraint deletes all comments associated with an article if that article is deleted. - The foreign key to the author has an
ON DELETE SET NULL
constraint. This constraint sets the author to null if the authoring user is deleted.
- The foreign key to the article has an
Hint #1: PostgreSQL auto-creates some indexes
If you’re thinking about missing indexes, you’re on the right track. But which indexes? As a hint, let’s first talk about what’s NOT missing. Whenever you specify aPRIMARY KEY
or UNIQUE
constraint, PostgreSQL automatically creates a B-Tree index on the constrained values. These B-Trees indexes let PostgreSQL quickly look up whether a value already exists in a given column, which lets PostgreSQL enforce uniqueness.
This means in our blog schema, PostgreSQL creates indexes for the primary keys of each table (users.id
, articles.id
, and comments.id
), as well as for the UNIQUE constraint on the user's email (users.email
).
We can confirm this by running \d+ <table name>
in a psql
shell:
`users` Indexes:
"users_pkey" PRIMARY KEY, btree (id)
"users_email_key" UNIQUE CONSTRAINT, btree (email)
`articles` Indexes:
"articles_pkey" PRIMARY KEY, btree (id)
`comments` Indexes:
"comments_pkey" PRIMARY KEY, btree (id)
Hint #2: Let’s run a simulation
To help us better understand this situation, we can simulate it. Let’s set up a demo of our blog scenario and observe its performance. (You can download the code and run this simulation yourself.) First, let’s create test data that includes:- 1,000 users
- About 25,000 articles
- About 13 million comments, randomly distributed among articles
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Delete on users (cost=0.28..8.29 rows=0 width=0) (actual time=0.190..0.191 rows=0 loops=1)
-> Index Scan using users_pkey on users (cost=0.28..8.29 rows=1 width=6) (actual time=0.110..0.113 rows=1 loops=1)
Index Cond: (id = 743)
Planning Time: 0.198 ms
Trigger for constraint articles_author_id_fkey on users: time=17.680 calls=1
Trigger for constraint comments_author_id_fkey on users: time=777.397 calls=1
Trigger for constraint comments_article_id_fkey on articles: time=18245.081 calls=28
Execution Time: 19040.423 ms
articles_author_id_fkey | 17.7 milliseconds |
---|---|
comments_author_id_fkey | 777 milliseconds |
comments_article_id_fkey | 18.2 seconds |
ON DELETE CASCADE
constraints to fulfill our privacy guarantees? Triggers carry out the downstream deletions tied to a cascading constraint. In this simulation, the triggers delete rows in the articles
and comments
tables that are tied to the deleted user.
The large amount of time taken by these triggers indicates that PostgreSQL is likely performing full table scans to find the related rows. The bigger a table, the longer it’ll take to scan, so it makes sense that our comments table takes the longest.
Answer: The missing index on foreign keys
Based on the simulation, you might have guessed the answer. The answer is that we’re missing indexes on our foreign keys—on one side of the relationship. As we saw earlier, PostgreSQL automatically creates an index on the primary keys for each table (users.id
, articles.id
, and comments.id
). In general, PostgreSQL also automatically creates an index on one side of every foreign key relationship: on the column in the referenced table. But PostgreSQL does NOT automatically create an index on the other side of the foreign key relationship: on the column in the referencing table.
For example, author_id
is referenced as a foreign key in the articles
table:
CREATE TABLE articles (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
author_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
- If the
id
column in theusers
table (the referenced table) did not already have an index, PostgreSQL would automatically create that index. - Within the
articles
table (the referencing table), PostgreSQL will NOT automatically create an index onauthor_id
—though it seems like a sensible thing PostgreSQL could do. This missing index is what causes a performance problem.
When does the performance problem happen?
Armed with this knowledge, we can now reason about all the situations where you may hit slow queries if you forget to add these foreign key indexes.Situation 1: Common queries
In our blog application example, let's consider a fairly common operation: fetching all articles written by a particular user. Given a user, we want to find all articles associated with that user through theauthor_id
foreign key relationship.
Consider the following query:
SELECT * FROM articles WHERE author_id = 123;
author_id
field in the articles
table, this query would need to look at each row of the articles
table, and check each row's author_id
value against the desired user ID value. This operation becomes increasingly slow as the number of articles grows.
Situation 2: Cascading deletes
What if we never need to make that particular query in our application? In that case, it may seem savvy to skip creating the index on the foreign key. After all, maintaining an index you never use is pure overhead: it costs memory and disk to store, and it costs CPU cycles when inserting, updating, and deleting rows in the table. Well, not so fast. As we saw in our simulation, you must take care to create these indexes to support cascading delete operations. In our scenario, when we delete a user, PostgreSQL needs to find all of the articles written by the user to propagate the deletion. Similarly, PostgreSQL also needs to find all of the comments on articles written by the author.Fixing the simulation
To apply what we learned, let’s fix the slow query in our simulation. As a reminder, we simply deleted a user. This was the resulting (very slow) query plan: QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Delete on users (cost=0.28..8.29 rows=0 width=0) (actual time=0.190..0.191 rows=0 loops=1)
-> Index Scan using users_pkey on users (cost=0.28..8.29 rows=1 width=6) (actual time=0.110..0.113 rows=1 loops=1)
Index Cond: (id = 743)
Planning Time: 0.198 ms
Trigger for constraint articles_author_id_fkey on users: time=17.680 calls=1
Trigger for constraint comments_author_id_fkey on users: time=777.397 calls=1
Trigger for constraint comments_article_id_fkey on articles: time=18245.081 calls=28
Execution Time: 19040.423 ms
CREATE INDEX articles_author_id ON articles(author_id);
CREATE INDEX comments_author_id ON comments(author_id);
CREATE INDEX comments_article_id ON comments(article_id);
- Find all articles by a specific author (when querying articles, or when deleting a user).
- Find all comments by a specific author (when querying comments, or when deleting a user).
- Find all comments for a specific article (when querying comments, or when deleting an article).
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Delete on users (cost=0.28..8.29 rows=0 width=0) (actual time=0.142..0.143 rows=0 loops=1)
-> Index Scan using users_pkey on users (cost=0.28..8.29 rows=1 width=6) (actual time=0.108..0.111 rows=1 loops=1)
Index Cond: (id = 276)
Planning Time: 0.183 ms
Trigger for constraint articles_author_id_fkey on users: time=0.327 calls=1
Trigger for constraint comments_author_id_fkey on users: time=315.721 calls=1
Trigger for constraint comments_article_id_fkey on articles: time=26.136 calls=48
Execution Time: 342.376 ms
articles_author_id_fkey | 17.7ms | 0.327ms |
---|---|---|
comments_author_id_fkey | 777ms | 315ms |
comments_article_id_fkey | 18s | 26ms |
comments_article_id_fkey
constraint trigger is invoked 48 times in this deletion, compared to 28 times in the previous deletion.)
Try it yourself
Proper indexing is critical if you want good database performance. As a rule of thumb, consider creating indexes on foreign key columns in referencing tables. Keep in mind that:- Even if you don’t query the relationship now, there’s a strong possibility you (or a teammate) might add such a query in the future.
- If you add an
ON DELETE
propagation constraint to a foreign key relationship, PostgreSQL will query the relationship on delete. In this case, you absolutely need the index.
- Try running the simulation yourself. We’ve created a GitHub repo with scripts that make it easy to create the schema, generate demo data, and create the missing indexes.
- Check out the other posts in our PostgreSQL series!