PostgreSQL 14, released on September 30, 2021, introduces many new features and improvements. Two I find particularly interesting are JSON syntax conveniences and the multirange data type. But there are also several noticeable performance improvements that many applications will benefit from without any code changes.
JSON Conveniences
PostgreSQL can store and manipulate JSON data. There are JSON and JSONB data types. They are similar, but most of the time you probably want to use JSONB. The JSON data type is stored as a string whereas JSONB is stored in a decomposed binary format that allows PostgreSQL to query and manipulate nested structures inside the JSON more efficiently. JSONB also allows you to create indexes based on nested values. I'll be using JSONB exclusively here. Let's create a table and put some JSON data in it. Here's the table we just created. Cool. But what if we want to query nested values within the JSON? Here's how you used to have to do it with PostgreSQL 13. The#>
is an operator that tells PostgreSQL to return a JSON sub-object. PostgreSQL has a lot of other operators for working with JSON like #>>
, ->
, and ->>
.
The {a,b,1}
is a path into the JSON object that says
- Grab the value of property
a
:{"b": ["foo", "hello"]}
- Within that, give me the value of property
b
:["foo", "hello"]
- Within that array, give me the 2nd element:
"hello"
WHERE
clause.
And we can update values within the JSON object using the same syntax.
This new syntax closely matches how you might access nested values in a JSON object with JavaScript.
Multirange Data Type
PostgreSQL has had range data types since version 9.2. Now there are multirange data types. A range data type is useful when you want a single value to represent a range of dates, time, or numbers. For example, you might use a range data type to represent meeting schedules or radio spectrum allocation. With a range data type, you only need a single column to store what would otherwise require two columns--start of range and end of range. Prior to PostgreSQL 14, that range could only be a single contiguous range. Now with multiranges, multiple discontiguous (but not overlapping) ranges can be stored as a single value. Let's look at a meeting scheduling example derived from the PostgreSQL documentation. This row represents a room reservation for a meeting from 14:30 to 15:30 on January 1, 2021. But what if instead of a one hour meeting, we were planning an all-day meeting with a lunch break in the middle? What would a row for that reservation look like? Prior to PostgreSQL 14's multirange support, it would have to be stored as two rows. But then we'd need to add ameeting_id
column so that we know these two rows represent the same meeting.
With multiranges we only need one row to represent this meeting.
Improved Handling of Concurrent Connections
A lot has happened behind the scenes to improve PostgreSQL performance. However, PostgreSQL 14's handling of multiple client connections will likely be the performance improvement noticeable to the most people. The folks at pganalyze did some great performance testing usingpgbench
to verify this. At 5000 active connections, they saw about a 20% improvement in throughput over PostgreSQL 13. At 10,000 connections, that improvement went up to 50%. Of course, these results are dependent on your hardware, schema, data, and queries, so you may not see the same results, but the consensus seems to be that PostgreSQL's handling of concurrent connections has improved significantly.