Mastering Postgres
Introduction
Introduction to the course
Overview of course structure
Postgres vs. everyone
The psql CLI
Data Types
Introduction to schema
Integers
Numeric
Floating point
Storing money
NaNs and infinity
Casting types
Characters types
Check constraints
Domain types
Charsets and collations
Binary data
UUIDs
Boolean
Enums
Timestamps
Timezones
Dates and times
Advanced Data Types
Intervals
Serial type
Sequences
Identity
Network and mac addresses
JSON
Arrays
Generated columns
Text search types
Bit string
Ranges
Composite types
Nulls
Unique constraints
Exclusion constraints
Foreign key constraints
Indexing
Introduction to indexes
Heaps and CTIDs
B-Tree overview
Primary keys vs. secondary indexes
Primary key types
Where to add indexes
Index selectivity
Composite indexes
Composite range
Combining multiple indexes
Covering indexes
Partial indexes
Index ordering
Ordering nulls in indexes
Advanced Indexing
Functional indexes
Duplicate indexes
Hash indexes
Naming indexes
Understanding Query Plans
Introduction to explain
Explain structure
Scan nodes
Costs and rows
Explain analyze
Generating Results
Introduction to queries
Cross joins
Inner joins
Outer joins
Subqueries
Lateral joins
ROWS FROM
Filling gaps in sequences
Subquery elimination
Combining queries
Set generating functions
Indexing joins
Advanced SQL
Introduction to advanced SQL
Grouping
Grouping sets, rollups, cubes
Window functions
CTEs
CTEs with window functions
Recursive CTE
Hierarchical recursive CTE
Handling nulls
Row value syntax
Views
Materialized views
Removing duplicate rows
Upsert
Returning keyword
COALESCE + generated column
Full Text Search
Introduction to full text search
Searching with LIKE
Vectors, queries, and ranks
Websearch
Ranking
Indexing full text search
Highlighting
Advanced JSON
Intro to JSON
JSON vs JSONB
Validating JSON
Creating JSON objects + arrays
JSON extraction
JSON containment
JSON existence
JSON recordset
Updating JSON
Indexing JSON parts
GIN index
Vectors (pgvector)
Intro to pgvector
Vector embedding columns
Find related articles
Upsert vector embedding
Semantic search
Other operators
Vector indexes
Outro
Thank you
Bonus interviews
Heroku's glory days & Postgres vs the world (with Craig Kerstiens)
Creating a Postgres platform with Monica & Tudor from Xata.io
Bootstrapping an email service provider (with Jesse Hanley)
Locked video

Please purchase the course to watch this video.

Video thumbnail
Advanced JSON
GIN index

Full Course

$
349
$399
USD, one-time fee
Really enjoying Aaron Francis' "Mastering Postgres". It's information-dense, very approachable, and well-produced.
Daniel Bachhuber
Daniel Bachhuber

PostgreSQL database platform

Shorten dev cycles with branching and zero-downtime schema migrations.

Test Data

I've made my test data available for you to use and follow along.

Download
or
Use on Xata

Summary

Learn how to use GIN indexes in Postgres to effectively index JSON data, offering better performance than B-tree indexes for operations like containment and key existence. The video explores the differences between `jsonb_ops` and `jsonb_path_ops` for GIN indexes, highlighting that while `jsonb_path_ops` offers higher performance and smaller index sizes, it supports fewer operations. We also discuss how to choose the best indexing strategy by considering the types of queries you’ll run, balancing flexibility with performance.

Links

JSON operator documentation
JSON GIN operators documentation

Video Transcript

Now we're going to look at the second way to index JSON and Postgres and that is with a GIN index. Now, the two methods that we looked at before, which were functional index and generated column, those both use the B-tree index, which is great for a lot of things. 98% of the time you need an index, it's a B-tree index. It does strict equality, it does ranges, it can do partial matches, that sort of thing.

A GIN index is fundamentally different than a B-tree. It is an inverted index and it's much better at containment and key existence, that sort of thing. Now, GIN indexes are not free. They are more expensive than a B-tree index. They can be both bigger in size and take longer to update because they are quite a bit more complex. If your JSON blobs are huge and frequently updated, you might run into a little bit of a problem trying to continually update these GIN indexes. Keep that in mind. I think for most reasonable sized blobs that are updated at a reasonable frequency, you don't really have to worry about it. If you're hammering huge JSON documents, you might get in trouble.

Let's take a look at some GIN indexes. To create this index, we're just going to say create index, and I've copied the name here, idx_orders_json_details. You could swap that around if you want, order_josn_details_idx. I don't super care. On orders_json. Then we need to say using gin. The default here is using B-tree, but we're going to say using gin over the details column. With that index in place, let's do select * from orders_json where details contains. Then we're going to open this up and we're going to say "status": shipped. With any luck, if we hit explain on that, you'll see the index condition. This one that we created up here, orders_json_details_idx. That index is being used for that containment operator. Now, it's not used for every operator, but for the containment operator and for the key existence operators, it is going to be used. I'll show you key existence here in a second. There is an option that you can pass to the index creation. You can say, "Hey, I actually don't care about the values. I just want to check some paths." And that will make your GIN index a little bit smaller, and we'll look at that in a second.

Let's move on and check some key existence. Let's just check for the existence of the customer key. You see we're still using that order_json_details. You can also do the any existence. You can say does it contain any of these keys, customer or foo? Still using that index and does it contain customer and foo? You'll see we're still using that index as well. We have the containment operator, the simple key existence, the key any and the key all operators. The index will be used for all of that.

Now, let me show you that one other thing that you can pass through to the GIN creation index statement. By default, when you create this, you're actually passing jsonb_ops. That's the default parameter there. You can pass jsonb_path_ops. This is a little bit... It's a little bit more constricted. It gives you less flexibility, but it does give you higher performance. It only works with a certain subset of operators. Let's go ahead and drop index, that guy. We'll drop, that guy doesn't exist. Must have dropped it already. We'll create it back. Then if we explain this, you'll see we're just doing a scan on the table. The path_ops does not work with these key existence operators. It does, however, work with these containment operators. We can say status and shipped, and that will work with our new path_ops parameter for the GIN index. While this path_ops is a little less flexible, it might be exactly what you're looking for. It makes smaller indexes. It is more performant, but it only works with that containment operator. It works with the path existence operator and the path evaluation operator for evaluating Boolean conditions.

I'll leave those three operators down in the description below and a link to the Postgres docs where you can see that. You have to consider what types of queries are you going to be doing. You have to consider that always when creating an index. Just as an overview, a recap of indexing JSON. If you're going for a single, you're indexing a single value, a nested key or a top level key or something, you're indexing a single thing, your best bet is to use a B-tree index. The way that you can get there is a generated column or a functional index. That's going to be the smallest, the fastest, the most performant. That is what you want when you need something that is traditional.

When you need to index the entire JSON blob, you're going to need to use a GIN index. You have two options. You have jsonb_ops and jsonb_path_ops. Depending on what your queries look like, depends on which one you want to choose. Path_ops gives you less flexibility, but tighter indexes that are more performant and still work great with the containment operator. So maybe that would be the one that you would go for unless you specifically need some sort of key existence operators, in which case, you'll have to go with the default, which is jsonb_ops. You have a lot of options. JSON and Postgres is super powerful. You should wield it accordingly.

Now that you know exactly what you're doing, you can figure out the best structure for you. I don't think there's anything bad here. You just have to decide what are my needs, what are my use cases? Then pick of the many available options, which ones satisfy those requirements.