I bought Mastering Postgres because I use Postgres a lot and wanted to support Aaron and his content. I already learned more than I expected and have been very impressed with the quantity and work Aaron put in.Timo Strackfeldt
Shorten dev cycles with branching and zero-downtime schema migrations.
Now we're going to look at the second way to index JSON and Postgres and that is with a GIN index. Now, the two methods that we looked at before, which were functional index and generated column, those both use the B-tree index, which is great for a lot of things. 98% of the time you need an index, it's a B-tree index. It does strict equality, it does ranges, it can do partial matches, that sort of thing.
A GIN index is fundamentally different than a B-tree. It is an inverted index and it's much better at containment and key existence, that sort of thing. Now, GIN indexes are not free. They are more expensive than a B-tree index. They can be both bigger in size and take longer to update because they are quite a bit more complex. If your JSON blobs are huge and frequently updated, you might run into a little bit of a problem trying to continually update these GIN indexes. Keep that in mind. I think for most reasonable sized blobs that are updated at a reasonable frequency, you don't really have to worry about it. If you're hammering huge JSON documents, you might get in trouble.
Let's take a look at some GIN indexes. To create this index, we're just going to say create index, and I've copied the name here, idx_orders_json_details. You could swap that around if you want, order_josn_details_idx. I don't super care. On orders_json. Then we need to say using gin. The default here is using B-tree, but we're going to say using gin over the details column. With that index in place, let's do select * from orders_json where details contains. Then we're going to open this up and we're going to say "status": shipped. With any luck, if we hit explain on that, you'll see the index condition. This one that we created up here, orders_json_details_idx. That index is being used for that containment operator. Now, it's not used for every operator, but for the containment operator and for the key existence operators, it is going to be used. I'll show you key existence here in a second. There is an option that you can pass to the index creation. You can say, "Hey, I actually don't care about the values. I just want to check some paths." And that will make your GIN index a little bit smaller, and we'll look at that in a second.
Let's move on and check some key existence. Let's just check for the existence of the customer key. You see we're still using that order_json_details. You can also do the any existence. You can say does it contain any of these keys, customer or foo? Still using that index and does it contain customer and foo? You'll see we're still using that index as well. We have the containment operator, the simple key existence, the key any and the key all operators. The index will be used for all of that.
Now, let me show you that one other thing that you can pass through to the GIN creation index statement. By default, when you create this, you're actually passing jsonb_ops. That's the default parameter there. You can pass jsonb_path_ops. This is a little bit... It's a little bit more constricted. It gives you less flexibility, but it does give you higher performance. It only works with a certain subset of operators. Let's go ahead and drop index, that guy. We'll drop, that guy doesn't exist. Must have dropped it already. We'll create it back. Then if we explain this, you'll see we're just doing a scan on the table. The path_ops does not work with these key existence operators. It does, however, work with these containment operators. We can say status and shipped, and that will work with our new path_ops parameter for the GIN index. While this path_ops is a little less flexible, it might be exactly what you're looking for. It makes smaller indexes. It is more performant, but it only works with that containment operator. It works with the path existence operator and the path evaluation operator for evaluating Boolean conditions.
I'll leave those three operators down in the description below and a link to the Postgres docs where you can see that. You have to consider what types of queries are you going to be doing. You have to consider that always when creating an index. Just as an overview, a recap of indexing JSON. If you're going for a single, you're indexing a single value, a nested key or a top level key or something, you're indexing a single thing, your best bet is to use a B-tree index. The way that you can get there is a generated column or a functional index. That's going to be the smallest, the fastest, the most performant. That is what you want when you need something that is traditional.
When you need to index the entire JSON blob, you're going to need to use a GIN index. You have two options. You have jsonb_ops and jsonb_path_ops. Depending on what your queries look like, depends on which one you want to choose. Path_ops gives you less flexibility, but tighter indexes that are more performant and still work great with the containment operator. So maybe that would be the one that you would go for unless you specifically need some sort of key existence operators, in which case, you'll have to go with the default, which is jsonb_ops. You have a lot of options. JSON and Postgres is super powerful. You should wield it accordingly.
Now that you know exactly what you're doing, you can figure out the best structure for you. I don't think there's anything bad here. You just have to decide what are my needs, what are my use cases? Then pick of the many available options, which ones satisfy those requirements.