Aaron is a natural teacher and this course is the best introduction to Postgres I have come across. Lessons are easy to follow and he recommends some great tools for working with Postgres.Joel Drumgoole
Shorten dev cycles with branching and zero-downtime schema migrations.
So far, everything we've been doing with indexing is putting an index across one column or multiple columns. And you may be thinking, "What else is left? An index on zero columns?" And yeah, kind of, honestly.
What we're gonna look at here is putting an index on the result of a function or an expression. This is very similar if you think back to generated columns, this is very similar to generated columns, but instead of storing that value in a column, we're gonna store that value in a B-tree, i.e. an index.
We can take the result of a function, take that value and write it into the B-tree, such that when the time comes to look up a value, we don't have to do all that computation, we just look for it in the B-tree. Then every time the value is, or the row is inserted, updated or deleted, that function is evaluated and that value is sent off to the B-tree. We're gonna look at the same example that we looked at in generated columns, and that is extracting the domain off of an email address.
In case you've forgotten, we've got select * from users limit 10. And we do have an email column there. Instead of that, we are going to use this split part function, and we're gonna split the email column using the @ sign as the delimiter. We're gonna take item number two, which should give us, there you go, that gives us the domain. Now, in a previous video, we did use this as a function to populate a generated column, but we already did that, so let's do something else. Create index domain on users. And then you need to open another set of parentheses. If you don't open that other set of parentheses, that indicates that you want a column or multiple columns. But if you do open another set of parentheses, then you can paste your function in there and that will index the result of that function.
If we run that, we're going to get an index on domain, but we don't see domain, of course, because it's not a column. If we do select * and limit 10 again, you'll see nothing new shows up. However, if we drop down here and we say where, and we use that exact same function, we say where split part email equals, and let's just pick one, let's pick beer.com. Great domain, probably very expensive. If we were to do that, you'll see all the beer.coms come back pretty quickly, but pretty quickly is not an indicator that it actually worked. Let's see if it actually worked. There you go. It did an index scan on domain, or rather the index names domain on users, which is this guy that we created right here.
Now, of course, if we change, it doesn't actually work because this is not the function that the index was declared with. You have to be quite careful that you're using the actual function that the index was declared with, because that's how Postgres is going to link those two things up together. Hopefully your mind is already spinning about where you could be using something like this. I think one thing that is super nice and often overlooked about this is you don't have to change your application side code at all to start taking advantage of this. Maybe you have some system that you're not fully in control of, some library, some package, some tool, some external thing that connects to your database, and you don't really have control over the queries, but maybe they're issuing some sort of query like that around a function and you can just add an index on your side to match their function expression. And then suddenly everything gets a lot faster and you didn't have to touch the code at all, which is kind of nice.
I think a lot of people's minds go directly to indexing JSON parts. That's where my mind goes for sure. You can definitely extract some keys out of a big blob, some keys that you're commonly searching for. You can grab those, pull those out into a functional index on their own, and that becomes super quick. We're gonna look at indexing JSON separately in a different video, but I want to show you one more, which is case insensitive indexing. Instead of creating an index on the split part email, we're gonna create an index on the lower email here. And we're just gonna call this email_lower. If we run that, and then we come down here and we change this to, we'll have to find somebody's email. Oh, I'm definitely in there. We could change this to aaron.francis@example.com, and you'll see this is not going to use that index at all. Why? Because the index doesn't match.
We declared this index as a function and we're just using this unwrapped or this bare column out here. We could do it like that. And you'll see we do get a scan on that index. Now, the problem is, if the user types it in like that, we're not gonna, I mean, it still works, but we're not gonna get any results back.
See, because we're comparing something that is not fully lowercase to something that is definitely fully lowercase. In this instance, if you have a lower that you're searching for, you want to make sure that you lower all the user input as well. And you see Cleo Simonis is still in there as the deleted user, and then Aaron Francis as the real user. And if we switch back, you'll see we, of course, are still using that email lower index.
Functional indexes will take you from DBA level 101 to at least level 201, if not 301. Knowing how to use these functional indexes to index what may otherwise be pretty tricky queries can be one of your superpowers.