The course "Mastering Postgres" helped me practice query commands and understand why they're important. Inspiring me to explore the 'why' and 'how' behind each one. Truly a great course!Nakolus
Shorten dev cycles with branching and zero-downtime schema migrations.
Creating an index on a single column is very good and something we now know how to do, but you're gonna get more bang for your buck, more power if you know how to create a composite index, which is an index over many columns at the same time.
Instead of creating potentially three discrete indexes across three columns, you could create one index across all three columns and that's gonna give you a lot more performance. Postgres does have the ability to scan two separate indexes and then kind of combine their results in an intelligent way. If you don't have a composite index that fits perfectly for a query, Postgres is gonna do its best to try to help you out, which is awesome. Not every database has that capability. Again, good job Postgres on that one, but we're still gonna get better performance, especially when it comes to sorting if we do have that multi column or composite index across many columns.
I'm going to give you a few rules that we will continue to drill into your brain as time goes on. If you don't remember them right now, that's fine, but we need to talk about this idea of a leftmost prefix. The two rules that I'm going to give you are left to right, no skipping and stops at the first range. Left to right, no skipping and stops at the first range. We're gonna continue to look at those rules as we go. First I want to show you a few examples.
Let's first create that composite index. I'm gonna say create index multi on users using Btree across. Let's do it across first name, last name and birthday. This is gonna take a few seconds, but then we can start to write a query that will utilize this new index. There is a little bit of a caveat with Postgres in terms of the left most prefix rule, but we'll discuss that in a second. If we do select * from users where last name equals Francis, we've just put an index across first name, last name and birthday. Let's see if that index is being used here. In fact, it is not. It's scanning the entire table and it's filtering out rows or filtering in rows where the last name equals Francis. SIt's not using our index whatsoever. That is because we declared our index as first name and then last name and then birthday. When you declare this index, the order matters so, so much. When you are writing a query, the order doesn't matter.
If we were to come down here and we were to say, let's say first name, first name equals Aaron and last name equals Francis, it does not matter if you put first name first and then last name. It doesn't matter which order you write these in. When you're writing your actual query, the order of your conditions does not matter whatsoever. When you hear me say the order is super important, I don't mean the order of your query, I mean the order in which you declared your index because like I said at the beginning, you have to go left to right, no skipping and the technical name of that rule is a leftmost prefix. Most Btree indexes follow a leftmost prefix rule. Postgres does have the ability to kind of skip, but that's a nuance that we'll talk about in a second.
What you need to know is you have to start with the left most column. In this case where we started with last name, we did not start with the left most column. We started with a middle column and we completely skipped over first name and Postgres says, I can't work with that, I don't know what to do with that. Instead, if we were to switch this to first name equals Aaron and then we explain on that, you'll see the index condition first name equals Aaron. That time it did work because we formed a leftmost prefix. You have to go left to right in terms of the order that you declared your index in. If we keep going and we don't need this much space, so if we keep going and we start to add something, let's add a constraint on last name as well. Let's drop this down here and we can say, and last name equals Francis. You'll see that this gets a little bit bigger. The index condition is first name equals Aaron, and last name equals Francis. We're moving left to right without skipping any columns in between.
However, what if we just change this to birthday? 1989 02 14. What about that rule? I told you left to right no skipping, we just skipped last name. Postgres says, cool, that's fine, I don't care. This is a little bit confusing, I will admit. It goes against the rule that I have told you and that you've probably heard elsewhere, left to right, no skipping. We skipped last name and it still seemed to work just fine. What is happening here is a little bit of a Postgres optimization. Normally if you just had an index, if you just had an index over first name and birthday, it could just directly navigate to people who are named Aaron with a birthday on 1989 02 14. That's what would happen if you didn't have that last name column in the middle. Because we have that last name column in the middle, what Postgres is doing is saying, all right, here's the deal y'all, I can use the first name, I can use the first name.
What I'm gonna do is I'm gonna navigate through that btree to the chunk of leaf nodes that have the first name Aaron. I can get there just by using the index, however, because last name is in the middle mucking everything up, then I'm just gonna scan all the way through the index to look for people that have the birthday of 1989 02 14. Instead of just directly traversing to the exact correct rows, it's doing a traversal down to the errands, which narrows it down quite a bit and then it scans through the index looking for 1989 02 14.
However, if we change this to and last name equals Francis, then it's just using the btree traversal and it doesn't have to scan all of those leaf nodes at the bottom. That's what we're optimizing for. Frankly, we don't want a big index scan at the bottom. The Postgres stocks will tell you that if you form your leftmost prefix, it limits the amount of the index that must be scanned. What we wanna do is we wanna do that direct traversal and just find immediately the rows that we're looking for based on the btree. If you don't form a leftmost prefix, it'll do its best to traverse down to, in this case, just people with the name, first name of Aaron. Then it has to scan all of those nodes in the index. Because the data does exist in the index. However, the tree structure is not set up to jump over last name. It finds all the errands and then it gets to work churning through all of them looking for people with the right birthday. That is better than scanning the entire table.
In order of efficiency, we want direct btree traversal, that's awesome, index scan, that's okay. Table scan, that's a lot worse. So form a leftmost prefix. Let me see if I can prove to you that there's a better option here. One way that I think we can prove that this is not the best option is if we do create index multi two on users using Btree. We just put one over first name and birthday and now we're gonna see which one Postgres picks when we drop out the last name. I think I have a pretty good idea. If we run that, it says multi two on users because that is much more efficient than the original multi, which had that last name lingering in the middle there.
What we've learned from all of this, one, is that Btrees are awesome. Two, that Postgres is awesome because it helps you out. Three, and maybe most importantly for you, your most common conditions need to go on the left side of your index.
If you have many, many queries that query against first name, but only some that query against birthday, well that's a good indication that first name should be moved towards the front of an index and birthday should be moved towards the back of an index. You also have to consider strict equality versus ranges because the rules are left to right, no skipping, stops at the first range. We'll look at ranges in the next video.