The course "Mastering Postgres" helped me practice query commands and understand why they're important. Inspiring me to explore the 'why' and 'how' behind each one. Truly a great course!Nakolus
Shorten dev cycles with branching and zero-downtime schema migrations.
Okay, staying in the slightly theoretical world, again, we're not gonna go too deep, but we're gonna look at what a B-tree actually is. This is your most common index type. It provides fast performance for your most common use cases, like strict equality, ranges, that sort of thing. And so we're gonna look at a very simple table full of users, and then we're gonna put an index on the name column, and I'm gonna show you what that structure actually looks like under the hood, how we can visualize it. And then we're gonna pretend that we're the database and we're gonna do a lookup on this index and we're gonna walk through how this thing actually works. And I think this will prime your intuition going forward, such that you kind of have an instinctual grasp of where an index might work or in what situation an index might not work. So let's take a look at this table. Here is our very simple user's table. We have 10 rows, three columns, ID name and email. We are going to put an index on name and then we're going to walk through, we're going to traverse that structure, looking for the name Jennifer. So if we put an index on name, we get a structure that looks like this. Maybe calling this a tree is generous. It kinda looks like a tree, but you can see at the top we have two names, Isaac and Steve. That is the root node. Now if we jump all the way to the bottom, you'll see across the bottom are all 10 names. And importantly, those 10 names are in order. So it starts on the left at Aaron, and then it goes all the way across to the right with Virginia, the last name in the tree. The bottom of this tree, these nodes down here at the very bottom, these are called the leaf nodes. So sticking with the tree metaphor, the thing at the very, very end, those are the leaves. That's what's at the very, very bottom. Now, the leaf nodes are interesting because they contain the data. The names that we have indexed live down in those leaf nodes. And remember, the other thing that lives down there is the CTID, which is the pointer back to the table. Okay, so where the database, we're gonna play the role of the database. And we have been issued a query that looks like this, let's start from users where name equals Jennifer. And it's a pretty easy query. But let's look at how the database actually uses the index to satisfy this query. Every tree traversal is going to start at the root node. So we have Jennifer, we look at the root node and we have Isaac on the left, and Steve on the right. Alphabetically, Jennifer falls in the middle. So we're gonna take the middle path that leads us down to this interior node here with just the name Simon. So we compare Jennifer to Simon and Jennifer alphabetically is less than Simon. And so we follow the left path of the tree down into the leaf node. And then once we're in the leaf node, we just search for the name Jennifer and there is Jennifer. Let's do this again, looking for the name Taylor. So we're gonna start at the root node again. And this time Taylor is greater than the right side of the node, which contains Steve. So because Taylor is greater than Steve, we're gonna go down the right side of the tree and then we hit a node that actually contains Taylor. And because it is equal to Taylor, we follow the right side of the tree again. And then down in the leaf nodes we find what we're looking for, which is Taylor. So what is the point of an index besides being really cool to look at and really fun to pretend that we're the database. What is the point of an index? Well, the point of an index is to find the data you're looking for very, very quickly, right? So in this case, we just traversed the tree, found the thing, hopped over to the table. If we did not have an index, what we would have to do is scan the entire table, which is called a table scan. It's all right there in the name. So when you do not have this secondary data structure, you have to go back to the heap table and look for all rows that contain Jennifer, just brute force through all the rows. Now, when you have a table that only has 10 rows in it, it might be faster just to scan the whole table because it could either scan the index and then, or rather traverse the index and then hop to the table. Or with that few rows, it might just say, Hey, I'm just gonna go read the table. That's super quick. So I'm unmoved at 10 rows. But when there are a billion rows or 10 billion rows, you don't want to be scanning 10 billion rows. You want to be traversing some number of nodes in a tree and then hopping over and grabbing the exact row that you're looking for. So B-trees, extremely useful and extremely efficient, and this is the underlying structure that makes it that way.