The most comprehensive course on PostgreSQL

A B-tree is a commonly used index type that allows databases to search data quickly and efficiently. You'll learn how adding a B-tree index to a table—such as one containing user names—can significantly speed up lookups for values like "Jennifer" or "Taylor." This approach avoids full table scans, making it especially valuable when working with large datasets.

Staying in the slightly theoretical world, again, we're not gonna go too deep, but we're gonna look at what a B-tree actually is. This is your most common index type. It provides fast performance for your most common use cases, like strict equality, ranges, that sort of thing. We're gonna look at a very simple table full of users, and then we're gonna put an index on the name column, and I'm gonna show you what that structure actually looks like under the hood, how we can visualize it. Then we're gonna pretend that we're the database and we're gonna do a lookup on this index and we're gonna walk through how this thing actually works. I think this will prime your intuition going forward, such that you kind of have an instinctual grasp of where an index might work or in what situation an index might not work.

Let's take a look at this table. Here is our very simple user's table. We have 10 rows, three columns, ID name and email. We are going to put an index on name and then we're going to walk through, we're going to traverse that structure, looking for the name Jennifer. If we put an index on name, we get a structure that looks like this. Maybe calling this a tree is generous. It kind of looks like a tree, but you can see at the top we have two names, Isaac and Steve. That is the root node.

Now if we jump all the way to the bottom, you'll see across the bottom are all 10 names. And importantly, those 10 names are in order. It starts on the left at Aaron, and then it goes all the way across to the right with Virginia, the last name in the tree. The bottom of this tree, these nodes down here at the very bottom, these are called the leaf nodes. Sticking with the tree metaphor, the thing at the very, very end, those are the leaves. That's what's at the very, very bottom.

The leaf nodes are interesting because they contain the data. The names that we have indexed live down in those leaf nodes. And remember, the other thing that lives down there is the CTID, which is the pointer back to the table. Okay, so where the database, we're gonna play the role of the database. We have been issued a query that looks like this, let's start from users where name equals Jennifer. And it's a pretty easy query.

Let's look at how the database actually uses the index to satisfy this query. Every tree traversal is going to start at the root node. We have Jennifer, we look at the root node and we have Isaac on the left, and Steve on the right. Alphabetically, Jennifer falls in the middle. We're gonna take the middle path that leads us down to this interior node here with just the name Simon. We compare Jennifer to Simon and Jennifer alphabetically is less than Simon. We follow the left path of the tree down into the leaf node. Then once we're in the leaf node, we just search for the name Jennifer and there is Jennifer.

Let's do this again, looking for the name Taylor. We're gonna start at the root node again. This time Taylor is greater than the right side of the node, which contains Steve. Because Taylor is greater than Steve, we're gonna go down the right side of the tree and then we hit a node that actually contains Taylor. Because it is equal to Taylor, we follow the right side of the tree again. Down in the leaf nodes we find what we're looking for, which is Taylor.

What is the point of an index besides being really cool to look at and really fun to pretend that we're the database. What is the point of an index? Well, the point of an index is to find the data you're looking for very, very quickly, right? In this case, we just traversed the tree, found the thing, hopped over to the table. If we did not have an index, what we would have to do is scan the entire table, which is called a table scan. It's all right there in the name.

When you do not have this secondary data structure, you have to go back to the heap table and look for all rows that contain Jennifer, just brute force through all the rows. When you have a table that only has 10 rows in it, it might be faster just to scan the whole table because it could either scan the index and then, or rather traverse the index and then hop to the table. Or with that few rows, it might just say, Hey, I'm just gonna go read the table. That's super quick. So I'm unmoved at 10 rows. When there are a billion rows or 10 billion rows, you don't want to be scanning 10 billion rows. You want to be traversing some number of nodes in a tree and then hopping over and grabbing the exact row that you're looking for.

B-trees, extremely useful and extremely efficient, and this is the underlying structure that makes it that way.

Full Course

PostgreSQL database platform

Summary

Video Transcript