Mastering Postgres
Introduction
Introduction to the course
Overview of course structure
Postgres vs. everyone
The psql CLI
Data Types
Introduction to schema
Integers
Numeric
Floating point
Storing money
NaNs and infinity
Casting types
Characters types
Check constraints
Domain types
Charsets and collations
Binary data
UUIDs
Boolean
Enums
Timestamps
Timezones
Dates and times
Advanced Data Types
Intervals
Serial type
Sequences
Identity
Network and mac addresses
JSON
Arrays
Generated columns
Text search types
Bit string
Ranges
Composite types
Nulls
Unique constraints
Exclusion constraints
Foreign key constraints
Indexing
Introduction to indexes
Heaps and CTIDs
B-Tree overview
Primary keys vs. secondary indexes
Primary key types
Where to add indexes
Index selectivity
Composite indexes
Composite range
Combining multiple indexes
Covering indexes
Partial indexes
Index ordering
Ordering nulls in indexes
Advanced Indexing
Functional indexes
Duplicate indexes
Hash indexes
Naming indexes
Understanding Query Plans
Introduction to explain
Explain structure
Scan nodes
Costs and rows
Explain analyze
Generating Results
Introduction to queries
Cross joins
Inner joins
Outer joins
Subqueries
Lateral joins
ROWS FROM
Filling gaps in sequences
Subquery elimination
Combining queries
Set generating functions
Indexing joins
Advanced SQL
Introduction to advanced SQL
Grouping
Grouping sets, rollups, cubes
Window functions
CTEs
CTEs with window functions
Recursive CTE
Hierarchical recursive CTE
Handling nulls
Row value syntax
Views
Materialized views
Removing duplicate rows
Upsert
Returning keyword
COALESCE + generated column
Full Text Search
Introduction to full text search
Searching with LIKE
Vectors, queries, and ranks
Websearch
Ranking
Indexing full text search
Highlighting
Advanced JSON
Intro to JSON
JSON vs JSONB
Validating JSON
Creating JSON objects + arrays
JSON extraction
JSON containment
JSON existence
JSON recordset
Updating JSON
Indexing JSON parts
GIN index
Vectors (pgvector)
Intro to pgvector
Vector embedding columns
Find related articles
Upsert vector embedding
Semantic search
Other operators
Vector indexes
Outro
Thank you
Bonus interviews
Heroku's glory days & Postgres vs the world (with Craig Kerstiens)
Creating a Postgres platform with Monica & Tudor from Xata.io
Bootstrapping an email service provider (with Jesse Hanley)
Locked video

Please purchase the course to watch this video.

Video thumbnail
Indexing
B-Tree overview

Full Course

$
349
$399
USD, one-time fee
Going through the Mastering Postgres course by Aaron Francis and holy cow is it well designed. These types of well-polished knowledge products bring me an immense amount of joy.
Adam Taylor

PostgreSQL database platform

Shorten dev cycles with branching and zero-downtime schema migrations.

Test Data

I've made my test data available for you to use and follow along.

Download
or
Use on Xata

Summary

A B-tree is a commonly used index type that allows databases to search data quickly and efficiently. You'll learn how adding a B-tree index to a table—such as one containing user names—can significantly speed up lookups for values like "Jennifer" or "Taylor." This approach avoids full table scans, making it especially valuable when working with large datasets.

Video Transcript

Staying in the slightly theoretical world, again, we're not gonna go too deep, but we're gonna look at what a B-tree actually is. This is your most common index type. It provides fast performance for your most common use cases, like strict equality, ranges, that sort of thing. We're gonna look at a very simple table full of users, and then we're gonna put an index on the name column, and I'm gonna show you what that structure actually looks like under the hood, how we can visualize it. Then we're gonna pretend that we're the database and we're gonna do a lookup on this index and we're gonna walk through how this thing actually works. I think this will prime your intuition going forward, such that you kind of have an instinctual grasp of where an index might work or in what situation an index might not work.

Let's take a look at this table. Here is our very simple user's table. We have 10 rows, three columns, ID name and email. We are going to put an index on name and then we're going to walk through, we're going to traverse that structure, looking for the name Jennifer. If we put an index on name, we get a structure that looks like this. Maybe calling this a tree is generous. It kind of looks like a tree, but you can see at the top we have two names, Isaac and Steve. That is the root node.

Now if we jump all the way to the bottom, you'll see across the bottom are all 10 names. And importantly, those 10 names are in order. It starts on the left at Aaron, and then it goes all the way across to the right with Virginia, the last name in the tree. The bottom of this tree, these nodes down here at the very bottom, these are called the leaf nodes. Sticking with the tree metaphor, the thing at the very, very end, those are the leaves. That's what's at the very, very bottom.

The leaf nodes are interesting because they contain the data. The names that we have indexed live down in those leaf nodes. And remember, the other thing that lives down there is the CTID, which is the pointer back to the table. Okay, so where the database, we're gonna play the role of the database. We have been issued a query that looks like this, let's start from users where name equals Jennifer. And it's a pretty easy query.

Let's look at how the database actually uses the index to satisfy this query. Every tree traversal is going to start at the root node. We have Jennifer, we look at the root node and we have Isaac on the left, and Steve on the right. Alphabetically, Jennifer falls in the middle. We're gonna take the middle path that leads us down to this interior node here with just the name Simon. We compare Jennifer to Simon and Jennifer alphabetically is less than Simon. We follow the left path of the tree down into the leaf node. Then once we're in the leaf node, we just search for the name Jennifer and there is Jennifer.

Let's do this again, looking for the name Taylor. We're gonna start at the root node again. This time Taylor is greater than the right side of the node, which contains Steve. Because Taylor is greater than Steve, we're gonna go down the right side of the tree and then we hit a node that actually contains Taylor. Because it is equal to Taylor, we follow the right side of the tree again. Down in the leaf nodes we find what we're looking for, which is Taylor.

What is the point of an index besides being really cool to look at and really fun to pretend that we're the database. What is the point of an index? Well, the point of an index is to find the data you're looking for very, very quickly, right? In this case, we just traversed the tree, found the thing, hopped over to the table. If we did not have an index, what we would have to do is scan the entire table, which is called a table scan. It's all right there in the name.

When you do not have this secondary data structure, you have to go back to the heap table and look for all rows that contain Jennifer, just brute force through all the rows. When you have a table that only has 10 rows in it, it might be faster just to scan the whole table because it could either scan the index and then, or rather traverse the index and then hop to the table. Or with that few rows, it might just say, Hey, I'm just gonna go read the table. That's super quick. So I'm unmoved at 10 rows. When there are a billion rows or 10 billion rows, you don't want to be scanning 10 billion rows. You want to be traversing some number of nodes in a tree and then hopping over and grabbing the exact row that you're looking for.

B-trees, extremely useful and extremely efficient, and this is the underlying structure that makes it that way.