Mastering Postgres
Introduction
Introduction to the course
Overview of course structure
Postgres vs. everyone
The psql CLI
Data Types
Introduction to schema
Integers
Numeric
Floating point
Storing money
NaNs and infinity
Casting types
Characters types
Check constraints
Domain types
Charsets and collations
Binary data
UUIDs
Boolean
Enums
Timestamps
Timezones
Dates and times
Advanced Data Types
Intervals
Serial type
Sequences
Identity
Network and mac addresses
JSON
Arrays
Generated columns
Text search types
Bit string
Ranges
Composite types
Nulls
Unique constraints
Exclusion constraints
Foreign key constraints
Indexing
Introduction to indexes
Heaps and CTIDs
B-Tree overview
Primary keys vs. secondary indexes
Primary key types
Where to add indexes
Index selectivity
Composite indexes
Composite range
Combining multiple indexes
Covering indexes
Partial indexes
Index ordering
Ordering nulls in indexes
Advanced Indexing
Functional indexes
Duplicate indexes
Hash indexes
Naming indexes
Understanding Query Plans
Introduction to explain
Explain structure
Scan nodes
Costs and rows
Explain analyze
Generating Results
Introduction to queries
Cross joins
Inner joins
Outer joins
Subqueries
Lateral joins
ROWS FROM
Filling gaps in sequences
Subquery elimination
Combining queries
Set generating functions
Indexing joins
Advanced SQL
Introduction to advanced SQL
Grouping
Grouping sets, rollups, cubes
Window functions
CTEs
CTEs with window functions
Recursive CTE
Hierarchical recursive CTE
Handling nulls
Row value syntax
Views
Materialized views
Removing duplicate rows
Upsert
Returning keyword
COALESCE + generated column
Full Text Search
Introduction to full text search
Searching with LIKE
Vectors, queries, and ranks
Websearch
Ranking
Indexing full text search
Highlighting
Advanced JSON
Intro to JSON
JSON vs JSONB
Validating JSON
Creating JSON objects + arrays
JSON extraction
JSON containment
JSON existence
JSON recordset
Updating JSON
Indexing JSON parts
GIN index
Vectors (pgvector)
Intro to pgvector
Vector embedding columns
Find related articles
Upsert vector embedding
Semantic search
Other operators
Vector indexes
Outro
Thank you
Bonus interviews
Heroku's glory days & Postgres vs the world (with Craig Kerstiens)
Creating a Postgres platform with Monica & Tudor from Xata.io
Bootstrapping an email service provider (with Jesse Hanley)
Locked video

Please purchase the course to watch this video.

Video thumbnail
Indexing
Heaps and CTIDs

Full Course

$
349
$399
USD, one-time fee
The course "Mastering Postgres" helped me practice query commands and understand why they're important. Inspiring me to explore the 'why' and 'how' behind each one. Truly a great course!
Nakolus
Nakolus

PostgreSQL database platform

Shorten dev cycles with branching and zero-downtime schema migrations.

Test Data

I've made my test data available for you to use and follow along.

Download
or
Use on Xata

Summary

PostgreSQL uses indexes to speed up data retrieval by pointing directly to specific pages and rows on disk. You'll learn how data is organized into pages and identified by `ctid` values, which act as precise row locators. While `ctid`s play a key role behind the scenes, it's best not to use them directly—understanding how indexes leverage them is what leads to efficient and reliable querying.

Video Transcript

Do you remember when I told you that an index contains a pointer back to the table, such that when you look something up by index, you can grab the pointer, walk your way over to the table, grab the rest of the row and then give that back to the user, the person that asked for the result of that query.

We've gotta figure out what is that connection between the separate data structure and the rest of the data that we likely need, i.e., the table. We're gonna investigate that thread that binds the two, but before we do, we gotta figure out how does Postgres store rows under the hood? How does it write the data to the disc? And I'm gonna keep it relatively high level, but this is pretty important.

Under the hood, Postgres has a bunch of pages. It's a bunch of equal sized blocks and in those pages there are rows and so you have a page. You might have page 0, 1, 2, 3, 4 and inside those pages, that are equal-sized chunks of data, you've got positions. You have these positions where the rows are. You might have page zero, row 10 and that is a discreet, unique identifier, such that if you were to see that identifier in an index, you could walk over to the table, walk to page 0, go down to row 10 and you would have it immediately and so it's a very quick way to look something up. When you say, alright, this database is thousands of pages, but I know that I need to go to page 864 and grab row 12. I can do that pretty quickly and so this mechanism, this is the underlying storage arrangement, but the mechanism by which these rows are written or the structure that these rows exist in is just a heap and it's a pretty good name because it's just like a pile. It's just like put the rows wherever there is space. It might be on page 0, it might be on page 60. Just put the rows wherever you can and that makes inserts really, really fast 'cause all it's doing is looking for some blank space where it can write your name, right?

I do want to show you that these ctids exist and then we'll talk about 'em for just a second more. If we did select * from reservations. This is an old table I just had lying around from the exclusion constraint video, so * you think means everything, but in fact it doesn't. There are some system tables that are hidden, one of which is ctid. For this table, in the zeroth page in position 1 exists this entire row. In the zeroth page, position 2 exists this entire row. That is where it physically exists on disc. There aren't many rows in this table. Now you can look up by ctid, where ctid = (0, 2). Show me from the zeroth page, the second row. You can do that, however, don't. Don't do that. These ctids can and will change, so if you update this row and perhaps some of your variable data columns get a lot bigger, there might not be space and so it's like, 'Ah, I'm gonna take it from page 0 to page 84 'cause there's a ton of space on page 84." And so if you were relying on (0,2), you're hosed. It's game over.

When Postgres vacuums or when you vacuum the database, it's gonna rearrange all those rows, such that those ctids all change. They're not primary keys. They're not stable. They're not deterministic, I guess. They change. They're volatile. They exist. I wouldn't rely on them and I wouldn't use them.

Now this leads us to our final and maybe only relevant point. Every index contains the ctid such that it can get back to the table and immediately know exactly where it's going to get the rest of the row data. When I say every index contains a pointer, now we can say yes, that's true. Every index, however, contains the ctid because that is the actual pointer that gets you back to the table to find the rest of your data.