Do you remember when I told you that an index contains a pointer back to the table? When you look something up by index, you can grab the pointer, walk your way over to the table, grab the rest of the row, and then give that back to the user, the person who asked for the result of that query.
We've gotta figure out what that connection is between the separate data structure and the rest of the data that we likely need, i.e., the table. We're gonna investigate that thread that binds the two, but before we do, we gotta figure out how Postgres stores rows under the hood. How does it write the data to disk? I'm gonna keep it relatively high level, but this is pretty important.
Under the hood, Postgres has a bunch of pages, which are just equal-sized blocks, and in those pages there are rows. You might have page 0, 1, 2, 3, 4, and inside those pages you've got positions where the rows are. You might have page 0, row 10, and that is a discrete, unique identifier, such that if you were to see that identifier in an index, you could walk over to the table, walk to page 0, go down to row 10, and you would have it immediately. It's a very quick way to look something up. This database might be thousands of pages, but if I know that I need to go to page 864 and grab row 12, I can do that pretty quickly.
That is the underlying storage arrangement. The structure that these rows exist in is just a heap, and it's a pretty good name because it's just like a pile: put the rows wherever there is space. It might be on page 0, it might be on page 60. Just put the rows wherever you can, and that makes inserts really, really fast, 'cause all it's doing is looking for some blank space where it can write your row.
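If you want to see the pages for yourself, here is a small sketch. It assumes a table named reservations already exists (the one used later in this video); block_size and pg_relation_size are built into Postgres, and the pages alias is only there for illustration.

```sql
-- Size of one page (block) in bytes. This is 8192 unless Postgres
-- was compiled with a different block size.
SHOW block_size;

-- Number of pages the table's heap currently occupies on disk.
-- Assumes a table named reservations exists.
SELECT pg_relation_size('reservations')
       / current_setting('block_size')::bigint AS pages;
```

A small table will usually report a single page, which is why every row in it ends up on page 0.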
I do want to show you that these ctids exist, and then we'll talk about 'em for just a second more. If we did select * from reservations, this is an old table I just had lying around from the exclusion constraint video. You'd think * means everything, but in fact it doesn't. There are some system columns that are hidden, one of which is ctid. For this table, in the zeroth page, in position 1, exists this entire row. In the zeroth page, position 2, exists this entire row. That is where it physically exists on disk. There aren't many rows in this table. Now, you can look up by ctid, where ctid = '(0,2)': show me from the zeroth page, the second row. You can do that; however, don't. Don't do that. These ctids can and will change. Any update writes a new version of the row at a new position, and if some of your variable-length columns get a lot bigger, there might not be space on that page, so it's like, "Ah, I'm gonna take it from page 0 to page 84, 'cause there's a ton of space on page 84." And so if you were relying on (0,2), you're hosed. It's game over.
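The two queries described above look roughly like this. It's a sketch that assumes the reservations table from the video exists; the (0,2) location is just the example from above and may not hold a row in your copy.

```sql
-- ctid is a hidden system column, so * does not include it.
-- It has to be named explicitly.
SELECT ctid, * FROM reservations;

-- Look up one row by its physical location: page 0, position 2.
-- The tid value is written as a quoted literal.
-- Fine for poking around, not something to build on.
SELECT * FROM reservations WHERE ctid = '(0,2)';
```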
And when you run a VACUUM FULL, or anything else that rewrites the table, it's gonna rearrange all those rows, such that those ctids can all change. A regular vacuum doesn't move live rows around, but it does free up old positions so they can be reused by other rows. They're not primary keys. They're not stable. They change. They're volatile. They exist, but I wouldn't rely on them and I wouldn't use them.
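Here's a way to watch a ctid move. This is a sketch on a throwaway temp table; the ctid_demo name and its columns are made up for illustration, and the exact ctid values you see may differ.

```sql
-- Hypothetical scratch table, dropped automatically at the end of the session.
CREATE TEMP TABLE ctid_demo (id int PRIMARY KEY, note text);
INSERT INTO ctid_demo VALUES (1, 'first'), (2, 'second');

-- Where the rows live right after the insert.
SELECT ctid, id, note FROM ctid_demo;

-- An update writes a new row version, so the ctid for id = 2 changes.
UPDATE ctid_demo SET note = 'changed' WHERE id = 2;
SELECT ctid, id, note FROM ctid_demo;

-- VACUUM FULL rewrites the whole table, so ctids can change again.
VACUUM FULL ctid_demo;
SELECT ctid, id, note FROM ctid_demo;
```

The id column stays the same through all of this, which is exactly why you look rows up by a real key and not by ctid.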
Now this leads us to our final, and maybe only relevant, point. Every index contains the ctid, such that it can get back to the table and immediately know exactly where it's going to get the rest of the row data. When I say every index contains a pointer, now we can say yes, that's true, and more specifically, that pointer is the ctid, because that is the actual pointer that gets you back to the table to find the rest of your data.
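If you'd like to see the ctid sitting inside an index, the pageinspect extension that ships with Postgres can show it. This is a sketch under a few assumptions: pageinspect is available on your server, you're connected as a superuser, and the ctid_index_demo table is a made-up scratch table.

```sql
-- Assumption: pageinspect is installed and you have superuser rights.
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Hypothetical scratch table; the primary key creates a B-tree index
-- named ctid_index_demo_pkey.
CREATE TABLE ctid_index_demo (id int PRIMARY KEY, note text);
INSERT INTO ctid_index_demo VALUES (1, 'first'), (2, 'second');

-- What the table says about where each row lives.
SELECT ctid, id FROM ctid_index_demo;

-- What the index stores. Block 0 of a B-tree holds metadata,
-- so for an index this small the entries sit in block 1.
SELECT itemoffset, ctid, data
FROM bt_page_items('ctid_index_demo_pkey', 1);

-- Clean up the scratch table.
DROP TABLE ctid_index_demo;
```

The ctid column coming out of the index should line up with the ctid column from the table, and that's the thread that binds the two.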