The most comprehensive course on PostgreSQL

Learn how indexes improve database performance by creating separate data structures that help find information quickly, such as indexing last names for faster lookups. While indexes speed up queries, they require regular maintenance since they need to be updated whenever the underlying table changes.

You made it all the way through data types, or you skipped ahead because indexes are awesome, in which case, I forgive you because I also think indexes are awesome. Indexes are, I think, maybe my number one favorite topic, I think that they're my number one favorite topic when it comes to databases. Indexes are the best way to unlock a performant database. We're gonna learn all about them. We're gonna learn a lot of practical stuff that you can go use today, we're also gonna learn a little bit of underlying theory of indexes. This is of course, for your edification, so that you become smarter, but this is also to train your intuition, because as you move forward, I don't want you to just be memorizing things. I want you to be learning things so that when new situations arise, things that we haven't talked about in this course, you have a framework of knowledge where you can be like, "According to what I know about indexes, I think this way may be the right way," and that will lead you down hopefully a shorter path to figuring it out on your own.

If that scares you, do not worry, I do not have a computer science degree, I'm not gonna bore you with a bunch of stuff that is potentially not relevant, but I do think it is helpful to have a little bit of an understanding of how databases, how their indexes work under the hood. We're gonna do all that in this module, we're gonna cover a lot of practical stuff.

Let's start with just some high level thoughts about indexes. First, and most importantly, indexes are a separate data structure, fundamentally it's separate discrete data structure from your table. While you do define an index on a table, it then creates a second data structure that is optimized for particular operations. The most common form of that data structure is a B-tree. That is what most of your indexes are going to be. In the data or in the data types video, we looked at a few GiST indexes, that is a different type of index structure.

We've got B-tree, we've got GiST, Hash, GIN, there are SP-GiST, there are several that we'll look at. The most common one is a B-tree. Now, the second thing you need to know is that it maintains a copy of part of your data. So it is fundamentally discrete from your table and it maintains a copy of part of your table. If I put an index on last name, it takes all the last names, copies them from the table, and puts them into the index in a way that it makes it easy for us to traverse the index to quickly look up somebody by last name. That may lead you to understand why people say don't create indexes for everything 'cause that's gonna slow your database down. See, this is what I'm talking about, when we understand a little bit lower level, some of these things that we've just held to be true, make a little bit more sense. If it is a fundamentally discrete data structure that maintains copy of part of our data, then every time we update that data in the table, that index needs to be maintained. If we change somebody's last name, it has to change in the index, and maybe the order of the index then has to be rearranged to make that new last name fit in in the appropriate place. That happens very quickly, but it is longer than zero seconds, it takes time.

If you had 50 indexes on different parts of last name plus other columns, you would have to touch 50 indexes and update them all with the new data. It maintains a copy of part of your data, which is why indexes require maintenance, because they require updating after the table has been updated.

The third thing that you need to know is that each index contains a pointer back to the table and where to get the full row. Now that we know that it is a fundamentally separate data structure, well, we're gonna look stuff up by index, but we are gonna need the rest of the row. Because if I look somebody up by last name, I still kind of need to know what their first name is and what their email is, and what their user ID is and all of that stuff. In most databases, how it works, this is not how it works in Postgres.

In most databases, how it works is every index, every index contains a pointer to the primary key, which is how the table is arranged. That is not how it works in Postgres. In Postgres, every index contains a pointer to the table and the physical location where that row is. What you need to remember is that every index contains a pointer back to the table, such that after you have traversed the index, you can go over to the table and grab that row super quickly without having to scan through the entire table.

That's a fundamental overview of how indexes work. What you need to remember is that it is a separate discrete data structure that maintains a copy of part of your data, which means it must be updated. Therefore there are maintenance costs to an index and it contains a pointer back to the table such that after it traverses the index, it can go back home and get the rest of the row. We're both going to go more into theory and deeper into the practical workings of indexes in the next few videos.

Full Course

PostgreSQL database platform

Summary

Video Transcript