We're gonna look at three different types of scan nodes in this video: an index scan, a bitmap index scan, and a sequential table scan.
Here is our old friend/enemy, the sequential scan on the table. What happens when you see this? Well, a sequential scan is telling you that the query planner, the part that decides how the query is going to be executed, well, let's just say Postgres. Postgres has decided: I'm gonna have to read a large portion of this table. In this case it's gonna have to read the entire table. It has decided that it has to read a large portion of the table, and that it makes the most sense to read it in physical disk order.
Now remember, way, way back, when we talked about how the database is arranged under the hood? A table is a whole bunch of pages, and the rows are just kind of written wherever there is space. The database has to decide: should I read all of those pages in physical disk order? That would be very fast, because there's not a lot of random IO. You're not popping around to different pages; you're starting at the first page and reading all the way through to the last page, so in terms of IO operations it's very fast. That's one option. Or should it go to an index, find some row locations, and then grab those rows directly out of those pages? Those are the kinds of things it is weighing.
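Here's a minimal sketch of what that looks like (the users table matches the one in the video; the cost, row, and width figures are placeholders rather than real output):

explain select * from users;

-- Seq Scan on users  (cost=... rows=... width=...)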
Here we see that it has chosen a sequential scan on users as the best method. Let's keep going and throw in a where on email, because I do believe, if we do select * from pg_indexes where tablename = 'users', we can confirm what we have on here. We do have an email_btree index, using btree (email). That's the one we're gonna use. If we add where email, and this is a little weird, but it's gonna prove the point, where email is less than 'B', there we go: we see a bitmap index scan. I'm just gonna write down "sequential scan" there. I can't spell sequential. So now we've seen a bitmap index scan, and we'll see something else later. The sequential scan was basically "read the entire table in physical order." That's the worst one. I don't know why it's at the top of my list, but it's the worst one.
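As a rough sketch, the query and the shape of the plan look something like this (table and index names follow the video; costs are omitted, and the exact casts in the conditions depend on the column's type):

explain select * from users where email < 'B';

-- Bitmap Heap Scan on users
--   Recheck Cond: (email < 'B'::text)
--   ->  Bitmap Index Scan on email_btree
--         Index Cond: (email < 'B'::text)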
What we're seeing here is a bitmap index scan on email_btree, and then you see its little attachment, its little attribute: here's what the index condition is. It's not a separate node, it's just indented; it's showing you, when I'm over in the index, this is what I'm checking. That's going to emit something up to this bitmap heap scan on users. And an interesting question is: what the heck is a bitmap heap scan? This is extremely interesting to me, and I hope you find it interesting too. This is the scenario where Postgres decides: honestly, I don't need to read the whole table. I think it would be better if I went over to the index and got some information out. But the information that I need to get out of there is actually still quite large; I still need to go check a whole bunch of pages over in the table and read out a whole bunch of rows. This is better than a sequential scan on the table, but it's not the best.
Here's what's actually happening. It heads over to the index and finds all of the emails that are less than 'B', in this case. Then it constructs a map. It constructs a map that says: according to this index, here are all the pages and all the tuples, all the row addresses, that match the index condition. It creates this map, and then the bitmap index scan emits it up to the bitmap heap scan. Those two always go together: a bitmap index scan and a bitmap heap scan work together. Once it's done in the index, it has constructed this map that says, here are all the ones that matched. Good luck, have fun.
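If the "pages and tuples" part feels abstract, you can peek at the row addresses the bitmap is built from. Every row has a hidden ctid column, which is a (page, tuple) pair; this little query is just an illustration, not something from the video:

select ctid, email from users where email < 'B' limit 5;

-- ctid looks like (page_number, tuple_index), e.g. (0,3), (0,7), (14,2), ...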
What the bitmap heap scan does is it says: all right, I've got a lot of work to do here. I'm gonna put all of these locations, all of these addresses, in physical order, because remember, physical order matters; we don't wanna be jumping around the disk if we can avoid it. The bitmap heap scan says, thank you for this map, let me put it in physical order, and then I'm gonna read through the table in physical order, visiting all of the locations that are present on that map. The two work together: you get the benefit of a pretty quick traversal of that btree, and then you read the table in physical order to prevent just hopping around at random. The bitmap index scan and the bitmap heap scan always work together.
One more thing I wanna show you here. The bitmap heap scan has this attribute: recheck condition, email. Are we potentially checking this twice? In fact, we are. You can see this recheck condition in a few different cases. When the bitmap index scan emits its bitmap, that bitmap might be lossy; it might not be perfect. That can happen with different types of indexes; some types of indexes don't support strict equality, and that's fine. What I think is happening here is that when there are too many row addresses, the bitmap eventually just says, there are some matching rows in this page. Then, during the heap scan, it goes over to that page and rechecks the condition to make sure it only pulls the rows where the email is less than 'B'. That recheck happens over on the heap, after the row has been pulled out. It says: I know you checked this in the index, but the map you gave me wasn't super clear, so I'm gonna check these guys again. When you see a recheck condition, that's what's happening.
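If you want to see whether the bitmap actually went lossy, one way, as a sketch rather than something from the video, is to run the query under explain analyze; the bitmap heap scan then reports how many heap blocks were exact versus lossy, and shrinking work_mem is one way to force the lossy case:

set work_mem = '64kB';  -- deliberately tiny so the bitmap can't hold every individual tuple pointer

explain analyze select * from users where email < 'B';

-- Bitmap Heap Scan on users
--   Recheck Cond: (email < 'B'::text)
--   Rows Removed by Index Recheck: ...
--   Heap Blocks: exact=... lossy=...
--   ->  Bitmap Index Scan on email_btree
--         Index Cond: (email < 'B'::text)

reset work_mem;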
Potentially the node below it produced too many results and the map became lossy, or potentially the node below it could never produce a perfect map and it was going to be lossy all the time, based on the type of index that it is. Now, let's write down some of these rules before we move on. But before we do that: how awesome is this? How awesome is it to fully understand how this thing works under the hood? I hope it makes you feel as powerful as it makes me feel. I love knowing this stuff, knowing that I can look at it and say, "I know what that means. I know how I can change my query. I know how I can improve my application's performance and go home on time." That makes me feel awesome. Hopefully you feel the same way. Let's keep going.
So, a bitmap index scan: we decided that it scans the index, produces a map, then reads pages in physical order to prevent that random IO. The reason it produced that map is because it said, "oof, that's a lot of rows. That's a lot of rows, man," so we're gonna get penalized on random IO. Let's produce a map, sort it, and then fetch the pages in order. So a bitmap index scan is better than a sequential scan, but worse than what? Here is what it's worse than.
The final one is aaron.francis@example.com, and if we look at that plan, the only node here is Index Scan. Remember, this is an attribute of that node; it says, here's the condition that we're using during that scan: email equals this value, cast to text. The index scan down here is your best, number-one option, although there's one more thing that's not quite a scan that is even better, which we've talked about; I'll tell you about that in a second. But what's happening here? Remember, the bitmap index scan went to the index, produced a map, and then went to the heap. This index scan node contains both of those steps; they're not separate in the plan like they are for a bitmap index scan. This is saying: there are so few rows, in this case it thinks there is one row, which is right, there are so few rows here that I'm just gonna go read them in random order. I don't care about the IO penalty. There are so few rows that I'm just gonna go directly fetch them by page and tuple and bring them back. There's no map creation, no map sorting, no reading in physical order. It's just saying: we're good, I'm gonna go get them. Scan the index, get the rows. That's it.
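Sketching that out (same table and index as before, plan text abbreviated):

explain select * from users where email = 'aaron.francis@example.com';

-- Index Scan using email_btree on users
--   Index Cond: (email = 'aaron.francis@example.com'::text)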
That's all there is to it. Now, the one thing that could be better than this is that thing we talked about quite a while back. If we do select email from users instead, what is that using? That's using email_hash. I wanna drop that index: drop index email_hash. If we drop that index, then we see an index only scan, and that is different. Let me change it back to select *, which gives us an index scan, which includes a visit to the heap. But if we select only columns that are in the index, as we talked about, this is now a covering index and the query can be served by an index only scan. It never has to visit the heap at all. Kind of wild, right? Kind of awesome.
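A rough sketch of that comparison (assuming the same where clause as before and the email_btree index; plan text abbreviated):

-- Selecting only the indexed column: the btree alone can answer the query.
explain select email from users where email = 'aaron.francis@example.com';
-- Index Only Scan using email_btree on users
--   Index Cond: (email = 'aaron.francis@example.com'::text)

-- Selecting every column: Postgres has to visit the heap for the rest of the row.
explain select * from users where email = 'aaron.francis@example.com';
-- Index Scan using email_btree on users
--   Index Cond: (email = 'aaron.francis@example.com'::text)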
I think the main thing that scan selection hinges on is how many rows the planner estimates it's going to have to emit to its parent. We saw in the index scan that it only thought it was gonna have to emit one row, so the random IO wasn't a penalty. In the bitmap index scan it thought it was going to have to find a whole bunch, so it emitted a map that the bitmap heap scan then used. Moving beyond that, if it thinks it's gonna have to read a large portion of the table, it doesn't even make sense to waste time over in the index whatsoever; it goes straight for a sequential scan. That may be the absolute fastest thing possible, especially on some relatively small tables. It doesn't make sense to visit two places; it's just gonna read the entire table, because that might just be faster.
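One way to see that hinge point for yourself, as a sketch rather than something from the video, is to explain the same query with predicates of different selectivity and watch the top node change. Exactly where the planner flips between plans depends on your data and its statistics:

explain select * from users where email = 'aaron.francis@example.com';  -- a handful of rows: likely an Index Scan
explain select * from users where email < 'B';                          -- a decent chunk of the table: likely a Bitmap Index Scan + Bitmap Heap Scan
explain select * from users where email < 'zzzz';                       -- most of the table: likely a plain Seq Scan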
Those are the different types of scan nodes that you will likely see. Hopefully that can help you debug or at least understand some of your queries.