For each node on the query plan, there is a set of parentheses and then a bunch of information that's really hard to decipher. That's the information we're gonna look at now, and there's a quick way that you can jog your memory as to what that information is.
Explain select *. We see cost equals 0.00..23,000, rows 989, width 76. So not the most helpful information. Let's try format JSON and see if we get any more actual help here. We see that the cost is in fact broken down into two pieces: the startup cost is zero and the total cost is that 23,000. In the non-JSON format, let's run that again and now we can compare. This first value must be the startup cost, which I'll explain in a second. Then the dot dot is just truly bizarre syntax, but everybody does that. That leads us to the total cost of 23,000 units, and which units those are we'll talk about in a second as well. Then the plan rows and the plan width. The width is how wide in bytes each row is going to be when it's emitted up, or at least how wide it estimates each row will be at this point.
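As a rough sketch, assuming a users table like the one we've been querying, the two versions look something like this, with the numbers from above filled in as placeholders:

explain select * from users;
-- Seq Scan on users  (cost=0.00..23000.00 rows=989 width=76)

explain (format json) select * from users;
-- "Node Type": "Seq Scan",
-- "Relation Name": "users",
-- "Startup Cost": 0.00,
-- "Total Cost": 23000.00,
-- "Plan Rows": 989,
-- "Plan Width": 76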
If we were to change this from select * to select id, we'll see that width go down to eight, and you know exactly why: id is a bigint, which is an eight-byte field. It's all coming together. Okay, we have two costs here: the startup cost and the total cost. The total cost of a node includes the costs of its children, and the total cost at the very top node is the total cost of the whole query, which is what the planner is trying to bring down. It might try a bunch of different routes as it's thinking, how can I get the data that they asked for? And while it's doing that, it's calculating a cost. Then whichever plan it comes up with that has the lowest cost, that's the one it goes with.
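Sketching that out, again with the users table assumed and the numbers just illustrative:

explain select id from users;
-- Seq Scan on users  (cost=0.00..23000.00 rows=989 width=8)
-- width drops from 76 to 8: id is a bigint, which is 8 bytes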
These cost units are arbitrary, but they're consistent. I can't say they're milliseconds or anything like that; they're just units. The cost is made up of a couple of different factors or pieces: the cost to read a sequential page, the cost to read a random page, some CPU costs. It's made up of several different things, and those are all tunable. That's a little bit outside the scope of what we're doing here. What we want to see is those costs go down as we improve our query.
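If you're curious, those factors are ordinary settings you can inspect; the values in the comments are the stock defaults, and tuning them is outside our scope here:

show seq_page_cost;        -- 1.0: cost to read one page sequentially
show random_page_cost;     -- 4.0: cost to read one page at random
show cpu_tuple_cost;       -- 0.01: cost to process one row
show cpu_index_tuple_cost; -- 0.005: cost to process one index entry
show cpu_operator_cost;    -- 0.0025: cost to evaluate one operator or function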
We have the startup cost and the total cost; the total cost is what we just talked about, so now let's look at the startup cost. You'll see the startup cost here is zero. In the JSON it's called the startup cost; here it's just the first value after cost equals, and the total cost is in fact that 23,000. What's going on is that the startup cost, this first value, is how much work has to happen before this node can get started. In this case, this is the first and only node, and so it can start right away.
Let's clear this guy out and look at something that we looked at before, where email is less than b. If we look at that, you'll see this bitmap index scan on email_btree has a startup cost of zero, meaning this is the first one to go. Like I said, you have to start on the inside and read out to the outside. This is the child node of the node up here. It gets started first, and it thinks its total cost will be 3,300 units. Then just look at that: the parent node has a startup cost of 3,300 units. Imagine that. Not a coincidence, because this guy is waiting around for the child node to finish and emit the bitmap to it. Then the total cost up here, which is inclusive of its children, which in this case is just this one node, is 18,000 units. These costs can be very directionally helpful. I wouldn't necessarily say 18,000 units is bad and 2,000 units is good. I would just say, "Okay, if we're optimizing a query, we're starting at 18,000 units, let's see what we can do."
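Here's the shape of that plan, with the users table assumed and the figures matching the numbers we just walked through rather than anything you should expect to see exactly:

explain select * from users where email < 'b';
-- Bitmap Heap Scan on users  (cost=3300.00..18000.00 rows=108889 width=76)
--   Recheck Cond: (email < 'b')
--   ->  Bitmap Index Scan on email_btree  (cost=0.00..3300.00 rows=108889 width=0)
--         Index Cond: (email < 'b')
-- the child's total cost (3,300) shows up as the parent's startup cost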
Let's look at each individual part of this query and see if we can get its cost down, thereby lowering the total overall cost of the query. The next thing we're gonna look at is this rows parameter. This one's a little bit tricky, or at least not absolutely clear what it is. Let me show you this one. We can stick with this plan, and if we look down here, we're gonna see rows equals 108,889. That is not the number of rows that need to be visited or inspected, or the amount of work it thinks it's going to have to do. That is the number of rows it thinks it will emit to the parent when it is done. Maybe that's not confusing to you, but I kind of thought that was the number of rows it has to investigate. No, it might investigate an order of magnitude more rows simply to deliver those up to the parent node.
If we look up at the parent node, it too thinks it's going to have to deliver 108,889 rows, which means this recheck condition did not eliminate any. This bitmap index scan was in fact perfect, because it didn't munge up any of the data and we didn't have to recheck and then filter rows out. The rows value tells you how many rows are being emitted up.
One thing that you can look for here is if you expect to end up with just a few rows at the very end, but you're constantly emitting up millions and millions of rows, maybe there's a better way to limit the number of rows being worked on in the inner nodes before they finally get out to the outer node. That's just a possibility, because if you're carrying around rows that you don't need, you're doing extra work that you don't need. That's how that can directionally help you optimize a query.
Finally, this last thing we touched on just briefly is the width. If we were to change this to id, it does go back down to eight, and that shows you how big these rows that are being returned are. That can be very helpful not only for processing here, but for sending it back over the wire. If you've got these giant JSON blobs or text fields that you don't actually need, and you see that the width is huge and you're sending back tons and tons of data, that can be a good indication that you might want to look at slimming down what you select. I lied. I told you that was the last thing.
There's one more last thing that I want to show you. We're gonna switch this back to select * and throw a limit on here, and we're gonna see something interesting, I think. I told you that this top level node is inclusive, that it is a sum of all the costs underneath it. But we see a cost down here of 26,000 and a cost up here of 2.39. That is because this can end early thanks to the limit. In this case, we do not have an order at all, so this sequential scan can simply stop after 10 rows. It can just say, "Oh, you don't care what order they're in. I'm gonna read the first 10 and give them back to you. I'm not gonna scan the whole table for no reason." When you do add a limit, that can throw off some of those rules I told you about, like the top level node being inclusive of all of its children nodes.
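As a sketch, again with the users table assumed and illustrative numbers:

explain select * from users limit 10;
-- Limit  (cost=0.00..2.39 rows=10 width=76)
--   ->  Seq Scan on users  (cost=0.00..26000.00 rows=108889 width=76)
-- the Limit node's total cost is tiny because the scan can stop after 10 rows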
But that is why that is happening. If you needed this in, let's say, order by last name, you would see that cost explode again, because it does have to scan the entire table, then put it in order, and then limit it down. It's not necessarily true that a limit is going to decrease your cost. It is true that a limit will decrease your cost if the query can end early, which is not always the case. It can only end early if it's able to read the rows in order already, or if it doesn't need an order at all, which is the case here.
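For contrast, here's roughly the shape that plan takes once an order by is involved; the table, column, and numbers are all illustrative:

explain select * from users order by last_name limit 10;
-- Limit  (cost=31000.00..31000.03 rows=10 width=76)
--   ->  Sort  (cost=31000.00..31272.22 rows=108889 width=76)
--         Sort Key: last_name
--         ->  Seq Scan on users  (cost=0.00..26000.00 rows=108889 width=76)
-- the Limit can't end early here: every row has to be scanned and sorted first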