Mastering Postgres is such a crisp way to learn this subject. With today’s AI tools writing many of the queries via autocomplete it’s vital to have this knowledge to ensure you know what’s actually happening under the hood. My app’s performance improved overnight after learning more about indexing. Aaron’s excitement for the subject makes it really fun to learn whether I’m next to my code editor or on a treadmill run.Peter Ramsing
Shorten dev cycles with branching and zero-downtime schema migrations.
We're going to start by reading a very simple one line explain plan, and then we're going to add some filters and conditions and stuff to build up our knowledge as we go.
We have a select * from users here, and we're just gonna throw explain on it. This is an explain plan. This is a one line plan because there's only one thing that needs to be done and that's scan the whole table. We'll talk about these different types of scans and all of these numbers here in a little bit. But I want to show you one other thing before we do. Okay, so now we get to see a little bit of structure.
An explain plan is a tree plan that is made up of nodes and these scan nodes down here, these are the ones that produce rows and there are a couple different kinds of scan nodes that produce rows. There are some other kinds of nodes that produce rows as well. But the important thing we need to understand here is how do we read this tree structure? I feel like this is a good example because we know that we can't read it from top to bottom, right? We can't read it from top to bottom because we can't limit before we scan, like we need to scan and produce all of the rows and then we can limit it. You do need to read it not from bottom up necessarily, but from inside out.
Here, this is indented by this arrow and it says, all right, down here, this is a sequential scan on users and it's going to pass up a number of rows. It thinks based on statistics, it's gonna pass up 989,908 rows, which is the entire table. Then it's going to pass it up and out to the limit node. That one is going to pass up and out 10 node or 10 rows back, which is that row there. If we were to try to see this a little bit differently, because this format is a little bit wonky in my opinion, you can do a format JSON, which is super hard to read as a human, but very machine-parsible, which might be handy if you need it. But it does make one thing clear that this is a tree structure. You see here we have the plan and then the node type is limit. That was our very top node. That is the one that is responsible for returning the rows at the end of the query. Then inside of this node we have plans in and of itself. We have this tree structure going on and inside the plans we have a sequential scan down here. Then you get those same stats back, the cost and the rows and the width, which we'll talk about later. But this does show you that instead of reading it top to bottom, you kind of have to operate on the inside and then pass it up and out to the outside.
Now reading these things takes a little bit of time to get used to and confusingly, there's another type of indented row that you might see, but it is not indented with an arrow. Here I'm gonna remove format JSON, and we run that again, you see we have our top level node, which is limit, which is responsible for giving us all of the rows back in this case, just 10. And then underneath it, it has a sequential scan, which is responsible for giving the rows to the limit node.
If I were to add a where first_name='Aaron' on here, we do see another indented row, but it is not indented by an arrow. Let us use our format JSON again to see if this is a little bit more clear in JSON versus just random indentations down here. We'll take this and we'll come out here and then let's make some space for ourselves. You see the limit does have a sub node of sequential scan, but sequential scan does not have a sub node of filter. It just has an attribute. It just has a another attribute telling us that this sequential scan has something you should know about. The arrow indicates this is a new node, a new type of node, something fully discreet, and just that little indentation tells us, Hey, this is probably something you should care about with regard to the node right above me. It is not a child. It is merely an attribute of that node. Okay, that's our first toe in the water of explaining explain plans, which is gonna be fun to say it.
It is a hierarchical structure, which is easy to see on the human readable version, but is maybe better represented in the JSON version. If you kind of forget how this is actually structured, throw a format JSON on there, refresh your memory that there are parent and child nodes, and then that'll make the human readable version more understandable to you moving forward.
Do remember that you don't read it from top to bottom. The top row, the very first thing you see is in fact the very last thing before the rows get returned. You have to read it kind of from inside out moving up because those child nodes emit the rows to their parent nodes all the way up, all the way up, all the way up until it reaches that last node, which in our case was doing a limit and then emitting those 10 rows back, there's a lot more to learn here. Let's keep going.