Mastering Postgres is single-handedly the most thorough and informative database course I’ve seen. I’ve gone from someone who barely knew the basics of Postgres and how it works to being comfortable with the advanced topics of Postgres.Ryan Hendrickson
Shorten dev cycles with branching and zero-downtime schema migrations.
We're gonna use a recursive CTE to generate or to traverse some hierarchical data. We have a table full of categories and every category potentially has a parent category that lives in that same table. We're gonna find our path through that table with this recursive CTE.
Here is the data that we're working with. It's only 15 rows, but you get a sense of the structure. So this is a bunch of categories. We're gonna start down here at the Canon camera because this is a Canon camera right here. And the parent ID of Canon cameras, or is cameras, just generally cameras. The parent ID of cameras is 11, which is generally video gear. Then the parent of video gear is just broadly, electronics.
We wanna map our way through that to show the path to each individual child. What we're gonna do is we're gonna start with recursive. With recursive all categories as, and we'll open that up. We're gonna start with the most parent of parents, the parentest of them all. That is select * from categories where parent ID is, no, these are the grandparents, the great, great, great, great great grandparents. The ones that have no parents above them. This is our root node or our anchor condition that generates these root nodes.
Then we're gonna say union all of something. And we're gonna say select * from all categories. Now what is the something? The something is interesting because we're gonna do select * from, let's drop this down because we'll need the space later, from all categories, which is in fact, the table that we're building up, which at the moment looks like this. That is what the table looks, that's what the all categories recursive CTE looks like when we reach this first recursive condition.
At this point, this is what all categories looks like. Let's keep that in mind, and we are going to enter, join categories. We're gonna bring in categories on all categories, which at this point is just this row. All_categories.id equals categories.parent_id. What we're saying is, all right, we've got this one node here and let's bring together all of its children. Anyone that has a parent ID of one, we're pairing up with this row that has an ID of one. If we run this, it's not gonna work because must have the same number of columns, which is correct, in fact, and because we're joining stuff in, we're getting a much wider result set down here, which is fine 'cause we don't actually want all of the columns. Let's do this, let's do ID and name. That's a good place to start, it's not a good place to end. Then when we join this stuff in, we don't want to reselect or reshow the data from all categories because that is, you know, represented in the rows above.
What we're gonna do is we're just gonna select the data that we joined in by saying categories, ID and ccategories.name. And now if we run that, nothing, I mean it worked, it didn't break. But that's nothing. Let's keep moving because that is good that it didn't break. That's always a good start. But that's not, oh, it's not anything. We need to generate, we need to generate this path through all of the categories.
Let's do that, we're gonna start with our anchor condition, and say name as path. And then down here we'll just say, we'll just say foo. Foo does not exist. Why do I always do that? We're gonna start with foo, so there we go. The anchor condition has name as the path. What we're gonna do down here, in fact, is we're going to concat the path, which means the rows above us, we're gonna take path. We're gonna put this little arrow in here. We're gonna say we need categories, which is the one we've joined in, which is the child, categories.name. That should give us with any luck, hey, look at that. There we go, so coming back down to Canon, we see it goes from electronics to video gear to cameras to Canon. The first run through, we just had that one. The second run through, we got all of those children. Then the third run through, we got the third level or the third depth, and then the fourth run through, we got the fourth depth of information here.
We can prove that by just saying one as depth and then coming over here and saying depth plus one. If we run that, you'll see, there is our depth. That's the first run. That is the second run, that's the third run. That is the fourth run. Pretty cool, right? It's a little mind bendy but hey, we're developers. We do love a little bit of recursion. It makes us think very, very hard. I love this as a pattern, especially when you have a table like this that is a little bit self-referential in terms of who the parent is. This is a great way to generate those trees.
Of course, you don't have to do it with this goofy little arrow syntax that I've done. You could put it in an array, you could put it in a JSON object, you could do it however you want. And of course, you could do this logic on the application side. And that's always kind of the trade off is where do you want do this kind of work? I think in this case, I would do this in the database. The database has the affordances to do that. It's going to work very well, usually very performantly versus bringing all of the data over into your programming language of choice and then parsing it or looping it over there, that is what recursive CTEs are built for.
I would use the tools available to you, and in this case, I think recursive CTE is the right one.