Mastering Postgres is such a crisp way to learn this subject. With today’s AI tools writing many of the queries via autocomplete it’s vital to have this knowledge to ensure you know what’s actually happening under the hood. My app’s performance improved overnight after learning more about indexing. Aaron’s excitement for the subject makes it really fun to learn whether I’m next to my code editor or on a treadmill run.Peter Ramsing
Shorten dev cycles with branching and zero-downtime schema migrations.
A lateral join is also not in fact a different type of join. It's rather just a qualifier for existing joins. You can have a left join lateral, inner join lateral, cross join lateral. What the lateral keyword tells Postgres is for every, for every row in the preceding table, run this subquery and use this result set. Did you follow that? For every row in the preceding table, run this subquery once per row. You do have to be careful.
Let's take a look at how you might actually use this. We're gonna stick with users and bookmarks. Because we all have this memorized, I'm just going to paste it in. Select * from users, left join bookmarks on users ID, bookmark ID, limit 10. There we go. Kyleigh has a lot, and Waylon is next. The big boss man comes to you and says, "Hey, we want to do, we need a report of everybody's latest bookmark," and you think, there are a couple ways I can do this. You're right, there are a couple of ways you can do this. We're gonna look at doing this with a lateral join. Later on, we might use a CTE to do that, but we haven't covered CTEs yet, so we're gonna use a lateral join.
What we're gonna do here is we're gonna transform this into a subquery. We've already done that, so that feels good. We're gonna do subquery to get top bookmark per user. The easiest way to do that would be to do something like this. What if we were to say... Let's do select * from bookmarks where user ID equals, and then we'll just reference the outer table, users.id order by ID descending, limit one.
Let's take this, I like to do this a lot. You might have noticed this by now. I like to take these query parts and decompose them and see if I have gotten close. Let's take, we'll take Kyleigh here and replace this with that and see if we got close. That seems pretty good. Let's do limit 10. Yes, we are in fact selecting their most recent bookmark by ordering by ID descending. Not foolproof, but we don't have a created at or an updated at column in this table. We're gonna order by ID and call it a day. I like that, I tested that. That feels good to me. We do need to give this a, we need to give this an alias as we talked about. We'll say as most recent bookmark. If we run this, we're gonna have a few errors. We get this first error here, syntax error, near limit, which usually means right before limit, something is going wrong, and we haven't given it a join condition.
In this case we're just gonna say on true because we're actually constraining the result set inside of this subquery. Anything that comes back, definitionally should be joined because we already did our condition inside of here.
Let's see if that works. Oh, that doesn't work at all. Invalid reference to from clause entry table users. Here we go, look at that. To reference that table, you must mark this subquery with lateral. Postgres is telling us here that we have referenced a table incorrectly. We cannot reference the preceding table inside of this subquery unless we mark it as lateral, because then what happens when we mark it as lateral is for every row in the user's table, it's gonna run this subquery and generate the top one bookmark for that particular user.
Let's see that. Left join lateral. If we run that, we see we get the top bookmark or the most recent bookmark per user. Let's trim this down a little bit, and we'll do users ID and we can just call it first name and we'll do URL as well. That feels like enough. We get one user because we're limiting this to one. We get one user, and they get a bookmark. Now, it might be true that some of these people don't have, don't have any bookmarks whatsoever. We can see that by saying where users.id... Nope, wrong one. Where bookmarks... Nope, wrong one. Where most_recent_bookmarks.id is null. If we run that, because we did a left join lateral on some cases, for example, on Gerald's case here, this query was run. In fact, why not?
Let's just do it. Let's run this query for Gerald and say we're gonna hard code this in there. It's a null result set. We did a left join, so Gerald is still going to show up. Well, he's gone now. Gerald's still gonna show up in the output. This is an interesting side note. When you don't order, when you don't order it explicitly, Postgres gets to decide what order to return it in. It is non-deterministic. That's where Gerald went. We don't have an order on this query, and so Postgres just says, "All right, not my problem. I'm gonna give you back the rows however I want." If you need to rely on order, you must give it an order. Otherwise it's non-deterministic. Interesting little tidbit.
If we were to change this to inner join lateral, you guessed it, it's gonna search and search and search and never find an answer because there are no, there are no people with missing bookmarks because we declared it specifically as an inner join. Now, as you can probably guess, a lateral join can be terribly expensive. Depending on how big the preceding table is and how expensive the subquery is, you are going to be running that subquery once per row. If you're running it a million times, that's gonna get kind of wild. But there are other ways to solve this specific problem, and we will look at that when we do CTEs.
Sometimes a lateral joint is exactly the right answer, and you shouldn't be afraid to use it. You should just be cautious about the performance and test it to make sure that it works for your use case.