Mastering Postgres is single-handedly the most thorough and informative database course I’ve seen. I’ve gone from someone who barely knew the basics of Postgres and how it works to being comfortable with the advanced topics of Postgres.Ryan Hendrickson
Shorten dev cycles with branching and zero-downtime schema migrations.
We're gonna keep going on joins, but instead of introducing a new type of join, we're gonna introduce a new type of table, which is not really a table, but it's a new thing that you can join in. That is a subquery.
We've been doing users and bookmarks, and those are proper tables, and we've been joining those together. Now we're gonna join users with a query that queries the bookmarks table, but eliminates a bunch of stuff that we're not interested in seeing. We're going to join a proper table to the result of a subquery, which can be really powerful.
Here's the scenarios, we have select * from users. We're just gonna set this up like we have historically left join bookmarks on users.id equals bookmarks.user_id, and we are going to limit that down to 10. And if we run that, we see here's all the stuff that we're looking for. What I actually want to see is I only want to see bookmarks that are secure, https bookmarks, why? I don't know, I don't know, seems awesome, so I'm gonna do that. What we're gonna do is instead of selecting from the bookmarks table, we're gonna select from a bookmarks subquery.
We can use subqueries in a few different ways. One is to produce a set of results against which we will join, that's totally fine. Another way that we'll look at in the future that you can use a subquery is to filter out records without joining the corresponding table in. That is really, really nice when you don't need to bring along all of this extra data that would apply in the scenario of saying, "Show me all of the people "who have bookmarks that are secure." But in this case, we want to say, "Show me all of the people and their secure bookmarks." We're gonna join that in together. As we're moving forward, we're gonna start to combine our knowledge a little more. We've been kind of operating in discrete chunks, and we're gonna start to put a bunch of stuff together now.
Let's start by doing this. Let's do create index bookmarks_secure_url on bookmarks. We're gonna create a functional index here. We're gonna create an index on an expression, what is the expression? Let's find out. We'll do select url and starts_with, and we're gonna test to see if the url starts with https, and we'll limit that down to, let's just do a hundred just to see if there are any in there. What did I do wrong? Oh, I didn't add the table, from bookmarks. If we run that http, there we go, https starts with, and so that comes out to true. This is the way that we're going to determine if a URL is secure. We can yoink this right here, that's mine. We'll come up here and put that inside the parentheses. Remember, if you were to type something right here, that's a column reference. If you were to open another set of parentheses, that's a function reference, which is what we're after. We're gonna put starts with url https in there, and yes, that looks good to me. That's gonna create that functional index.
We can say select * from, you know what, I should add, we should make that a compound index. Let's do drop index, let's drop that. We're gonna make this a compound index, which you can do with a functional index part. We're gonna make it a compound index on user_id and then the functional index part. I could rename this, but I don't want to do that right now. Let's do select * from bookmarks where starts with, let me get that back, starts with is true limit 10, there we go. We have all of our secure bookmarks there. If we explain that, we see sequence scan on bookmarks, which seems like it might be bad, but in this case, we actually expect that because we put the functional index part second with the user ID blocking it.
Left to right, no skipping, it can't get past that user ID. But when we move it up into the join, that whole thing is gonna be unlocked 'cause that user ID will be used. Even though this feels bad for this query, it's gonna work for the join, so let's carry on. Instead of joining on this table, we're gonna open up a set of parentheses, and we're gonna come in here and we're going to write our subquery. We can just, we've already done it, so we can take that without the limit. Come in here, paste that. But now we need to give, we need to give this subquery what's called an alias. This has returned a set of rows, but it's, at this point, kind of dissociated from the table. We need to give it an identifier. And in this case, we can say, you can say whatever you want, as bookmarks_secure. We're gonna call this bookmarks_secure, and then we're gonna change that to bookmarks_secure.
You do not have to write as, a lot of people don't. I don't know why, I guess I really like the clarity and the explicitness of that. We're saying select from users, left join, based on this result set, call it bookmark_secure. And this is your join condition. If we run that, let's limit that down a little bit to just first name and url is probably fine. Let's throw id in just for giggles, id is ambiguous, shouldn't have done that. All right, first name and url, and all of these URLs are secure. Let's just go out to a hundred to make sure Ali has two, Jason has a couple, yeah, they're all secure.
You'll notice that everyone is showing up even if they don't have a secure URL because we did a left join. If we did an inner join, which is totally legit, you'll see that we don't have any more users with null URLs. Using this method, we were able to eliminate a bunch of data before we performed the join. You could, of course, join everything together and then filter out the rows that you don't want. But in this case, we were able to do that before we created that explosion of rows by joining the users with the bookmarks that we don't even care about.
Now for your edification, we're gonna run explain on this to see if that index worked. We will hop up here, throw and explain on it. Hopefully, we see index scan on bookmarks_secure_url right there. And remember, if we hop down here and try to run that, you'll see we're doing a sequential scan on bookmarks. That index did help us out. That composite index where one part was a column and one part was a functional expression. It did help us out in that case.