Mastering Postgres is single-handedly the most thorough and informative database course I’ve seen. I’ve gone from someone who barely knew the basics of Postgres and how it works to being comfortable with the advanced topics of Postgres.Ryan Hendrickson
Shorten dev cycles with branching and zero-downtime schema migrations.
In this video, we're gonna talk about views and views are similar to CTEs in that they give you a little bit of organization. It provides you a nice way to wrap up what could potentially be a gnarly query where you're covering over some data model issues or you're doing some complex joins or something like that. You can wrap all of that up into a nice neat, pretty little view. Unlike CTEs, these views do stick around. You can create a view and then reference it later just like it's a table.
In this one, we're gonna cover views, and then in the next one we'll cover materialized views, which are slightly, but importantly, quite different. The way that we've been getting our all_users is with all_users as, and then we open this up and we do select * from users, and then we union all, and we do select * from users_archive, and then we've hopped down here and done select * from all_users, and that works. This works totally fine and it's great, it works wonderful if you're kind of doing this on the fly each time, and maybe you don't need to be using all_users over and over again across, you know, many days, weeks, months, years, forever, that sort of thing, and so you can just kind of organize your CTE that way.
If all_users is something that you are using a lot of the time and you kind of want it to stick around, well, we can make that a view. We're just gonna say instead of with all_users as, we're gonna do create view all_users as, and then we're gonna throw a semicolon there because this is the complete statement here. We'll run that as create view all_users as, now we can get rid of that and we can just do select * from all_users where email = aaron.francis@example.com.
If we run that, we're looking across archived and not archived users and we can prove that by saying where deleted_at is not null, and we'll just do a limit 1 on that, and we get this first person is deleted. What we've done here is we have basically created a query and given it a name and we can reference it again. What we haven't done here is created a temporary table or stored or duplicated this data in any way, shape or form. When we say select * from users, when we say select * from all_users, what's actually happening here is if we do and explain, you'll see it's still doing the scans on users archive and users and then throwing 'em together and limiting it down. We haven't actually combined the data. What we've done is we have, basically, given a big gnarly query in here, we've given it a name and that name is all_users. Instead of having to type out the query, we get to call it by name, but importantly, it is still being run at the time that we are running this query. We're not getting necessarily any performance benefits from this because it's still running the underlying query. What we're getting is a big quality of life improvement such that we can just write all_users and our all_users query is relatively simple.
You can imagine when you're doing a bunch of joins or you're renaming columns to match, maybe your application code is not up to date with the current database schema or vice versa and you're trying to bridge the gap as you're doing this multi-step migration, you can imagine creating a view that kind of papers over some of those data or those schema inconsistencies, and you can just reference the view until everything, all that migration is done, and then you can stop referencing the view and go back to referencing the table. That's totally fine.
In fact, we can do something like this. Let's imagine we have decided we're gonna put the users' table and the users' archive table back together. We made a terrible mistake, that happens. That's totally fine. We can fix it here first, and then we'll fix it in the actual data later. What we can do is if we did show search_path we are simply looking in the public schema. That's the only place we're looking for tables or literally anything is the public schema. What we can do is we can create a new schema called views. We're gonna create a new schema called views, and then we're gonna set our search_path = views,public. Anytime Postgres comes across an unqualified table name, it's gonna start by looking in the views schema before moving on to the public schema. Now there's nothing in the views schema. If we did select * from users limit 10; we're still gonna get the users' table just like we always have. And if we did archive, we're still gonna get the archive table just like we always have.
Let's go about putting them back together and pretending as if the users' table was already unified. We can do this, we can do create, create view, views.users. Remember the views schema takes priority, takes precedent, search precedent over the public schema. This is the one that's gonna be found first. So create views.users as, and then, boy, are we used to this by now. Select * from, and we gotta put our public prefix on there, public.users, and then we're gonna copy you, and we're gonna come down here and we're gonna say archive, and we've done this before. I think that is all we need to do, view views.users. We're giving it the schema name, then the view name, we said the keyword, we have our union here. That looks right. Whew, that looks right to me. Now if we did select * from just straight up users where deleted_at is not null, we should see... There you go, so we have, our user's table has been put back together, kind of we're pretending it's been put back together.
If we were to force it to search the actual underlying users' table rather than the views, we would see... "Oh no, they're still split." But if you don't qualify it, it's gonna look at that view first, and we can prove that by saying explain, and you'll see that it's doing the scan on the users' archive and the scan on the users as well. We are in fact assured that this, right here, is just a reference to that union all view that we wrote earlier, but now it has that pretty name. We can go about, you know, maybe doing this migration or whatever. But from the point of view, as of anyone who's writing this queries including the application, if the search path is set correctly, whenever they do this, they're gonna see one unified table.
Views are a very nice way to organize some queries, to clean up some messy data models. The data model doesn't even have to be messy, honestly. You might have a very complex set of joins that is required for one report or one person and you think I'm just gonna kind of solidify that. I'm going to commit that into a proper first class object called a view. You can just reference the view like a table do everything else. You can create views that are based on views. You can create CTEs that are based on views. You can basically do everything, but wrapping it up in a view is nice.
Now remember, this is just a regular view. We haven't gotten to materialized views yet. And with a regular view, the underlying query is always going to be run, which is great because it's gonna keep the data up to date, but it's bad if the underlying query is terribly expensive because you don't get any performance benefits from a traditional view. To get those read time performance benefits, we have to do a materialized view, which is what we're gonna do next.