Mastering Postgres
Introduction
Introduction to the course
Overview of course structure
Postgres vs. everyone
The psql CLI
Data Types
Introduction to schema
Integers
Numeric
Floating point
Storing money
NaNs and infinity
Casting types
Characters types
Check constraints
Domain types
Charsets and collations
Binary data
UUIDs
Boolean
Enums
Timestamps
Timezones
Dates and times
Advanced Data Types
Intervals
Serial type
Sequences
Identity
Network and mac addresses
JSON
Arrays
Generated columns
Text search types
Bit string
Ranges
Composite types
Nulls
Unique constraints
Exclusion constraints
Foreign key constraints
Indexing
Introduction to indexes
Heaps and CTIDs
B-Tree overview
Primary keys vs. secondary indexes
Primary key types
Where to add indexes
Index selectivity
Composite indexes
Composite range
Combining multiple indexes
Covering indexes
Partial indexes
Index ordering
Ordering nulls in indexes
Advanced Indexing
Functional indexes
Duplicate indexes
Hash indexes
Naming indexes
Understanding Query Plans
Introduction to explain
Explain structure
Scan nodes
Costs and rows
Explain analyze
Generating Results
Introduction to queries
Cross joins
Inner joins
Outer joins
Subqueries
Lateral joins
ROWS FROM
Filling gaps in sequences
Subquery elimination
Combining queries
Set generating functions
Indexing joins
Advanced SQL
Introduction to advanced SQL
Grouping
Grouping sets, rollups, cubes
Window functions
CTEs
CTEs with window functions
Recursive CTE
Hierarchical recursive CTE
Handling nulls
Row value syntax
Views
Materialized views
Removing duplicate rows
Upsert
Returning keyword
COALESCE + generated column
Full Text Search
Introduction to full text search
Searching with LIKE
Vectors, queries, and ranks
Websearch
Ranking
Indexing full text search
Highlighting
Advanced JSON
Intro to JSON
JSON vs JSONB
Validating JSON
Creating JSON objects + arrays
JSON extraction
JSON containment
JSON existence
JSON recordset
Updating JSON
Indexing JSON parts
GIN index
Vectors (pgvector)
Intro to pgvector
Vector embedding columns
Find related articles
Upsert vector embedding
Semantic search
Other operators
Vector indexes
Outro
Thank you
Bonus interviews
Heroku's glory days & Postgres vs the world (with Craig Kerstiens)
Creating a Postgres platform with Monica & Tudor from Xata.io
Bootstrapping an email service provider (with Jesse Hanley)
Locked video

Please purchase the course to watch this video.

Video thumbnail
Generating Results
Combining queries

Full Course

$
349
$399
USD, one-time fee
After watching the free lessons, buying the course was a no-brainer. It gave me real insight, so I actually understand why I’m doing something, not just how.
Jörn
Jörn

The PostgreSQL Development Platform

Start your project with a Postgres database, Authentication, instant APIs, Edge Functions, Realtime subscriptions, Storage, and Vector embeddings.

Test Data

I've made my test data available for you to use and follow along.

Download
or
Use on Supabase (coming soon)

Summary

Learn how to use `UNION`, `INTERSECT`, and `EXCEPT` to combine the results of two different queries vertically, rather than side-by-side. The video covers using `UNION` to merge data from active and archived user tables, enabling easy search across both without cluttering active data. You'll also explore how `INTERSECT` identifies common results between two sets, and how `EXCEPT` returns results from the first set that aren't in the second. Additionally, we discuss the performance benefits of using `UNION ALL`, `INTERSECT ALL`, and `EXCEPT ALL` when handling duplicates.

Video Transcript

We're wrapping up our ways to produce result sets module here, and in this one, instead of putting results side-by-side, which we've done so many times, both in the joins and the rows from and kind of in the subqueries too, I guess, we're gonna put results on top of each other. In this one, we're gonna look at union, intersect, and accept as ways to combine the results of two different queries over-under instead of side-by-side.

The most basic form of a union looks like this, select one, select two union, and that's gonna take the results from the top query and the results from the bottom query and put them over-under, but importantly, you must have the same number of columns, and, in fact, you must have these same data types as well. Normally, it doesn't look like this. Normally, it looks like select * from users where something, something, something, and then a union of select * from users, I didn't copy, select * from users where something something else. Normally, you're gonna run two queries, and you're gonna put 'em together over and under to produce one result set.

In fact, why don't we go ahead and do that, but not both from users, why don't we do it from users and users archive? Imagine this situation. Imagine a situation where your users are able to delete their accounts, but you shuffle them off into this holding table. You put 'em in this holding table for maybe 365 days or whatever. I don't know what GDPR is, don't come at me, but maybe instead of fully deleting them, you kind of soft delete them, but you also move them out of that table to keep your active dataset as slim as possible. Then somebody comes along and says, hey, it'd be really nice to search trashed users along with regular users, and you're like, you told me I could move them to a separate table! Don't worry about it, we have union, so let's show you how this might work.

In this example, we're gonna look for me across both tables, so we're gonna say where email equals [email protected], and in this scenario, we're gonna take whatever filters the user has provided from our fictional UI here, and we're gonna apply it to both queries and then union them together. If we run that, we see that I do exist, and have I been deleted, have I not? I don't know, it's hard to tell, but I exist, and that counts for something. We can tell if a user has been deleted by just selecting a constant up here and a constant down here, and we can just name this, in the first one, as active. We don't need to name it in the second one, and now we know that active is one.

Let's go ahead and make that a little more semantic. We can say active as true, so this user is active. Now, how does this look if we did a select * from user archive limit 1? So Mustafa Baumbach was deleted, and so we can pull their email address, and if we were to search for Mustafa, we would see that they still show up in the results set, but they are showing up as active false, which is pretty nice. We're able to search across all of the users, even if they are in different tables, which they could be. This is not a pattern that I use very often, but it is a common pattern of shuffling off old data into an archive table and then incurring the penalty of having to union on that whenever you're searching archive data, because you must presume that searching active data is the real hot path, and you wanna keep that table as small as possible, and so you're willing to incur the cost of unioning a users archive table here.

Let's keep going, and let's look at, we're gonna go back to just our super basic example. If we did select one union select one, and we did that, we only get one back, which is interesting. Union all is what you're looking for there. By default, union is going to deduplicate the results, which can be very expensive, so if you don't want that, make sure that you type union all, because in a real-world scenario, what's gonna happen is it has to compare the entirety of the row with every row in the result set below, and it has to do that every time. In our case, it's just comparing a few integers, but imagine a big old row, and it's trying to do some duplicate weed out strategy.

If you know, either by pure logic or just some business process in your brain, if you know that there cannot be duplicates or you want to see the duplicates, make sure that you put a union all, 'cause that's gonna be way, way faster.

There are two more that I wanna show you, and for these, we're gonna use generate series. We're so familiar with generate series at this point. We have generate series one through five, and we're gonna do generate series three through seven as well, and instead of union, we're gonna put intersect in here, and what this is going to do is this is going to give you the intersection of those two result sets.

Now, importantly, you can do intersect all, and that does not perform that duplicate weed out. I know, logically, that there are no duplicates here, and I don't really care, and so in here, I'm gonna put intersect all, because that is faster. You can just keep going. You can just say union all, and then you can throw another generate series from 10 to 15 down here. I don't love this, personally, because the order of operations can get a little hairy, so you can throw parentheses around it, and it makes it a lot more clear, and so it says generate series one through five intersect with this entire thing, which in this case, excludes 10 through 15 because the only intersection is three, four, and five. Or you could say, hey, why don't you generate this intersection first and then do the union all at the end?

In my opinion, if, for whatever reason, you're gonna string multiple of different types together, which I just don't understand, everything is possible, but if you're gonna do that, my recommendation would be throw some parentheses around it.

Let's go on to the last one. The last one is, instead of union, instead of intersect, it's the opposite, it's except. Same rules apply here, except all will prevent duplicates from being weeded out, which is to say it will allow duplicates in the result set, and there will not be any here. What this does is it takes the first one and says, gimme this entire result set except where it overlaps with this result set. In this case, one, two, three, four, five, three, four, five show up there, and so the result is one, two. Union take two result sets, put 'em one on top of the other, intersect, take two result sets, find their overlap, and except is take two result sets and give you the first one except where it overlaps with the second one.

All of them, you can append the all modifier, and that will prevent Postgres from doing those relatively expensive comparisons to weed out the duplicates. If you want the duplicates weeded out, do not shy away from just saying union, except, intersect. All I'm encouraging you to do is be thoughtful about, are there duplicates? Do you want them, or do you want Postgres to weed them out? But regardless, now you know a way to get some results sets, put 'em on top of each other.