Mastering Postgres
Introduction
Introduction to the course
Overview of course structure
Postgres vs. everyone
The psql CLI
Data Types
Introduction to schema
Integers
Numeric
Floating point
Storing money
NaNs and infinity
Casting types
Characters types
Check constraints
Domain types
Charsets and collations
Binary data
UUIDs
Boolean
Enums
Timestamps
Timezones
Dates and times
Advanced Data Types
Intervals
Serial type
Sequences
Identity
Network and mac addresses
JSON
Arrays
Generated columns
Text search types
Bit string
Ranges
Composite types
Nulls
Unique constraints
Exclusion constraints
Foreign key constraints
Indexing
Introduction to indexes
Heaps and CTIDs
B-Tree overview
Primary keys vs. secondary indexes
Primary key types
Where to add indexes
Index selectivity
Composite indexes
Composite range
Combining multiple indexes
Covering indexes
Partial indexes
Index ordering
Ordering nulls in indexes
Advanced Indexing
Functional indexes
Duplicate indexes
Hash indexes
Naming indexes
Understanding Query Plans
Introduction to explain
Explain structure
Scan nodes
Costs and rows
Explain analyze
Generating Results
Introduction to queries
Cross joins
Inner joins
Outer joins
Subqueries
Lateral joins
ROWS FROM
Filling gaps in sequences
Subquery elimination
Combining queries
Set generating functions
Indexing joins
Advanced SQL
Introduction to advanced SQL
Grouping
Grouping sets, rollups, cubes
Window functions
CTEs
CTEs with window functions
Recursive CTE
Hierarchical recursive CTE
Handling nulls
Row value syntax
Views
Materialized views
Removing duplicate rows
Upsert
Returning keyword
COALESCE + generated column
Full Text Search
Introduction to full text search
Searching with LIKE
Vectors, queries, and ranks
Websearch
Ranking
Indexing full text search
Highlighting
Advanced JSON
Intro to JSON
JSON vs JSONB
Validating JSON
Creating JSON objects + arrays
JSON extraction
JSON containment
JSON existence
JSON recordset
Updating JSON
Indexing JSON parts
GIN index
Vectors (pgvector)
Intro to pgvector
Vector embedding columns
Find related articles
Upsert vector embedding
Semantic search
Other operators
Vector indexes
Outro
Thank you
Bonus interviews
Heroku's glory days & Postgres vs the world (with Craig Kerstiens)
Creating a Postgres platform with Monica & Tudor from Xata.io
Bootstrapping an email service provider (with Jesse Hanley)
Locked video

Please purchase the course to watch this video.

Video thumbnail
Advanced SQL
Grouping sets, rollups, cubes

Full Course

$
349
$399
USD, one-time fee
Window function 😊. CTE with window function 😮. Hierarchical recursive CTE 🤯. Recommended all the way. Can’t wait to see the rest of the videos!
M Wildan Zulfikar
M Wildan Zulfikar

PostgreSQL database platform

Shorten dev cycles with branching and zero-downtime schema migrations.

Test Data

I've made my test data available for you to use and follow along.

Download
or
Use on Xata

Summary

PostgreSQL offers powerful tools for advanced data grouping through grouping sets, rollups, and cubes. Grouping sets let you define multiple grouping combinations in a single query, while rollups provide hierarchical aggregations from detailed to summary levels. Cubes take it further by generating all possible group combinations, enabling flexible and comprehensive data analysis with minimal query complexity.

Video Transcript

Postgres supports a lot more complex grouping with something called grouping sets, rollups and cubes. If you've ever worked with a database called Microsoft Excel, this is going to look very familiar because this is the type of stuff that you can do very easily in Excel and traditionally in other databases. It's not quite so simple, but again, thanks Postgres.

Let me show you how to do it. Here's the situation, select * from sales group by, let's group by employee ID again, and I've already messed this up because I'm so used to typing select *, so we'll just do select employee_id. Now we've done employee_id and some amount.

If we refresh our memories on the structure of this table, we do have a region in there as well. We have a store region, and let's say that we want to group by employee ID and see how much an employee has sold, and we wanna group by a region and see how much a region has sold, and we wanna group by everything and see what the total sales is, we can do that. Instead of just employee_id, we're gonna throw region up here as well. Instead of saying group by employee_id, we're gonna say group by and then we're gonna say grouping sets. We're gonna open a set of parentheses, I'm gonna make some space here.

Traditionally, you wouldn't see it like this, but I'm gonna make some space because what we put in here are all of the sets by which we want to group. Each set is denoted or set off with a set of parentheses. Say set one more time. Inside of each grouping set, we're gonna say what we want to group by. One thing we want to group by is employee_id and the other thing we want to group by is region. Add a semicolon to separate that statement. If we run that, we see, hey, it kinda works. But what is this format? Well, the format is when we're grouping by this first grouping set, which we declared as ID, it just leaves the region as null.

This is how we get around that, "Must be part of an aggregate expression "or must be grouped by." It just says, "I know what you're up to. "You asked me to do two separate things, "and so I'm gonna null out the one "that I'm not doing right now." Same thing goes down here, we're grouping by region and we're just nulling out employee because Postgres is saying, "Yeah, yeah, yeah, yeah, "I know, you're not grouping by that, "but the grouping set above is grouping by that, "and you kinda need it for the grouping set above." One final thing you can do with a grouping set is have an empty set, which gives you grouped on everything. This is a way to compute a bunch of different groups all at once and then handle it on your application side of figuring out which one is the total, which are the ones that are grouped by employee ID, and which are the ones that are grouped by region.

Traditionally, you would see it more written like this, but you can imagine if this is the first time you're seeing that, that's a lot of parentheses, that's a lot of stuff to figure out. You have another option. Grouping set is the most, we'll call it the most manual option. This is where you can write in any number of columns. In fact, let me uncomment this, you could group by more than one column here. A grouping set is not necessarily a column, it is a set of columns, we just happened to do single columns.

Now beyond grouping sets, we have another option. Beyond grouping sets, we have this idea of a rollup and a rollup is under the hood. It's just grouping sets, but it's a way to write grouping sets very quickly, such that smaller items roll up into bigger and then into biggest. Let me show you a rollup with this data. Instead of doing our grouping sets like that, we're gonna say rollup, and then we're going to say, let's rollup region and employee_id. Here we see something a little bit different. We see employee_id and region grouped together, and then we see just region by itself, and then we see the entire grouping set, which is just this empty set here at the end. This is equivalent, this rollup is equivalent to saying this. It's equivalent to saying grouping sets, and then we're gonna make some space here. It's equivalent to saying region, employee_id with parentheses around it, and then after that, just region, and then after that, just the empty set.

If we were to add, if we were to add another one out here of just asdf, we would have another one above it of asdf like that. It takes the entire argument list as the first one, and then it drops off the last argument here, and that makes the second grouping set, and then it just works its way down until it's just totally empty at the bottom. This is slightly different than the thing that we wrote before, which was "Group by employee ID, "and group by region, and group by everything." This is saying, "Group by region and employee ID, "then region, and then everything." The last one that we have is called cube. Grouping sets is the most manual where you explicitly declare the sets. Rollup, you declare a whole list of arguments and it works its way back from right to left until it reaches an empty state. With cube, you give it a list of columns and it creates every possible permutation for you.

Instead of using our rollup, and instead of using grouping sets, we're gonna use cube. The cube is going to be on employee_id and region. If we run that, we'll see every possible set here. We see region without employee ID, we see employee ID without region, we see employee ID and region, and we see the empty grouping set. Functionally, what has happened is we've got grouping sets, and then we've got a bunch of grouping sets. We've got employee ID and region, and then we've got employee ID and no region, and then we've got no employee ID and region. Then we've got the empty set. We have every possible grouping set here when we do a cube. Then you can imagine how quickly this explodes if we were to add another asdf in here. Then you have to have asdf and then region without asdf, and then employee ID and asdf without region. I'm not gonna do all that. But cube is a nice way to write really exhaustive grouping sets without having to manually write them.

We've got grouping sets, which is manual, rollup, which operates from right to left, rolling up, presumably, rolling up smaller units into larger units, nice for reporting. And then cube just creates every possible combination and creates grouping sets off of that. Very, very powerful for complex data aggregation.