Mastering Postgres is such a crisp way to learn this subject. With today’s AI tools writing many of the queries via autocomplete it’s vital to have this knowledge to ensure you know what’s actually happening under the hood. My app’s performance improved overnight after learning more about indexing. Aaron’s excitement for the subject makes it really fun to learn whether I’m next to my code editor or on a treadmill run.Peter Ramsing
Shorten dev cycles with branching and zero-downtime schema migrations.
Postgres supports a lot more complex grouping with something called grouping sets, rollups and cubes. If you've ever worked with a database called Microsoft Excel, this is going to look very familiar because this is the type of stuff that you can do very easily in Excel and traditionally in other databases. It's not quite so simple, but again, thanks Postgres.
Let me show you how to do it. Here's the situation, select * from sales group by, let's group by employee ID again, and I've already messed this up because I'm so used to typing select *, so we'll just do select employee_id. Now we've done employee_id and some amount.
If we refresh our memories on the structure of this table, we do have a region in there as well. We have a store region, and let's say that we want to group by employee ID and see how much an employee has sold, and we wanna group by a region and see how much a region has sold, and we wanna group by everything and see what the total sales is, we can do that. Instead of just employee_id, we're gonna throw region up here as well. Instead of saying group by employee_id, we're gonna say group by and then we're gonna say grouping sets. We're gonna open a set of parentheses, I'm gonna make some space here.
Traditionally, you wouldn't see it like this, but I'm gonna make some space because what we put in here are all of the sets by which we want to group. Each set is denoted or set off with a set of parentheses. Say set one more time. Inside of each grouping set, we're gonna say what we want to group by. One thing we want to group by is employee_id and the other thing we want to group by is region. Add a semicolon to separate that statement. If we run that, we see, hey, it kinda works. But what is this format? Well, the format is when we're grouping by this first grouping set, which we declared as ID, it just leaves the region as null.
This is how we get around that, "Must be part of an aggregate expression "or must be grouped by." It just says, "I know what you're up to. "You asked me to do two separate things, "and so I'm gonna null out the one "that I'm not doing right now." Same thing goes down here, we're grouping by region and we're just nulling out employee because Postgres is saying, "Yeah, yeah, yeah, yeah, "I know, you're not grouping by that, "but the grouping set above is grouping by that, "and you kinda need it for the grouping set above." One final thing you can do with a grouping set is have an empty set, which gives you grouped on everything. This is a way to compute a bunch of different groups all at once and then handle it on your application side of figuring out which one is the total, which are the ones that are grouped by employee ID, and which are the ones that are grouped by region.
Traditionally, you would see it more written like this, but you can imagine if this is the first time you're seeing that, that's a lot of parentheses, that's a lot of stuff to figure out. You have another option. Grouping set is the most, we'll call it the most manual option. This is where you can write in any number of columns. In fact, let me uncomment this, you could group by more than one column here. A grouping set is not necessarily a column, it is a set of columns, we just happened to do single columns.
Now beyond grouping sets, we have another option. Beyond grouping sets, we have this idea of a rollup and a rollup is under the hood. It's just grouping sets, but it's a way to write grouping sets very quickly, such that smaller items roll up into bigger and then into biggest. Let me show you a rollup with this data. Instead of doing our grouping sets like that, we're gonna say rollup, and then we're going to say, let's rollup region and employee_id. Here we see something a little bit different. We see employee_id and region grouped together, and then we see just region by itself, and then we see the entire grouping set, which is just this empty set here at the end. This is equivalent, this rollup is equivalent to saying this. It's equivalent to saying grouping sets, and then we're gonna make some space here. It's equivalent to saying region, employee_id with parentheses around it, and then after that, just region, and then after that, just the empty set.
If we were to add, if we were to add another one out here of just asdf, we would have another one above it of asdf like that. It takes the entire argument list as the first one, and then it drops off the last argument here, and that makes the second grouping set, and then it just works its way down until it's just totally empty at the bottom. This is slightly different than the thing that we wrote before, which was "Group by employee ID, "and group by region, and group by everything." This is saying, "Group by region and employee ID, "then region, and then everything." The last one that we have is called cube. Grouping sets is the most manual where you explicitly declare the sets. Rollup, you declare a whole list of arguments and it works its way back from right to left until it reaches an empty state. With cube, you give it a list of columns and it creates every possible permutation for you.
Instead of using our rollup, and instead of using grouping sets, we're gonna use cube. The cube is going to be on employee_id and region. If we run that, we'll see every possible set here. We see region without employee ID, we see employee ID without region, we see employee ID and region, and we see the empty grouping set. Functionally, what has happened is we've got grouping sets, and then we've got a bunch of grouping sets. We've got employee ID and region, and then we've got employee ID and no region, and then we've got no employee ID and region. Then we've got the empty set. We have every possible grouping set here when we do a cube. Then you can imagine how quickly this explodes if we were to add another asdf in here. Then you have to have asdf and then region without asdf, and then employee ID and asdf without region. I'm not gonna do all that. But cube is a nice way to write really exhaustive grouping sets without having to manually write them.
We've got grouping sets, which is manual, rollup, which operates from right to left, rolling up, presumably, rolling up smaller units into larger units, nice for reporting. And then cube just creates every possible combination and creates grouping sets off of that. Very, very powerful for complex data aggregation.