Aaron is a natural teacher and this course is the best introduction to Postgres I have come across. Lessons are easy to follow and he recommends some great tools for working with Postgres.Joel Drumgoole
Shorten dev cycles with branching and zero-downtime schema migrations.
This is my favorite part of courses like this because all the atomic pieces start to come together and we really start to make a lot of progress.
Here we're going to look at generating a series and left-joining to cover gaps in sequences. This one is going to be fun. I've made up a sales table here. We have select * from sales
. Sales dates go from January 1st to January 31st. And then there's just an amount and a user ID. But, if we say select sale_date and sum of amount from sales group by, which we'll talk about group by in a little bit. Group by sale_date order by sale_date ascending. You'll see this does add it all up. But, look, right there. 01-03, 01-05, unlucky. There were no sales on the 4th of January for whatever reason. But our report is not necessarily showing that, it just looks like missing data. When you have a report like this and you have this nice, neat, orderly sequence and then a piece of missing data, it really like, (groans) makes me wonder, "Is the whole thing right?"
What we're going to do here is we're going to generate a series and join this data in, such that we get 01-04 on this report, even if it does show a zero. I think the most fun way for us to do this is just to build it up together. I have this here, generate_series 01-01 to 01-31 by day. We've seen this before. We can run that, and if we write the right words, we can run that and get back exactly what we're looking for here. We are going to use this as the basis to join on. This is going to be our left-hand table. It makes some space there. We're going to say, as all_dates, and then, left join. Then we're going to open a subquery. We're just doing it all here. We're going to left join, because no matter what, we want the date to show up. We need the date to show up, that is the point, even if there are no matches in the right-hand table.
The right-hand table is going to become this guy. We do not need this ordering inside, we can move that outside later. We're going to select sale_date and sum(amount), and we'll select that as total_amount from sales group by sale_date as, let's just say, well, we can just alias it back to sales. That's totally legal. We're going to just call this sales on sales.sale_date =, what did we call it? all_dates. We didn't give that one a name, so we're going to say, all_dates(sale_date). We're going to give it a table alias and a column alias. all_dates.sale_date. We have our semicolons. We can get rid of this guy. And now, if we run that, you'll see, kind of, I mean, yeah, that actually, (chuckles) that actually worked first try. This is, in fact, my first take on this, and that worked.
We have the big gnarly sale date over here, which looks like a timestamp time, er, timestamptz. And then we have the sale_date and the total_amount. And that is kind of, that's kind of what we're looking for. Close. Let's make it a little bit prettier. What we can do here, is up here, we can say select, let's do all_dates.sale_date, cast it to a date. How does that look? Okay, we got all dates, that's good.
I forgot what we're actually doing here. We have the sale date and we need the, yep, now we need the total_amount. And we're getting closer. That feels a little bit better. There's one function that we can use here, which is called coalesce. And that's going to put that zero in there. Without the coalesce, let's just run them side by side. Without the coalesce, we have a null. We're intuiting here, and now I'm going to tell you, that the coalesce function simply chooses the first non-null argument. In fact, you could write, just for fun, I don't know what you do for fun, but maybe this is it, and it will still work because it's going to say, "No, no, no, no. There we go, there's one." And in the case of January 4th, it's going to say, "No, there's a zero." So remember, zero and null are not the same thing. Null is unknowable, zero is, well, zero. Coalesce will happily skip over total_amount being null and choose zero. A real-life, honest-to-goodness example of a left join in a generate_series on dates.
Let's look at it one more time and then we'll call it. What we've done here, this is silly. I don't know what you do for fun. That was very silly. All right, so here we go. If we run that, what we've done here is, we're going to skip over the select for now. We took a generate_series and gave it a table alias and column alias. We used that as our left-hand table in a left join, which we know means, "Please preserve everything from the left-hand table no matter what." And we're guaranteed that this is going to be complete because we're stepping from 01 to 31 by one-day increments. That gives us our nice, clean list of dates.
Then we're saying, "Let's left-join something in, but instead of left-joining a table, let's left-join this in," and this is a subquery. Inside this subquery, we selected the sale_date, we grouped by that sale_date, and we aggregated, we summed up the amounts and gave that an alias from the sales table. In fact, we turned around and aliased it right back to the same name, which can be confusing, but is totally legal. Then we said, all right, sales.sale_date, which is this guy here, equals all dates.sale_date, which is that guy there. It preserved everything from the left-hand table, matched up the sale_dates, which brings along the total_amount.
We had those three columns that we had to deal with. We cast the sales_date, or the all_dates.sale_date to a date instead of a timestamptz. Then we coalesced the total amount because on the days where there were no sales, it was showing up as null, and we wanted it to show up as zero, and coalesce will pick the first non-null argument.
In many other databases, you'd be stuck using a recursive CTE to do this because we don't have a generate_series in many other databases. But thank you, Postgres. We have generate_series, and that can help fill in gaps in sequences.