Mastering Postgres

In the video, I learn how to use row value syntax in databases, which helps compare multiple values at once, essential for tasks like pagination using cursors. The video explains how to implement cursor-based pagination by having the client return a "cursor" or "key" of the last seen record to continue retrieving data efficiently and deterministically. Additionally, I discover how to handle complex data comparisons in badly structured schemas using row value syntax, simplifying operations despite challenges in data format.

We're gonna look at something here called row value syntax, which can be very helpful when you're comparing more than one value to more than one value. When you're comparing one-to-one value, it's a scaler kind of comparison. If you need to compare three values to three values, you can use this row value syntax. It comes in super-duper handy when you're doing something like pagination using cursors. We are going to do that.

Let me show you row value syntax here. This is row value syntax. If we select that, you see it actually gives it the name row. In fact, row is a keyword you can use. It is optional. You can compare row values to row values. Now, watch this. We talked about nulls, right? We've talked about nulls so much. If we do that, we see, hmm, shoot, that's null. Okay, that's fine. If we did one, two, four, we see that that row value is not equal to that row value. That's fine. But what if we did this? What do you, the viewer, the dear viewer, what do you think is going to happen here? False or null? False. And why is that? Because Postgres looks at this and says, one, one, yeah. Two, null. Shoot, I have no idea. Three, four. Definitely not.

There is no value that you could substitute for null here to get these two things to equal because these two discreet values are not equal to each other. Postgres says, no matter what corner of what universe you're in, this will never equal each other. It's kind of cool. It says, no, in fact, that is false. Now, why is this useful? Well, let's do this. Select * from users limit 10. We are gonna take... We gotta put an order ID on here. Select * from users. In fact, let's do it this way. Let's do order by, we'll do first name, last name, and then ID.

If we do this, we get back all of the people named, there's a lot of them, named Aaliyah right up front here. The reason that we're adding ID on to the end here is, in fact, for this very reason right here. Because this library that generated this data only has a certain amount of first and last names. We end up with people with... Those aren't the same. We end up with people with the same name. If you're going to do some sort of pagination, which is what we're about to do here, your ordering must be deterministic. It must be the exact same thing every time.

In Postgres, when you're ordering by one or two, or many columns, and the ordering isn't deterministic, it is totally up to Postgres to decide what order to bring these things back in. If we limit this down to... We don't need to limit it. If we just look here at Aaliyah Balistreri. Those two rows came back in this order and they might come back in that order most of the time, but they do not have to and you cannot rely on it. To make it deterministic, I always add something that is unique, and in most cases, that is the ID. I suppose it could also be the email. Here, we're gonna do ID because that is guaranteed to be deterministic in all tables. We have this result set.

The question is, how do we find the next 10 rows? The thing that must be held true is, we have to order by the same thing. We're gonna order by these three rows no matter what. The way that this works, this is called cursor-based pagination. It can be a little bit confusing because there is a database concept of cursors where you can page efficiently through a result set. That is not this. This is a pattern or a theoretical. This is, a pattern is probably the best word. This is a pattern for paginating or paginating rows. It is the cursor-based pagination. You might also hear it called key-based or keyset-based pagination. Those are all the same thing. And they stand opposed to offset limit pagination, which can be problematic the further into the result set you get.

We're gonna implement cursor-based pagination here using row value syntax to make this comparison a whole lot easier. Fundamentally, what we need to do here is, we need the client, whoever has consumed this page one of the results. We need the client to give us back some sort of key or some sort of cursor, or a marker, or a point in the record set and say, "Here's the last record that I saw." In this case, the last record that they saw was... Wrong one. The last record that they saw was Aaliyah Bashirian with an ID of 322714.

What we need back to form our key or our cursor are the columns that we ordered by. I'm gonna bump this to 11 so we can see where page two should start, so that we know if we've done this right, which hopefully we will do it right. Okay, so that is page two. That's the first record on page two. When the client or the consumer, or the application, hands the database back and says, "Here's the last record that I saw." This is how we construct page two. What we're gonna do is we're gonna say, select * from users.

We've gotta keep the order by, but we're gonna drop in our known key here. We're gonna say where? Row value, first name, last name, ID is greater than the last record that we saw, which is, again, Aaliyah Bashirian with an ID of 322714. Fingers crossed, we're gonna see Aaliyah Bauch of 904155 come up. 904155. Aaliyah Bauch. What we've done here is we've used this row value syntax to compare all three of those columns rather than, because you have to imagine, what does this expand out into? Well, it expands out into a lot because, basically, we need to find all of the Aaliyah Bashirians with an ID greater than that, or all of the Aaliyahs... Goodness. I don't even know if I can do this. All of the Aaliyahs that come after Bashirian or all of the people that come after all of the first things that come after Aaliyah.

You can see how this explodes in complexity if we wanted to do the discreet column comparisons, but because we have row value syntax, we can say, "Hey, just take these three, compare it to these three." You figure out that whole or structure of show me the Aaliyah Bashirians after this ID or the Aaliyahs after this last name, or the people after the name Aaliyah. That's crazy. I'm not doing that. This is a great way to implement cursor or keyset-based pagination.

I want to show you one more example with a schema that I really, really hate. You know what? Sometimes you're not in charge of everything. As much as I want to be in charge of everything always, ever, I'm not always in charge. Sometimes I have to deal with the schema that is given. In this case, I'm dealing with a schema where the year, the month, and the day are all stored in separate columns. Don't do that. I hate it.

I'm gonna show you how to use row value syntax to do some nice comparisons on that even though it's an awful schema. Now I don't have a schema like that because I wouldn't dare, but let's generate one together. We're gonna use generate series to go from 2025-01-01 to 2025-12-31. That gets us, oh, what I would prefer to see, which is a legit date column here. Instead, what we're gonna do is we're gonna do extract. Extract. We're gonna do from, and then we'll call this as generate series date. So, from gs.date. I'm gonna do three of those and come back up here, and say year, and month, and day. Now with that comma gone.

Now if we run that, oh, that is ugly, isn't it? As year and as month, and as day. And so, boy, unlucky, not a great schema. What if we need to find, based on this horrible schema, we need to find certain rows where the dates are between something. That is fine if you're operating in the same month because you can say where month equals one and day is between one and seven. Once you get to month equals one and day is above 29, or month equals two and day is below seven, or month equals, you know, month equals two all the way through 28. It just gets horrible and I don't want to do it.

We can do it this way. Let's do with date parts all coming together, with date parts as, and we're gonna open that up, and we'll come down here and close that, and see if we're still good. I like to build up as we go. Select * from date parts. We're still good. And now we're gonna use row value syntax to figure this out. So, where. Let's find up there. Where, year, month, day is greater than.

Actually, let's do is between, because that's kind of the thing that we were talking about is between 2025 and the month is 01. We'll say the end of the month. We'll take you and come over here, and drop off the and, and we'll bump this to two and we'll say... Actually, let's bump it to three because that's quite hard to do with discrete columns. And three. We can look at this and say, "Oh, thank goodness we have row value syntax," because we're looking for 1-20 to 3-3. We got 1-20 all the way through the month of two down to 3-3.

Even though those columns are discreet parts, we're kind of treating them as one for the purposes of this comparison. This will expand into a bunch of different comparisons, but as the human, we're able to easily understand what this is. While I don't recommend this schema for storing dates, please don't do that. Don't do that. Go back to the date videos. Don't do that.

While I don't recommend that schema, this is incredibly useful and it'll simplify your life greatly when you have data stored across multiple columns, and you need to compare it to another set of data, you can use row value syntax.

Full Course

PostgreSQL database platform

Summary

Video Transcript