The most comprehensive course on PostgreSQL

Learn why the `SERIAL` data type in Postgres is just a shorthand for creating integer-based sequences and why it's discouraged in newer versions beyond Postgres 10. You'll see why `BIGSERIAL` is a better choice for primary keys to avoid space limitations, and understand that gaps in sequences are expected and safe. The video also covers how to use the `RETURNING` keyword to efficiently retrieve autogenerated values without running an additional query.

The SERIAL data type is one of those data types that, in fact, is not actually real. It is an alias that creates a real data type, the integer data type, but then, it also does two or three other things. Most often, you will see the SERIAL data type being used as a primary key. That was fine. It's still fine, but it was preferred, and that was the blessed method up until Postgres 10, and since Postgres 10, there is a better way to create an autoincrementing primary key.

We will look at that in the identity video, but I just want to warn you now that a SERIAL column since Postgres 10 is not the preferred or recommended way to create a primary key. There's a much simpler way and one that has fewer caveats as it relates to permissions.

That being said, SERIAL still exists, still legit, you can still use it, and under the hood, it uses something interesting called sequences, which we will also cover, so let's get into this, keeping in mind that SERIAL is not a great data type for primary keys. That being said, we are going to start here by declaring id as a SERIAL, and like I said, this is not real. This is not a real data type. What it actually is, is it's creating a sequence, which we'll cover in a future video, it's creating a sequence under the hood as an integer. Then it's taking this statement here, and this is what the real statement actually kind of expands out to: CREATE id integer because we declared this as a SERIAL, and the corresponding actual data type for a SERIAL is integer, so id integer.

It sets it as NOT NULL, and then, this is the autoincrementing part. This is a DEFAULT of nextval on this sequence, and the sequence exists outside of this column, and the sequence provides your autoincrementing, one after another. You can end up with gaps in a sequence, which is totally fine, and I'll explain why in a second. The last thing that happens is the sequence is altered, and it says it's owned by this table and this column such that when you drop the table or you drop the column, the sequence is also destroyed, and so this one little keyword right here expands out into all of this right here. That's what's happening under the hood. That's what it, when I say it's not real, I say that because integer is real.

Now, if, for whatever reason, you decide to keep using SERIAL as your primary key, that's fine. Honestly, it's fine. There's a better way, but come on. That's fine. I would say don't use SERIAL. Use BIGSERIAL instead, so BIGSERIAL, if we, yeah, let's go ahead and run that. BIGSERIAL is a legit data type, and its corresponding underlying type is, you guessed it, biginteger. This is the only place I'm going to say that you should be extremely generous with the room to grow in your data types.

At the beginning of this module, which feels like 1,000 years ago now, I said pick the smallest data type that matches your largest piece of data. Not true for primary keys. You do not want to run out of room. It is very bad. You will have a very bad day/week if you run out of primary key space. You may think, "All right, integer, 2.1 billion. "I'm never going to hit that." A lot of people have thought that. Basecamp thought that, and they ran out, they ran out of primary key space, so as much as I don't want to say plan for viral astronomic success, go ahead and just give yourself plenty of room to grow. Make it a biginteger. Make it a BIGSERIAL. Whatever you do, make sure that you don't run out of primary key space.

Let's talk about the way you could end up with gaps in your sequence, even if you never deleted any rows whatsoever. Now, gaps in a sequence, gaps in a primary key sequence, totally fine. Not a problem. If you have a business use case for strictly incrementing by one unit every time, this isn't going to work. You're going to have to roll your own solution because the way that you could end up with a gap in your sequence is when this nextval function is called, it increments the sequence, it gives you the next value and increments its internal pointer to say move on. It does not do this in a transaction-aware or transaction-safe way, which is a good thing because imagine this: Imagine you've got two transactions going, and in those transactions, you pull a nextval off the sequence of, we'll say 5. If the other transaction pulled the same nextval of 5 off of that sequence, when you get to the bottom and commit, you've got two 5s, and you're going to run into a duplicate key issue, and you're going to be very frustrated.

Imagine this: You pull 5 in one transaction, and then, the next transaction or a simultaneous transaction pulls the number 6, so you're in a good spot. You have different primary keys. You pulled different things off of the sequence. You're in a good spot. The transaction that pulled the value 5 either fails or just rolls back, and suddenly, you've committed the one that contains the value 6, so now you're at 1, 2, 3, 4, 6, and 5 is gone forever. 5 is gone forever. It's not put back in the. We've moved on. We have simply moved on. It's not put back in the sequence, and that's fine. That's a good thing. That keeps us safe, and that keeps us from ending up with transactions that can't commit because you both pulled the same number off of the sequence. That would be really frustrating.

We need to look at one more thing if you do decide to use this for primary keys, but then we're going to quickly move on and show you how we could use it for potentially, like, an order_number system or something like that. If you are going to use a SERIAL as a primary key, don't. If you're going to use a BIGSERIAL as a primary key, that's fine. You do need to declare it as a primary key. I am not here to tell you that if you have upgraded your Postgres from 9 all the way through, and all of your tables are using BIGSERIAL, that now you need to suddenly change everything. It's fine. You can continue to use BIGSERIAL. That's totally fine. Make sure you declare it as a primary key. That will prevent duplicates from going in there because while the sequence itself will not have duplicates, you can reset a sequence. I wouldn't do that, but you can reset a sequence, and you can also insert a value yourself. Instead of leaving it blank or manually pulling the nextval, you could insert a value yourself, and that might end up being a duplicate, so make sure that you mark it as a primary key.

Let's move on from this primary key nonsense and use a sequence as an order_number generator. Now, CREATE TABLE orders. Ooh, look at that. We'll talk about that later. Order_number is SERIAL, and then we have some other stuff in here, and so we can go ahead and create that table, and then, so you don't have to watch me type it, I'm going to paste this in, but importantly, we are, let's look at this again, so we have order_number there and id there, but we don't insert either of those things because both of those are going to have default values that are, in fact, autoincrementing, and so we can run that, and if we read this back, select * from orders, we see we have an order_number and an id. In this case, they're totally identical. You could change the start number of the sequence. We'll look at sequences in probably the next video, but at least a future video.

You could change the start number of the sequence and potentially your ids, they might be you uuids. They might not be autoincrementing integers, and so you might want something pretty to show your customers. That would be totally fine. There is a world in which you could just create a generated column to create an order_number, and I love generated columns, so anytime I can use a generated column, I'm going to do that. Now, one last thing that I want to show you is, and we'll look at this a little bit more in the querying section, but I want to show you this, if we INSERT INTO orders, we insert this order right here. It is kind of a bummer that we then have to turn around and do a select to get those autogenerated values back out, but, in fact, we don't have to do that, so using Postgres, we have this RETURNING keyword, and we can say RETURNING id or order_number, and that will give that back to you.

You can send it over from your application as one query and then get that autogenerated stuff back. In fact, you could get the entire row back, which is quite nice so that you don't have to make a round trip because how would you even do that? This may not be a unique set of data, and so using that RETURNING is a very nice way to deal with autogenerated values, either from primary keys or from sequences.

Let's look at sequences a little bit more in-depth in the next video.

Full Course

PostgreSQL database platform

Summary

Video Transcript