The most comprehensive course on PostgreSQL

Learn how to use identity columns for primary keys in PostgreSQL, which are a better alternative to serial columns. We cover creating an identity column using `GENERATED ALWAYS AS IDENTITY`, ensuring IDs auto-increment safely without manual intervention. I recommend using `BIGINT` for these IDs and advise against using `OVERRIDING SYSTEM VALUE` to keep the sequence and table in sync.

We've finally arrived at identity columns. We had to do a bunch of stuff first, we had to talk about numeric columns, primarily integer and big integer, then we had to talk about serial, and then, sequences, and now, finally, we can talk about identity columns, and identity columns are my preference for primary keys, unfortunately, we're going to have to save the discussion about BIGINTs versus UUIDs, we're going to have to save that discussion until the indexing section because we need to learn a little bit about indexes before we talk about why I think BIGINTs are the best type of primary key.

With that being said, let me show you how to create an identity column as your primary key. First thing we're going to do is give it a name of "id," reasonable enough, declare it as a BIGINT so that we never run out of room, and then, here comes the fun stuff, "GENERATED ALWAYS AS INDENTITY." So this is the serial replacement, and then, we are going to tack on "PRIMARY KEY" there at the end. If we run this, then we can do, "insert into id_example," just "name," values of Aaron, that's a good name, and if we run that and read it back, "select * from id_example," what is interesting about that? Not a single thing. If we run a bunch more, you'll see it does auto-increment.

Now, because we said generated always, we cannot do this, we can't do this because it is... "GENERATED ALWAYS" can't insert non-default value, what we can do is two separate things. If, for whatever reason, this is the structure of your query and it is unchangeable, you can put the default keyword there and that is just fine. You could... Let's see what the last one is. The last one is currently 15. You could do this, and I don't recommend it, you can explicitly say, "I want to override the system," and to do that, you have to say "OVERRIDING..." Actually, let's see what the hint says, 'cause I actually don't remember, "OVERRIDING SYSTEM VALUE," there you go. It tells you what the detail is, IDs, identity defined as "GENERATED ALWAYS," which will change in a second, "HINT: Use OVERRIDING SYSTEM VALUE." So right here in the middle, we can say, "OVERRIDING SYSTEM VALUE," and then, pass through an explicit ID.

Now, if we run that, we see that our explicit ID was set to 16, but the actual underlying sequence was not incremented, and so, this is why I don't recommend doing it, 'cause now, the sequence and the table are out of sync, so if you're going to do this, you do have to go in and reset... Not reset, you have to actually probably reset the sequence to exactly what you're looking for, or potentially just pull a NEXTVAL off of it if you've only incremented it by one, but I don't think you can guarantee that, so you do have to go in and reset the sequence. This is not really a situation you want to find yourself in.

If you're going to say that the ID is auto-incremented, I would just let it do its thing, otherwise, you're going to have to do something like this, so we can do "pg_get_serial_sequence," so this is going to tell us the name of the sequence that is being used for this underlying column. It is still using sequences under the hood, it's just all wrapped up much more nicely and we don't really have to worry about the sequence, but this is the internal sequence that is being used, and so, now, we would have to say, "select setval," for that sequence, and then select... What do we even do here? "Select max(id) from id_example?" I just... This is so icky, so if we run that, then we have a problem, if we wrap that guy up, then we should be fine, we can go ahead and insert a new one, and that was kind of a pain. What we did there is we asked PostgreSQL, "Hey, you made a sequence when you made this identity column, can you give me the name of that sequence please? Because I didn't create it, it's internally created and maintained, can you please give me the name? Great, thank you for the name.

Now, I'm going to call setval on that sequence with the max ID from the table as it currently exists, and that will bring those two things back into sync." You can see why "OVERRIDING SYSTEM VALUE" is not awesome. It did tell you that there was another option, so let's just get rid of all of that, and let's do "drop table id_example," instead of "GENERATED ALWAYS," you can say, "GENERATED BY DEFAULT AS IDENTITY," and this is a little bit more loosey-goosey, so if we insert that, we should still get back the same kind of stuff, here's our "select," we still get back the same kind of stuff, but now, instead of having to jump through those hoops of saying, "OVERRIDING SYSTEM VALUE," we can just put in... We can just pass a value in, and it's not going to tell us that that is an error because what... How we declared the table was actually just, "Hey, by default, use the next value, otherwise, I'm going to provide a value and you have to be okay with it," so that is the by-default, and if we carry on, you'll see, we end up hitting that same exact error because we didn't manage the sequence itself, and so, we would have to do that again, and then, carry on, so my opinion is if you are going to use an identity column, just let it do its thing, if, for whatever reason, you are forced to provide a value.

I would recommend providing the default keyword, and, in fact, instead of saying, "GENERATED BY DEFAULT," I would say, "GENERATED ALWAYS," that gives you a little bit more security, somebody's going to have to try really hard to insert an ID, and when I say security, I don't mean, like, security, I mean a little bit of, like, comfort and a little bit of safety, because you know you're not accidentally going to type in an ID value, you're going to have to try really hard to put "OVERRIDING SYSTEM VALUE" in your query, and that makes me feel warm and fuzzy, so I like identity columns, these are the recommended way to create auto-incrementing IDs in PostgreSQL, these are much more portable, they're a little... They're wrapped up a little tighter, that sequence is kinda handled internally, so you don't really have to worry about it, and I would recommend using a BIGINT for those IDs, and we will talk about why we use BIGINTs versus UUIDs when we get to the indexing section.

Full Course

PostgreSQL database platform

Summary

Video Transcript