Mastering Postgres is single-handedly the most thorough and informative database course I’ve seen. I’ve gone from someone who barely knew the basics of Postgres and how it works to being comfortable with the advanced topics of Postgres.Ryan Hendrickson
Shorten dev cycles with branching and zero-downtime schema migrations.
This is going to be a little bit of a thoughts with Aaron episode. The thoughts are on the topic of what data type should you use for your primary keys. And we're gonna split it up. We're gonna kinda split it up into two sections. One is integers and the other is UIDs or UIDs or something like that. My opinion is that in 98% of use cases, you should favor integer types. I'll explain why in a second.
When we're over on the integer types side, my opinion is you should use big integer. Now, this may sound contrary to what I've been just kind of preaching at you for so long, so far, which is choose the smallest data type that you can choose, the smallest data type that encompasses the entirety of your data. And you may be thinking, well, a big int boy that holds a lot, that holds a lot of integer. I don't need that many. I agree. Probably true. You do not wanna run outta space. Let's hope that whatever is a massive success and you start filling up the tables and your rows start growing and then suddenly at the peak of your success, you fall over because you've reached the bounds of the integer column. This actually happened to Basecamp. Don't feel too sorry for them. They make hundreds of millions of dollars, but it's still a problem, right? It's still a pain.
So over in the integer camp, I'm gonna say use a big integer even though we're making a trade off between, storage and future convenience. I'm willing to take that hit and incur a little bit more storage just for the peace of mind that my big inter or my IDs can just auto increment functionally forever if you run out of big ins. Honestly, I don't know what to tell you. Congratulations. I don't know that that's ever happened. I would set it as a big integer generated always as identity primary key. That is my recommendation for 98% of the use cases.
Now, when we, when we talk about why not big ends, people often bring up UIDs. Now UIDs are great, they're awesome, and they can be really, really valuable in certain scenarios, which I'll describe in a second. But important for us to know is that there are, I think at this time, seven variants of a uid. There are seven variants of a uid, and then there are like the cousins of the uid, which would be the U id ULID, and that is a Lexi graphically sortable uid. So that's the thing that this discussion hinges upon is which variant of UIDs are you using?
The one that's built into Postgres, which is like gin random uid, that is truly a random uid. Those UIDs when you are when you're inserting them, they could end up anywhere. It's not time ordered. With big in or with auto incrementing integer, it's always going to be increasing, which fits nicely with the structure of a bee Tree. It's always going to be increasing and it's going to remain relatively balanced because everything will be added to one side and then new intermediate nodes can be added. But when you have a random primary key, those things are gonna be inserted at random points down in the leaf nodes and you might have to break and rebalance that B tree over and over and over.
Now, this is not as much of a problem in Postgres as it is in something like MySQL where the table is actually arranged by primary key. So the penalty is not as great, but there is still a penalty for inserting a random key at a random point. So there are two drawbacks to using a uu. One is the size is larger, but using the Postgre Postgres UU would call them, you can get that down to 16 bytes. And so that, that is a lot more compact than it would otherwise be, but the size is a problem. Then the random insertion is the real problem.
So that can cause that B tree index to have to fracture and rebalance. You can get around that if you use U UID V seven, which it is a time ordered u uid. The first several bites of that UID are dedicated to time such that, when you add it, it's always going to come after the UIDs, before it. So functionally you're back to kind of like the auto incrementing style, but instead it's a uid. UIDs are the same way. They have the time portion at the beginning. The real benefit, and this is the reason you would use a U UID or a uli, the real benefit is you can generate these IDs without coordination and without talking to the database.
Let's say you're in a situation where you potentially have multiple clients and you need, let's say something like optimistic ui. You create an entity on the client and you need, you need that ID immediately, and then you send it back to the database. You're gonna have to generate your own id. And to do that, you need one of these things that you can generate without coordination, which would be a UID or a uli. That is a very legitimate use case and to, I totally respect that use case. What I would say is favor UID variant seven or favor uli. I think the final point when people talk about, when people talk about big S as a primary key, they say, wow, that's a security risk. That's a security risk because then you are exposed to an incrementing attack. I'm sensitive to that argument. I'm relatively unmoved by it because if I know that security by obscurity can be a layer of security, I hope that is not your only layer, but it can give away information. It can tell you how many invoices have been created or how many users you have.
If you're putting an incrementing primary key in the URL that does expose information. My recommendation would be have a have a public ID alongside your bigint primary key. In the situation where you do not want to expose your bigint primary key, my recommendation would be create a, a secondary key, create a key using a library like a nano id, very compact, very random, impossible to guess. Then you can use that in your API or your URL or wherever it's public facing. Then you can look up by that nano ID in your table, but you still get all of the benefits of the auto incrementing, big integer primary key.
Those are my recommendations on primary key data types. Prefer big int unless you have a good reason not to. If you are gonna use Uuid, use a time ordered variant, which is a V seven or a uli. If you're afraid of exposing integer, primary keys favor a secondary key, potentially something like a nano id, which can generate a nice compact readable, usable key for your URLs or your API.