Mastering Postgres
Introduction
Introduction to the course
Overview of course structure
Postgres vs. everyone
The psql CLI
Data Types
Introduction to schema
Integers
Numeric
Floating point
Storing money
NaNs and infinity
Casting types
Characters types
Check constraints
Domain types
Charsets and collations
Binary data
UUIDs
Boolean
Enums
Timestamps
Timezones
Dates and times
Advanced Data Types
Intervals
Serial type
Sequences
Identity
Network and MAC addresses
JSON
Arrays
Generated columns
Text search types
Bit string
Ranges
Composite types
Nulls
Unique constraints
Exclusion constraints
Foreign key constraints
Indexing
Introduction to indexes
Heaps and CTIDs
B-Tree overview
Primary keys vs. secondary indexes
Primary key types
Where to add indexes
Index selectivity
Composite indexes
Composite range
Combining multiple indexes
Covering indexes
Partial indexes
Index ordering
Ordering nulls in indexes
Advanced Indexing
Functional indexes
Duplicate indexes
Hash indexes
Naming indexes
Understanding Query Plans
Introduction to explain
Explain structure
Scan nodes
Costs and rows
Explain analyze
Generating Results
Introduction to queries
Cross joins
Inner joins
Outer joins
Subqueries
Lateral joins
ROWS FROM
Filling gaps in sequences
Subquery elimination
Combining queries
Set generating functions
Indexing joins
Advanced SQL
Introduction to advanced SQL
Grouping
Grouping sets, rollups, cubes
Window functions
CTEs
CTEs with window functions
Recursive CTE
Hierarchical recursive CTE
Handling nulls
Row value syntax
Views
Materialized views
Removing duplicate rows
Upsert
Returning keyword
COALESCE + generated column
Full Text Search
Introduction to full text search
Searching with LIKE
Vectors, queries, and ranks
Websearch
Ranking
Indexing full text search
Highlighting
Advanced JSON
Intro to JSON
JSON vs JSONB
Validating JSON
Creating JSON objects + arrays
JSON extraction
JSON containment
JSON existence
JSON recordset
Updating JSON
Indexing JSON parts
GIN index
Vectors (pgvector)
Intro to pgvector
Vector embedding columns
Find related articles
Upsert vector embedding
Semantic search
Other operators
Vector indexes
Outro
Thank you
Bonus interviews
Heroku's glory days & Postgres vs the world (with Craig Kerstiens)
Creating a Postgres platform with Monica & Tudor from Xata.io
Bootstrapping an email service provider (with Jesse Hanley)
Indexing
Primary key types


Summary

Learn how to choose the right data type for primary keys, with big integers being the most efficient and scalable option in most cases. The video also explores UUIDs, recommending time-ordered variants like UUID v7 or ULID for coordination-free ID generation, ideal for scenarios like optimistic UI. For added security, it's advised to use a secondary key, such as a Nano ID, alongside your primary key to avoid exposing sensitive information through incrementing numbers.

Video Transcript

This is going to be a little bit of a thoughts-with-Aaron episode. The thoughts are on the topic of what data type you should use for your primary keys. We're gonna split it up into two sections: one is integers, and the other is UUIDs, ULIDs, things like that. My opinion is that in 98% of use cases, you should favor integer types. I'll explain why in a second.

When we're over on the integer side, my opinion is you should use big integer. Now, this may sound contrary to what I've been preaching at you so far, which is: choose the smallest data type that encompasses the entirety of your data. And you may be thinking, well, a bigint, boy, that holds a lot of integers. I don't need that many. Probably true. But you do not want to run out of space. Say whatever you're building is a massive success, the tables start filling up, your rows keep growing, and then suddenly, at the peak of your success, you fall over because you've hit the bounds of the integer column. This actually happened to Basecamp. Don't feel too sorry for them, they make hundreds of millions of dollars, but it's still a problem, right? It's still a pain.

So over in the integer camp, I'm gonna say use a big integer, even though we're making a trade-off between storage and future convenience. I'm willing to take that hit and incur a little more storage just for the peace of mind that my IDs can auto-increment functionally forever. If you run out of bigints, honestly, I don't know what to tell you. Congratulations. I don't know that that's ever happened. I would declare it as bigint generated always as identity primary key. That is my recommendation for 98% of use cases.
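Sketched out in SQL, that recommendation looks like this (the table and column names are just for illustration):

```sql
-- A bigint identity primary key: 8 bytes, always increasing,
-- and effectively impossible to exhaust.
CREATE TABLE users (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email text NOT NULL
);

-- The database assigns the id; RETURNING hands it back to you.
INSERT INTO users (email) VALUES ('aaron@example.com') RETURNING id;
```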

Now, when we talk about why not bigints, people often bring up UUIDs. UUIDs are great, they're awesome, and they can be really, really valuable in certain scenarios, which I'll describe in a second. But it's important for us to know that there are, I think at this time, seven versions of a UUID, and then there are the cousins of the UUID, like the ULID, which is a lexicographically sortable identifier. That's the thing this discussion hinges on: which variant of UUID are you using?

The one that's built into Postgres, gen_random_uuid(), generates a truly random UUID (version 4). When you're inserting those UUIDs, they could end up anywhere; they're not time ordered. With a bigint, or any auto-incrementing integer, the key is always increasing, which fits nicely with the structure of a B-tree. Because everything is added to one side, the tree remains relatively balanced as new intermediate nodes are added. But when you have a random primary key, values get inserted at random points down in the leaf nodes, and you might have to split and rebalance that B-tree over and over and over.

Now, this is not as much of a problem in Postgres as it is in something like MySQL, where the table itself is arranged by primary key. So the penalty is not as great, but there is still a penalty for inserting a random key at a random point. So there are two drawbacks to using a UUID. One is that the size is larger, though if you use the native Postgres uuid type you can get that down to 16 bytes, which is a lot more compact than it would otherwise be. The size is a problem, but the random insertion is the real problem.

That random insertion can cause the B-tree index to fracture and rebalance. You can get around that if you use UUID v7, which is a time-ordered UUID. The first several bytes of that UUID are dedicated to a timestamp, such that when you add one, it always sorts after the UUIDs that came before it. So functionally you're back to something like the auto-incrementing style, except it's a UUID. ULIDs work the same way; they have the time portion at the beginning. The real benefit, and this is the reason you would use a UUID v7 or a ULID, is that you can generate these IDs without coordination and without talking to the database.
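For reference, the two flavors side by side. gen_random_uuid() (v4) is built into Postgres; a native uuidv7() function was only added in Postgres 18, so on older versions you'd generate v7 IDs in your application or via an extension:

```sql
-- Random v4 UUID: built in, globally unique, but bad B-tree locality.
CREATE TABLE events_v4 (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid()
);

-- Time-ordered v7 UUID: uuidv7() requires Postgres 18+; on older
-- versions, generate the v7 value in the application instead.
CREATE TABLE events_v7 (
    id uuid PRIMARY KEY DEFAULT uuidv7()
);
```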

Let's say you're in a situation where you have multiple clients and you need something like optimistic UI. You create an entity on the client, you need that ID immediately, and then you send it back to the database. You're gonna have to generate your own ID, and to do that, you need one of these things you can generate without coordination, which would be a UUID or a ULID. That is a very legitimate use case, and I totally respect it. What I would say is: favor UUID v7, or favor ULID. I think the final point is that when people talk about bigints as a primary key, they say, whoa, that's a security risk, because you're exposed to an enumeration attack. I'm sensitive to that argument, but I'm relatively unmoved by it, because security by obscurity can be a layer of security; I hope it's not your only layer. But an incrementing key can give away information. It can tell people how many invoices have been created or how many users you have.

If you're putting an incrementing primary key in the URL, that does expose information. In the situation where you do not want to expose your bigint primary key, my recommendation would be to have a public ID alongside it: create a secondary key using a library like Nano ID, which is very compact, very random, impossible to guess. Then you can use that in your API or your URL or wherever it's public facing. You can look rows up by that Nano ID in your table, but you still get all of the benefits of the auto-incrementing big integer primary key.
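A minimal sketch of that pattern, assuming the Nano ID is generated in the application (the table name, columns, and the sample ID value are all just illustrative):

```sql
-- bigint primary key for joins and B-tree friendliness, plus a
-- random, unguessable public ID for anything user-facing.
CREATE TABLE invoices (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    public_id text NOT NULL UNIQUE,  -- e.g. a Nano ID like 'V1StGXR8_Z5jdHi6B-myT'
    total_cents bigint NOT NULL
);

-- Public-facing lookups go through the unique index on public_id;
-- internal foreign keys keep referencing the bigint id.
SELECT id, total_cents
FROM invoices
WHERE public_id = 'V1StGXR8_Z5jdHi6B-myT';
```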

Those are my recommendations on primary key data types. Prefer bigint unless you have a good reason not to. If you are gonna use UUIDs, use a time-ordered variant, which is v7, or a ULID. If you're afraid of exposing integer primary keys, favor a secondary key, potentially something like a Nano ID, which can generate a nice, compact, readable, usable key for your URLs or your API.