Mastering Postgres
Introduction
Introduction to the course
Overview of course structure
Postgres vs. everyone
The psql CLI
Data Types
Introduction to schema
Integers
Numeric
Floating point
Storing money
NaNs and infinity
Casting types
Characters types
Check constraints
Domain types
Charsets and collations
Binary data
UUIDs
Boolean
Enums
Timestamps
Timezones
Dates and times
Advanced Data Types
Intervals
Serial type
Sequences
Identity
Network and mac addresses
JSON
Arrays
Generated columns
Text search types
Bit string
Ranges
Composite types
Nulls
Unique constraints
Exclusion constraints
Foreign key constraints
Indexing
Introduction to indexes
Heaps and CTIDs
B-Tree overview
Primary keys vs. secondary indexes
Primary key types
Where to add indexes
Index selectivity
Composite indexes
Composite range
Combining multiple indexes
Covering indexes
Partial indexes
Index ordering
Ordering nulls in indexes
Advanced Indexing
Functional indexes
Duplicate indexes
Hash indexes
Naming indexes
Understanding Query Plans
Introduction to explain
Explain structure
Scan nodes
Costs and rows
Explain analyze
Generating Results
Introduction to queries
Cross joins
Inner joins
Outer joins
Subqueries
Lateral joins
ROWS FROM
Filling gaps in sequences
Subquery elimination
Combining queries
Set generating functions
Indexing joins
Advanced SQL
Introduction to advanced SQL
Grouping
Grouping sets, rollups, cubes
Window functions
CTEs
CTEs with window functions
Recursive CTE
Hierarchical recursive CTE
Handling nulls
Row value syntax
Views
Materialized views
Removing duplicate rows
Upsert
Returning keyword
COALESCE + generated column
Full Text Search
Introduction to full text search
Searching with LIKE
Vectors, queries, and ranks
Websearch
Ranking
Indexing full text search
Highlighting
Advanced JSON
Intro to JSON
JSON vs JSONB
Validating JSON
Creating JSON objects + arrays
JSON extraction
JSON containment
JSON existence
JSON recordset
Updating JSON
Indexing JSON parts
GIN index
Vectors (pgvector)
Intro to pgvector
Vector embedding columns
Find related articles
Upsert vector embedding
Semantic search
Other operators
Vector indexes
Outro
Thank you
Bonus interviews
Heroku's glory days & Postgres vs the world (with Craig Kerstiens)
Creating a Postgres platform with Monica & Tudor from Xata.io
Bootstrapping an email service provider (with Jesse Hanley)
Locked video

Please purchase the course to watch this video.

Video thumbnail
Advanced Data Types
Foreign key constraints

Full Course

$
349
$399
USD, one-time fee
Aaron is a natural teacher and this course is the best introduction to Postgres I have come across. Lessons are easy to follow and he recommends some great tools for working with Postgres.
Joel Drumgoole
Joel Drumgoole

PostgreSQL database platform

Shorten dev cycles with branching and zero-downtime schema migrations.

Test Data

I've made my test data available for you to use and follow along.

Download
or
Use on Xata

Summary

Foreign keys represent relationships between tables, while foreign key constraints enforce those relationships to maintain data integrity. You'll learn how constraints ensure that a value in a child table must exist in the parent table, preventing issues when inserting or deleting data. The video also covers the importance of matching data types and how to use actions like `RESTRICT` and `CASCADE` carefully when managing updates and deletions.

Video Transcript

In this video, we're gonna talk about foreign key constraints. That constraint is a pretty important word because there are two things that exist, a foreign key constraint and a foreign key. And those are different things. You may read a blog post or see somebody tweet and say something like, "I don't use foreign keys." That's likely not true. Or if they say something like, "Well, that technology doesn't support foreign keys," likely not true.

What they mean, and we'll forgive them for this because it's kind of muddy, what they mean is foreign key constraints. The difference is, a foreign key constraint enforces a sort of referential integrity. Now, a foreign key is just kind of a concept, right? We have a pointer in a child table that points to the key or the ID in a parent table. That would be a foreign key. That just simply exists.

You don't have to have an index on either of them. You don't have to have a constraint on either of them. That simply exists as a thing. If you have a pointer to another table, that is the foreign key. In the section of this course where we talk about joins, I will talk about some indexing to make those joins a lot faster, and usually that is indexing foreign keys. That is philosophical. Now, foreign key constraints are a real thing, and that's where you say, "This foreign key must have a reference in the other table. If it doesn't, we need to stop everything," because I'm trying to enforce some integrity there.

Now that we understand the difference between those two things, we're gonna actually look at foreign key constraints in this video. Here we have two tables, super basic. I'm creating a states table with an ID as a bigint generated always as identity primary key, and just a name. Then we have a cities table with the same ID and the same name. Here we're gonna put our foreign key. We're going to say that state_id goes here. Now, this is my preferred naming pattern. My preferred naming pattern is, you have your table, and then I like to name all of my IDs, my primary keys, I like to name it simply ID. I'm not militant about this, that's just my preference. I think another very reasonable thing to do is to name it state_id and then name this city_id. That seems totally reasonable to me. I have no problem with that. I just don't like to do that. What I do like to do is then come down here, and for this foreign key, I like to prefix it with the singular of the table, so in this case, it's states, underscore, and then the name of the ID over here. What you could do and what some people do is they have state_id and then they'll name their primary key down here city_id and then they name this foreign key state_id as well. That's okay. Like I said, it's just not my preference. That doesn't really matter to me right now.

What I want to do is I want to show you how to create this foreign key constraint. The first thing you need to know is that these data types must match. We can't have this one be a text or we can't (laughing) have this one, for whatever reason, be a float. We have to have these data types match so that we can reference these things absolutely and not do any type juggling in the middle. We're gonna say state_id, and then you just put references, and you're gonna put states id on there.

We're gonna say that on the states table, we're gonna reference the ID. We can, oh, we haven't created states yet. We can create states and then create cities. Let's insert into states name, values, Texas. We're gonna insert that. We need to see what the ID is. I imagine it's one. ID of Texas is one. Number one, love that. Insert into cities. We'll insert a state-id, state_id and name and values of, let's start with something wrong, we're gonna say two and Dallas. We get exactly what we're looking for. This is the point of a foreign key constraint is that we get this enforcement. We get this enforcement that says, "Ah, can't do that because state_id equals two is not present in the table states." What we're doing here is we're telling Postgres, "Hey, put this in with a state_id of two in Dallas." It says, "Aha, I will try.

But I noticed that you told me state-id references states id. Before I do that, I'm gonna head over to the states table, I'm gonna look in there to ensure that the ID of two exists. If it does, we're good, hey, that's fine. But if it doesn't, you told me that it must exist, and so if it doesn't exist, I'm gonna throw an error." If we switch this over to one, then we see, "Hey, we're good. I went and I checked, and they knew you over there, they knew that state number one existed. I'm gonna go ahead and put this in now." We have a couple of other options here. The first thing that we're gonna look at is, we declared this as a column-level constraint. Totally fine, we can move that down to a table-level constraint and reference the exact same thing, or we could do a composite foreign key constraint. That would be the situation where you have two keys that make, or two columns that make up a unique key and you want to reference that in another table.

You can absolutely do that, but you do have to make that a table-level constraint instead of a column-level constraint. So to do that, I'm just gonna yoink this. We're gonna take this right off of the column, and then we're gonna come down here and say foreign key. And then you open up your set of parentheses, state_id, and then references that table slash column over there. I think I need to drop table cities. If we do that, we can create the table again. If we insert a good one, we're all good. If we insert a bad one, we are all bad. We're still in a good spot here. Now, when you have a composite foreign key, you can put both columns in here and you might say, "References states a, b," over on the states table. And you can have as many as you want here.

The important thing is that these must make up a unique constraint. It doesn't have to be primary key, but it does have to be a unique. That's the same case here. When it's just a single column or it's multiple columns, no matter how many, it must be unique, because you have to imagine, if we did something where it wasn't unique and we said, "The cities belong to a state," and then it got over to the states table and it was like, "Ooh, there are a lot of states over here that match this particular, let's say, state abbreviation or something." And it must be able to determine which one it belongs to. Any foreign key that you have in here, anything right here, must be under a unique constraint in that other table.

We do have a few options to control what happens when something in the parent table is either updated or deleted, and we can control that in the child table. The way that we do that is, after this references, we can say, on update, do something, or on delete, do something. By default, on delete is no action. What that does is, let's recreate this table, let's insert, and then let's delete from states. We see, "Nope, you can't do that." On delete, no action is basically a restrict. Then a restrict, on delete, restrict. We're gonna drop cities again, we'll create cities again, we'll insert into cities, and then we'll try to drop. It says, "You can't because you have restricted it."

Now, the difference between no action and restrict is very subtle. No action allows the check to be deferred to later in a transaction, whereas restrict does not allow that check to be deferred later in a transaction. But at the end of the day, the result is the same. You cannot delete a parent row without first deleting the child rows. But you can change that by saying cascade. Instead of restrict, you can say cascade. If we drop cities, create cities, and then let's read, we need to insert into cities. If we do select * from cities, we see we do have Dallas, but if we delete from states and then read back from cities, Dallas is now gone. Potentially useful and potentially terribly, terribly dangerous.

I feel uncomfortable about this because you can imagine a cascading cascade of cascades, right? Imagine that some team deletes their account and it deletes all of their projects and all of their users and all of their comments and all of their attachments, and you just cascade through multiple tables, expanding in breadth as you go. That makes me very nervous. I do like the referential integrity that a foreign key constraint gives me.

I don't love on delete cascade because you could end up with a problem, you could end up potentially deleting tens of millions of rows if some top-level parent decides, "I'm gonna delete my account." My preference is on delete, restrict, or just leave it as the default, which is on delete, no action, and that will prevent you from having that cascading effect. There is a sentinel and set default, which can be viable if your foreign key column is nullable. You can get away with that by just nulling it out. And then you just kinda leave orphaned rows behind, which then you can come through later and clean up. That's also viable, I'm fine with that as well.

But if you are gonna use cascade, just be really careful that you know what your top-level model is and what the potential fallout of deleting one row at the very top is. Could that explode into millions of rows down below? Maybe, potentially. Keep that in mind. Foreign key is just kind of a philosophical concept. Foreign key constraint allows you to enforce that referential integrity. Then when we get to joins, we'll talk about indexing the joins to make them go faster. Now you understand the constraint portion of foreign keys.