Mastering Postgres has dramatically improved my understanding of indexes. It has helped me make informed decisions around my queries and schema rather than blindly adding indexes and hoping for the best.Dwight Watson
Shorten dev cycles with branching and zero-downtime schema migrations.
In this video, we're gonna talk about foreign key constraints. That constraint is a pretty important word because there are two things that exist, a foreign key constraint and a foreign key. And those are different things. You may read a blog post or see somebody tweet and say something like, "I don't use foreign keys." That's likely not true. Or if they say something like, "Well, that technology doesn't support foreign keys," likely not true.
What they mean, and we'll forgive them for this because it's kind of muddy, what they mean is foreign key constraints. The difference is, a foreign key constraint enforces a sort of referential integrity. Now, a foreign key is just kind of a concept, right? We have a pointer in a child table that points to the key or the ID in a parent table. That would be a foreign key. That just simply exists.
You don't have to have an index on either of them. You don't have to have a constraint on either of them. That simply exists as a thing. If you have a pointer to another table, that is the foreign key. In the section of this course where we talk about joins, I will talk about some indexing to make those joins a lot faster, and usually that is indexing foreign keys. That is philosophical. Now, foreign key constraints are a real thing, and that's where you say, "This foreign key must have a reference in the other table. If it doesn't, we need to stop everything," because I'm trying to enforce some integrity there.
Now that we understand the difference between those two things, we're gonna actually look at foreign key constraints in this video. Here we have two tables, super basic. I'm creating a states table with an ID as a bigint generated always as identity primary key, and just a name. Then we have a cities table with the same ID and the same name. Here we're gonna put our foreign key. We're going to say that state_id goes here. Now, this is my preferred naming pattern. My preferred naming pattern is, you have your table, and then I like to name all of my IDs, my primary keys, I like to name it simply ID. I'm not militant about this, that's just my preference. I think another very reasonable thing to do is to name it state_id and then name this city_id. That seems totally reasonable to me. I have no problem with that. I just don't like to do that. What I do like to do is then come down here, and for this foreign key, I like to prefix it with the singular of the table, so in this case, it's states, underscore, and then the name of the ID over here. What you could do and what some people do is they have state_id and then they'll name their primary key down here city_id and then they name this foreign key state_id as well. That's okay. Like I said, it's just not my preference. That doesn't really matter to me right now.
What I want to do is I want to show you how to create this foreign key constraint. The first thing you need to know is that these data types must match. We can't have this one be a text or we can't (laughing) have this one, for whatever reason, be a float. We have to have these data types match so that we can reference these things absolutely and not do any type juggling in the middle. We're gonna say state_id, and then you just put references, and you're gonna put states id on there.
We're gonna say that on the states table, we're gonna reference the ID. We can, oh, we haven't created states yet. We can create states and then create cities. Let's insert into states name, values, Texas. We're gonna insert that. We need to see what the ID is. I imagine it's one. ID of Texas is one. Number one, love that. Insert into cities. We'll insert a state-id, state_id and name and values of, let's start with something wrong, we're gonna say two and Dallas. We get exactly what we're looking for. This is the point of a foreign key constraint is that we get this enforcement. We get this enforcement that says, "Ah, can't do that because state_id equals two is not present in the table states." What we're doing here is we're telling Postgres, "Hey, put this in with a state_id of two in Dallas." It says, "Aha, I will try.
But I noticed that you told me state-id references states id. Before I do that, I'm gonna head over to the states table, I'm gonna look in there to ensure that the ID of two exists. If it does, we're good, hey, that's fine. But if it doesn't, you told me that it must exist, and so if it doesn't exist, I'm gonna throw an error." If we switch this over to one, then we see, "Hey, we're good. I went and I checked, and they knew you over there, they knew that state number one existed. I'm gonna go ahead and put this in now." We have a couple of other options here. The first thing that we're gonna look at is, we declared this as a column-level constraint. Totally fine, we can move that down to a table-level constraint and reference the exact same thing, or we could do a composite foreign key constraint. That would be the situation where you have two keys that make, or two columns that make up a unique key and you want to reference that in another table.
You can absolutely do that, but you do have to make that a table-level constraint instead of a column-level constraint. So to do that, I'm just gonna yoink this. We're gonna take this right off of the column, and then we're gonna come down here and say foreign key. And then you open up your set of parentheses, state_id, and then references that table slash column over there. I think I need to drop table cities. If we do that, we can create the table again. If we insert a good one, we're all good. If we insert a bad one, we are all bad. We're still in a good spot here. Now, when you have a composite foreign key, you can put both columns in here and you might say, "References states a, b," over on the states table. And you can have as many as you want here.
The important thing is that these must make up a unique constraint. It doesn't have to be primary key, but it does have to be a unique. That's the same case here. When it's just a single column or it's multiple columns, no matter how many, it must be unique, because you have to imagine, if we did something where it wasn't unique and we said, "The cities belong to a state," and then it got over to the states table and it was like, "Ooh, there are a lot of states over here that match this particular, let's say, state abbreviation or something." And it must be able to determine which one it belongs to. Any foreign key that you have in here, anything right here, must be under a unique constraint in that other table.
We do have a few options to control what happens when something in the parent table is either updated or deleted, and we can control that in the child table. The way that we do that is, after this references, we can say, on update, do something, or on delete, do something. By default, on delete is no action. What that does is, let's recreate this table, let's insert, and then let's delete from states. We see, "Nope, you can't do that." On delete, no action is basically a restrict. Then a restrict, on delete, restrict. We're gonna drop cities again, we'll create cities again, we'll insert into cities, and then we'll try to drop. It says, "You can't because you have restricted it."
Now, the difference between no action and restrict is very subtle. No action allows the check to be deferred to later in a transaction, whereas restrict does not allow that check to be deferred later in a transaction. But at the end of the day, the result is the same. You cannot delete a parent row without first deleting the child rows. But you can change that by saying cascade. Instead of restrict, you can say cascade. If we drop cities, create cities, and then let's read, we need to insert into cities. If we do select * from cities, we see we do have Dallas, but if we delete from states and then read back from cities, Dallas is now gone. Potentially useful and potentially terribly, terribly dangerous.
I feel uncomfortable about this because you can imagine a cascading cascade of cascades, right? Imagine that some team deletes their account and it deletes all of their projects and all of their users and all of their comments and all of their attachments, and you just cascade through multiple tables, expanding in breadth as you go. That makes me very nervous. I do like the referential integrity that a foreign key constraint gives me.
I don't love on delete cascade because you could end up with a problem, you could end up potentially deleting tens of millions of rows if some top-level parent decides, "I'm gonna delete my account." My preference is on delete, restrict, or just leave it as the default, which is on delete, no action, and that will prevent you from having that cascading effect. There is a sentinel and set default, which can be viable if your foreign key column is nullable. You can get away with that by just nulling it out. And then you just kinda leave orphaned rows behind, which then you can come through later and clean up. That's also viable, I'm fine with that as well.
But if you are gonna use cascade, just be really careful that you know what your top-level model is and what the potential fallout of deleting one row at the very top is. Could that explode into millions of rows down below? Maybe, potentially. Keep that in mind. Foreign key is just kind of a philosophical concept. Foreign key constraint allows you to enforce that referential integrity. Then when we get to joins, we'll talk about indexing the joins to make them go faster. Now you understand the constraint portion of foreign keys.