Mastering Postgres is single-handedly the most thorough and informative database course I’ve seen. I’ve gone from someone who barely knew the basics of Postgres and how it works to being comfortable with the advanced topics of Postgres.Ryan Hendrickson
Shorten dev cycles with branching and zero-downtime schema migrations.
In this video, I'm gonna show you how to do upsert, and an upsert is a nice combination of an insert and an update that guarantees one atomic outcome. Uou can insert something, but if it violates a uniqueness constraint, you can turn that insert into an update without having to issue a second query.
For this example, we're gonna create a table called kv, which is just going to be a key value store and we'll have key as text primary key and value as just simply text. We'll insert into kv, key and value with values of... Let's just say that we're doing... This is a cache table and we're caching foo and 123. We've inserted that value select * from kv and we've got our little key value store going on here. Kinda nice, probably pretty useful.
If we try to insert this again, we get the duplicate key violates unique constraint. That's what we want. The key must be unique that was declared when we declared it as a primary key. All primary keys must be unique. That index was created and that uniqueness is enforced on that column. But instead of just getting this error, we can do a few different things. We can ignore the error and say, "That's fine, I don't really care." Or we can do an update and insert our new value. We get to send one query over the wall from the application to the database and say, "Hey, if this key is not there, insert the value. If the key is there, maybe update the value." That is an upsert.
What we're gonna do here is we're gonna make some space and then we are going to open up this right here on conflict key. This is what puts us into upsert mode. We're saying, "Hey, insert this value on on a conflict on the key column." On this unique constraint column." If there is a conflict, here's what you should do. In fact, you should do nothing. You should just do nothing. I super don't care. What this does is this makes the error go away, which is quite nice. If your application is issuing this query, it's not gonna come back and say, "Hey, you messed up." It's gonna say, "Ah, I didn't do anything." And this might be exactly the business logic that you require. This is very common. What this is is throwing one query over the wall and saying, "If it's not there, insert it. If it is there, I don't care, don't tell me about it. I so don't care one bit." This is much better than issuing a select, getting the information back, and then deciding to do the insert. Because by the time you do the insert, that key may already be there, right? We're eliminating this timing error and saying, "Just do it all in one go. Put it there, if it's not there. If it's not, just keep moving on."
This is likely not what you want or it may likely be exactly what you want. But in case it's not, you can do this, you can do do update set value equal to something. And what we're just gonna start by proving that it works. We'll say, instead of hard coded, let's say, "There was a conflict." And so if we run that, we'll see there was a conflict. We have accomplished the upsert. We tried to insert 123 and we saw that there was a conflict and we set to a value there was conflict. That's not really a real world use case, right? The real world use case is likely something more like this. You have access to this magic excluded keyword or object and you can say excluded.value. If we run that and read that back, you'll see it's set to 123. And if we set this to 456 and tried to run it, it would be set to 456. This excluded holds the values that we're trying to be inserted. If I don't think this is a good idea. But you could set it to excluded.key and you would see that it is set to excluded.key. Doesn't make a lot of sense in this use case, but that is to prove that you do have access to the values that are incoming.
You also have access to... This is just gonna be kv... That's just gonna basically be a do nothing. You could set it to kv.key, which is again, not that useful. But just to prove that you do have access to that, both the original table and the excluded keyword. Now there may be situations where you want to do an upsert sometimes. You wanna do an upsert sometimes and the other times you just kind of want to ignore it, you can do that by adding a constraint on to the upsert clause. If we read this back, we still have 456 as the value and I'm just going to update the entire table 'cause there's only one row in there, update kv set value equal to null. If we come up here and we just run our traditional insert, we can't insert a new value even when the value is null because that key is a unique constraint.
What we can do is we can say, on conflict. We can do the same old thing, set it to excluded value where kv.value is null. Here, this where is applying to this on conflict clause. We can run that and if we read it back, you'll see that we do get 456. If we run this again and say 890, we're not going to update it to 890 because it is not... Kv.value is not null. In this case, it's kinda nice 'cause it works like a one-way door, right? If the value is null, let's go ahead and write the value in. If the value is not null, we're going to ignore everything.
You can imagine where this would be useful especially in maybe a cache situation where you have an expires at timestamp and you wanna say, "Hey, if this value is expired, let's go ahead and override it. Otherwise, we're not gonna overwrite it until it expires." Or potentially, you have some sort of timestamp on the record and a timestamp incoming of the last edit from the client, let's say, and you wanna say, "Hey, if my version is newer than the version that's in the database, go ahead and override it. But if it's not, we need to leave the newer version in the database." You can do all kinds of comparisons in that constraint.
I wanna show you one more thing, which is how you can increment or decrement values that are in this kv table. I'm gonna change the structure of this table just a little bit. I already dropped it. We're gonna come in here and say, create kv key and we'll say text primary key and we'll say... This time, I wanna make the value as in an int4. That's probably big enough. We can say, insert into kv the key, value, values. We'll say hits... We'll just say, hits:homepage and we'll insert a value of one. Okay, now reading this back. Let's start from kv, we have one and if we want to increment, we can't. Because we're very, very smart, we can by doing an on conflict. On conflict key do update set and we're gonna set value equal. What we could do is we could just do this and that would probably work except there are two values. There's kv.value and excluded.value so we have to be specific. If we do that, that does work, but that's not as flexible or rather as perfect as I would like.
What I would like to do is in fact use that second value, the excluded.value. If we run this, we see that it was incremented by one. But let's say for whatever reason, maybe we have a cache on the application side that we flush to the database every second or whatever. This time, there were 10 hits within the last second and now we run that and we've incremented it by 10. This is a little bit more flexible. Instead of just saying plus one, it says, plus whatever was coming in from the application itself.
Upsert's super cool. It prevents you from having to go over the wall, do a select, come back to the application, go back over the wall, potentially do an update or an insert depending. It allows you to wrap it all up in good logic, send it over the wall, and let the database handle. Should I do an insert or should I do an update? Which prevents an entire class of race conditions that we don't wanna run into. There's one other thing that can make this better and that is we can actually get the value back in the same query. But we're gonna look at that in the next video.