Enter your email below to watch this video
Mastering Postgres is single-handedly the most thorough and informative database course I’ve seen. I’ve gone from someone who barely knew the basics of Postgres and how it works to being comfortable with the advanced topics of Postgres.Ryan Hendrickson
Shorten dev cycles with branching and zero-downtime schema migrations.
Enums are an interesting data type because to the human, to you me, it looks like a string. To the database to Postgres, it looks like an integer. It's actually stored as an integer under the hood. You get this nice marrying of two things, which is compact storage human readability. Part of the deal with enums is it is a finite fixed list that you can choose from.
Imagine something like a blog post that has a status the status is draft published, archived, we'll say destroyed. Those would be the four discrete statuses that would make a potentially a great use case for an enum because when you have an enum, you list out what the allowable options are, which leads us to when would an enum be bad? Well, if the options are constantly changing.
If you're working with a field, where business logic dictates that these options are kind of being updated added removed to all the time, I wouldn't use an enum for that. I also wouldn't use an enum when a lookup table would be better. Maybe you've got, you want to do a dropdown of the states, the United States, all 50 states in the US, you also kind of want to show their capitals you kind of want to show the population you kind of at that point just make it its own table do a foreign key, which we'll talk about later.
There are several use cases where an enum works really well there are some use cases where check constraint works even better. Let me show you how an enum works. Then at the end we can talk about enums versus check constraints. This is what it looks like to create a new enum. You have to create a type then you give it your your enum name you say AS ENUM. Then you list out the discrete values that are allowed. Then after creating this type, you can create a table give it a column use that instead of saying that current_mood is text, we say current_mood is mood. Then if we were to do that then insert a few values we can insert happy sad, nothing interesting has happened yet. We have the happy sad in there, right? What we've done, created a type, created a table with a column of that type inserted some legal values.
If we were to insert an illegal value, we do get that nice data validation. Similar to a check constraint, it is checking on the way in to make sure that it is one of these values listed here. I am going to insert a few more. Let's insert a few more legit ones. Let's do sad. Let's just do all of them. We'll do sad, neutral, sad, happy. If we were to run that then select from here, you see we get all of our values.
Interestingly, if we were to order by mood, it's called current_mood. Order by current_mood, you might see something that you didn't expect here. If you were ordering, if this was text you were ordering, you would have happy then neutral then sad because it would order alphabetically. It is not ordering alphabetically, it is ordering by the order that you declared these values here. This can be really useful. This can be very useful when you have something like, let's say we'll do extra small, small, medium, large, extra large. If you're choosing some sort of shirt size, clothing size, whatever, having extra small followed immediately by small makes a lot more sense than having extra large followed immediately by extra small. You can take advantage of this reality. It's not even a quirk, it's just how it works.
You can take advantage of how this works by declaring your enumerated values in the correct order. Then this shows up correctly in the UI. That can be kind of nice, if you're not expecting it, that can be really, really annoying. I'm going to show you in a little bit how we can see under the hood see, in case we don't recall, we can see what order these things were declared in. I'm going to show you right now how you can add a new one then potentially put it kind of in the middle wherever you want it to go. Let's get rid of all of this. We don't need that anymore we don't need that anymore. We'll keep that around just for fun. What I'm going to do now is say ALTER TYPE mood ADD VALUE 'excited.' if I add that, then excited immediately becomes one of the values I'm allowed to insert. You'll see that that shows up down at the bottom because it was added to the very end.
If I were to add afraid, we can also say before sad. If I added 'afraid' before 'sad' then ran inserted afraid ran this again, happy, happy, happy, afraid, sad. I was able to insert one in the middle of the existing order, which is nice. Some other databases only allow enums to be added at the end, you can add this wherever you want. You can also add after, we'll add one after. Let's add it after afraid we'll put melancholic, that's a big word. Let's alter that type then insert that then run that. We inserted it after afraid. It should be after afraid indeed it is.
Unfortunately, removing a value from your enum is not possible. You have to drop it create a new one, which kind of sucks. Lemme show you how to do that. We're going to create a mood_new as an enum we're just going to go back to happy, sad, neutral afraid. If we run that, then we can alter our table change the current_mood to this mood_new type. We also want to use the current_mood, cast it to a text value then cast that to this new enum value. Unfortunately, watch this, of course it doesn't work because we have some bad data in there. Well, it's not necessarily bad data, it is data that used to be good, now we're changing the requirement, which is totally fine. Businesses change, there's nothing to be sad about there.
What I'm going to do here is I'm going to wrap this up in a transaction, which we'll talk about later, I'm going to wrap this up that everything that is happening here will happen all at once instead of doing some stuff letting bad data sneak in. I have started that transaction what I'm going to do is I'm going to say update. Let's put a semicolon there. Update enum_example set current_mood = if they weren't feeling neutral before, they're feeling neutral now. Set current_mood equal = "neutral" where current_mood not in. Let's take those guys right there. What we're saying is, hey, these are the new moods that you're allowed to have. If you're not feeling one of these moods, you're feeling neutral. Now we can run this. We'll run that, it says 14 rows affected. Then we can say, let's go ahead make that the new that is the new allowed set of moods. Then we can commit that. If we look in here now select * from enum_example, then we should only see happy, sad, happy, sad, afraid, neutral. All the bottom ones are now neutral. Now that that's run, you can safely drop the old enum type. If you want rename the new enum type to drop that trailing underscore new. That's just a personal preference. That's totally up to you.
A little bit unwieldy here in removing a value from the enum set, adding a value is quite easy. Let me show you some of the under the hood inner workings of these enums. The first thing that I want to do is I want to be able to cast this to an integer because I know that there's an integer down there somewhere, unfortunately none of these, none of these work. You do have a few things that you can do you can look in the pg_catalog at the pg_enum if you look in there, ooh, this is interesting, isn't it? We still have mood mood new hanging out. We see two different enum PIDs.
We see what looks like the second one here because it's happy, sad, neutral, afraid. We see this first one here. Look how they handle inserting before after. They just kind of split the difference. It's this infinite, find the difference split it in half kind of deal. That is that value that we're looking at is the enum sort order. This is that thing I was telling you about earlier. If you sort by enums it sorts by the internal representation, not the label, boom, there's your proof right there. In this case it will go happy, afraid, melancholic, sad, neutral, excited. If we were, no, not that. If we were to say, let's pull this out say select from where enumtypid equals that order by enumsortorder. Oops, there you'll see, happy, afraid, melancholic, sad, neutral, excited. Those 0.5 0.75 is 'cause we inserted some stuff in the middle. Another thing that we can look at is this enum_range.
If you forget, which is totally fine, if you forget how they were declared in the beginning, you can do this enum_range cast it to a mood. You'll say null cast to mood. That will show you all of them in the order that they currently exist. I was going to say in the order that they were declared, in the order that they currently exist is more accurate 'cause we inserted a few guys in the middle. This does accept two different values. We can say sad cast to mood that will cut off the range at sad. If we wanted to say, let's do what's available from the range of afraid to sad, well you get afraid, melancholic sad? This could be useful. This could be useful if your enums have semantic meaning you declared them in the right order you wanted to know what are the things available from here to there, that could be totally useful. I do want to talk about check constraints though.
Check constraints are another viable option that you can use instead of enums. You don't have to use an enum. If you use a check constraint on a text column, you do lose that nice compact storage of an enum under the hood. You are going to be storing a string. That may be totally fine for your use case. Check constraints can be a little bit easier to deal with, a little bit easier to alter. You could also use a domain over a text column including a check constraint. That would be pretty easy to deal with if the values are constantly adding or being added removed in terms of what are the legal values.
If you are not in the habit of looking at your data in an actual query results view you're just kind of using it on the application side, I am perfectly fine if you make your status column an integer. I don't really have a problem with that. Your coworkers might because they may look at the integer say, what the heck is a one versus a two versus a three versus a four? I have no idea. It is completely opaque you, without looking at the application side code, you don't know what those statuses stand for. That has never been a problem for me. That would be a discussion for your team.
In that way, you get the compact storage of an integer without the readability of a string. An enum marries those two things. A text column with either a check constraint or a domain over it, can be a little bit easier on the maintenance side. If your list of items is fixed you want to use an enum, enums are amazing, just be aware of the underlying sort order issue, that you don't get bitten by that when you aren't expecting it.