We're gonna start our discussion about data types with numeric data. Our discussion here is how do we build a good solid foundation? How do we create a correct table such that we don't run into problems down the line?
Now, this video is gonna be about integers. In fact, it's gonna be about integers that fit within a bounded range. In future videos, we'll talk about fractional parts, we'll talk about arbitrary precision, money, unbounded ranges, all kinds of good stuff. I'm gonna try to keep this one scoped to integers in a bounded range.
Remember that we're trying to pick a data type that is both representative and compact. And so representative means we need to choose one that encompasses the full range of our data. Please do not mangle your data to save a few bytes here and there. Bad idea. And compact means if we have a range that goes from 0 to 100, don't put it in a big integer. That is a waste of space. And in a future-future video, which means a long way away, we're gonna talk about ordering your columns such that your table can be even more compact. And this all builds on top of each other.
So let's start with integers. Here we have a little CREATE TABLE
statement that is not very good, because we don't even have a primary key in here. But that's not the point, right? So this is what we're doing right here. What kind of data type are we going to use for age? The first thing you always wanna do is look at your data, study your data carefully, find out what the range of your data is, and then ask yourself: "Is that the range of the reality?" Because your data represents something in the real world, right?
If you look at your data and you say, "I've got people from 0 to 90 years old in there. I've got a user that's 90 years old." That's great. Is that reality? Those are the bounds of your current data, but are they the bounds of reality? Is anyone ever gonna live to be past 200? Not while this database exists. Maybe at some point in the future. And honestly, if you wanted to put 1,000, that's still a reasonable range for your data. So inspect your data, figure out what the bounds are, and then let's match that up to a data type.
If we look at the data types that can support that, we've got SMALLINT, which goes up to 32,767. Interestingly, it also goes about that far on the negative side: from -32,768 to 32,767. And so a SMALLINT seems pretty good to me here.
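As a sketch of the table being built up in this lesson (the `name` column is assumed for illustration; the table name `smallint_example` comes up later in the video):

```sql
-- A minimal sketch, not the exact statement from the video.
create table smallint_example (
    name text,
    age  smallint  -- -32,768 to 32,767: plenty of room for human ages
);
```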
Now interestingly, if you're coming from MySQL, it may surprise you that I'm not typing unsigned in there or back here, wherever it's supposed to go. Postgres does not have the concept of unsigned integers. So if you're coming from SQLite, you expected that; if you're coming from MySQL, that is a surprise to you. That does not mean we cannot enforce that somebody must be greater than zero years old, or greater than or equal to, actually, because kids are zero. So that may surprise you. That's okay. We're gonna talk about check constraints later.
Right now what we're looking at is how do we fit our data into a data type? So if we run this and hop over here, I'm just gonna do it on the command line, I just wanna take a look at this table. So I'm gonna do \d table_name. And you see we got that table created.
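Assuming the table is named `smallint_example` with the columns sketched above, the psql output looks roughly like this:

```
\d smallint_example

         Table "public.smallint_example"
 Column |   Type   | Collation | Nullable | Default
--------+----------+-----------+----------+---------
 name   | text     |           |          |
 age    | smallint |           |          |
```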
Interestingly, if we hop back to TablePlus and we drop that, there is this slightly esoteric alias, but I do like it for one reason. So if we declare that as int2 and then hop back over here and run that again, you'll see it is in fact an alias: even though we declared it as int2, it's actually a SMALLINT. Why do I like this little esoteric alias? Because it's informative. While it is a little bit opaque and not the friendliest name ever, it's information-dense: it is an integer that takes two bytes. Which leads us to believe that there is an int4 and an int8. And there is.
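A sketch of declaring the column with the alias instead; \d will still report the column as smallint, because int2 is just another name for it:

```sql
-- The byte-count aliases and their friendly names:
--   int2 = smallint  (2 bytes)
--   int4 = integer   (4 bytes)
--   int8 = bigint    (8 bytes)
create table smallint_example (
    name text,
    age  int2  -- reported by \d as "smallint"
);
```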
But first, we're gonna carry forward with, I'll leave that there, we're gonna carry forward with this int2. And I'm going to insert, I'm not gonna make you watch me type it, I'm gonna insert some data into this table, and there we go. Does it work? Did it work? Okay, I'm gonna leave that there, and we'll do select * from smallint_example.
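A sketch of that insert and select (the specific names and ages here are made up for illustration):

```sql
insert into smallint_example (name, age)
values ('Alice', 30), ('Bob', 90);

select * from smallint_example;
```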
Okay, our data is represented correctly, faithfully, no errors, no loss of precision or anything. Now, what happens if you have some sort of mutant that lives to be 40,000 years old? And we can call this person a robot. And if we run that you'll see "smallint out of range". So we get a little bit of data integrity and enforcement here. I'm glad that we have that, because if somebody accidentally types in 40,000, they're gonna get an error. But I don't think that alone is a good reason to pick a type, because frankly, if somebody types in 32,000, that is also an error, but we're just going to allow it.
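The out-of-range insert might look like this; "smallint out of range" is the error Postgres actually raises when a value overflows the type:

```sql
insert into smallint_example (name, age) values ('robot', 40000);
-- ERROR:  smallint out of range
```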
So in this example, the idea of using the type as some sort of business logic constraint doesn't move me. What does move me is using a check constraint or a domain to enforce that sort of data-validation business logic. What we're trying to do here is just get the most compact type that matches our data.
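Check constraints are covered properly in a later video, but as a rough sketch of what that data-validation logic could look like (the constraint name and the 200-year cap are assumptions for illustration):

```sql
alter table smallint_example
    add constraint age_in_range check (age >= 0 and age <= 200);

-- Now a value like 32000 is rejected for business-logic reasons,
-- not merely because it happens to overflow the type.
```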
Let's move on to the int4. The int4 is likely the one that you are gonna be using most often. I think most frameworks, when they're creating their migrations, just kind of default to an integer. And side note on that: I have no problem whatsoever if you're using a web application framework or some other kind of framework to write your migrations. That doesn't bother me at all. If anybody ever tells you you're not a real developer because you didn't hand-roll your SQL, I don't think that is true. What I will say, however, is you're still responsible for the generated SQL. It is nice to use a tool that writes migrations for you, but at the end of the day, you are still in charge and you need to know what that tool is doing. So however you wanna do it, that's great. That's up to you. You're still the boss.
Okay, int4, this is the one that you'll likely be using. It is also just called integer, no qualifier. An integer doesn't make sense for age, so I'm gonna drop in a new table here. And the new table is going to be item name and stock. So some sort of back office warehouse kind of setup. And with the extra two bytes, we get a much bigger range. For an integer, I am going to copy and paste this because that's a lot to type: it goes up to about 2.1 billion. It also goes down to about -2.1 billion. And so this range is enormous. This covers most of your needs, I would have to imagine. Here's one place I would say give your data way too much room to grow: and that is primary key IDs.
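A sketch of that warehouse table (column names taken from the narration):

```sql
create table integer_example (
    item_name text,
    stock     integer  -- int4: -2,147,483,648 to 2,147,483,647
);
```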
So for a primary key ID, we're just gonna default to bigint, 'cause we don't wanna run outta space. For everything else, integer is probably fine; if you know that it's super small, stick with a smallint. So we could run this, but I'm not gonna run this.
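Defaulting primary keys to bigint might look like this; the identity-column syntax is one way to do it, sketched here for illustration:

```sql
create table orders (
    id        bigint generated always as identity primary key,
    item_name text
);
```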
We're gonna move straight on to bigint. And bigint is eight bytes, hence int8. So now we're getting a little bit of a picture here: we have two, we have four, and then we have eight down here at bigint. And boy, am I gonna copy this one, because it is nine quintillion. That's right. What is a quintillion? It's a very big number. So it goes from negative nine quintillion to positive nine quintillion. That's a very, very big range. And storage-wise, it is four times bigger than a smallint: eight bytes versus two. Does that matter? Depends on how many columns you're doing that on and how many rows you have. Across a billion rows, ooh, it might matter. And then when we start talking about making our tables compact at kind of a lower level, eh, it kind of matters.
If you need a bigint, it can be something like file size: we could change this column to file_name, change that one to file_size in bytes, and that would be a good bigint. And we would just change the table name to biginteger_example.
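A sketch of that table (names taken from the narration):

```sql
create table biginteger_example (
    file_name text,
    file_size bigint  -- int8: file sizes in bytes can easily exceed int4's 2.1 billion cap
);
```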
You can see you have a few options. Pick the one that is most compact and that fits your entire data, but do not mangle your data. If you need to enforce some sort of domain logic, we have check constraints and domains, which we'll cover in a little bit. And remember that, by default, there's no way to get an unsigned integer in Postgres without using a check constraint, which is fine, 'cause we'll do that in a future video.