Mastering Postgres
Introduction
Introduction to the course
Overview of course structure
Postgres vs. everyone
The psql CLI
Data Types
Introduction to schema
Integers
Numeric
Floating point
Storing money
NaNs and infinity
Casting types
Characters types
Check constraints
Domain types
Charsets and collations
Binary data
UUIDs
Boolean
Enums
Timestamps
Timezones
Dates and times
Advanced Data Types
Intervals
Serial type
Sequences
Identity
Network and mac addresses
JSON
Arrays
Generated columns
Text search types
Bit string
Ranges
Composite types
Nulls
Unique constraints
Exclusion constraints
Foreign key constraints
Indexing
Introduction to indexes
Heaps and CTIDs
B-Tree overview
Primary keys vs. secondary indexes
Primary key types
Where to add indexes
Index selectivity
Composite indexes
Composite range
Combining multiple indexes
Covering indexes
Partial indexes
Index ordering
Ordering nulls in indexes
Advanced Indexing
Functional indexes
Duplicate indexes
Hash indexes
Naming indexes
Understanding Query Plans
Introduction to explain
Explain structure
Scan nodes
Costs and rows
Explain analyze
Generating Results
Introduction to queries
Cross joins
Inner joins
Outer joins
Subqueries
Lateral joins
ROWS FROM
Filling gaps in sequences
Subquery elimination
Combining queries
Set generating functions
Indexing joins
Advanced SQL
Introduction to advanced SQL
Grouping
Grouping sets, rollups, cubes
Window functions
CTEs
CTEs with window functions
Recursive CTE
Hierarchical recursive CTE
Handling nulls
Row value syntax
Views
Materialized views
Removing duplicate rows
Upsert
Returning keyword
COALESCE + generated column
Full Text Search
Introduction to full text search
Searching with LIKE
Vectors, queries, and ranks
Websearch
Ranking
Indexing full text search
Highlighting
Advanced JSON
Intro to JSON
JSON vs JSONB
Validating JSON
Creating JSON objects + arrays
JSON extraction
JSON containment
JSON existence
JSON recordset
Updating JSON
Indexing JSON parts
GIN index
Vectors (pgvector)
Intro to pgvector
Vector embedding columns
Find related articles
Upsert vector embedding
Semantic search
Other operators
Vector indexes
Outro
Thank you
Bonus interviews
Heroku's glory days & Postgres vs the world (with Craig Kerstiens)
Creating a Postgres platform with Monica & Tudor from Xata.io
Bootstrapping an email service provider (with Jesse Hanley)
Data Types
Integers



Test Data

I've made my test data available for you to use and follow along.


Summary

Learn how to choose the right numeric data types for your database, with a focus on integers and their ranges. The video covers the differences between `SMALLINT`, `INT4`, and `BIGINT`, helping you select the most appropriate type for your data while minimizing storage space. It also highlights the absence of unsigned integers in Postgres and explains how to use check constraints for data validation.

Video Transcript

We're gonna start our discussion about data types with numeric data. Our discussion here is how do we build a good solid foundation? How do we create a correct table such that we don't run into problems down the line?

Now, this video is gonna be about integers. In fact, it's gonna be about integers that fit within a bounded range. In future videos, we'll talk about fractional parts, we'll talk about arbitrary precision, money, unbounded ranges, all kinds of good stuff. I'm gonna try to keep this one scoped to integers and bounded ranges.

Remember that we're trying to pick a data type that is both representative and compact. Representative means we need to choose one that encompasses the full range of our data. Please do not mangle your data to save a few bytes here and there. Bad idea. Compact means if we have a range that goes from 0 to 100, don't put it in a big integer. That is a waste of space. In a future-future video, which means a long way away, we're gonna talk about ordering your columns such that your table can be even more compact. This all builds on top of each other.

Let's start with integers. Here we have a little CREATE TABLE statement that is not very good, because we don't even have a primary key in here. But that's not the point, right? So this is what we're doing right here. What kind of data type are we going to use for age? The first thing you always wanna do is look at your data, study your data carefully, find out what the range of your data is, and then ask yourself: "Is that the range of reality?" Because your data represents something in the real world, right?

If you look at your data and you say, "I've got people from 0 to 90 years old in there. I've got a user that's 90 years old," that's great. That is the bounds of your current data, but is it the bounds of reality? Is anyone ever gonna live to be past 200? Not while this database exists. Maybe at some point in the future. Honestly, if you wanted to put 1,000, that's still a reasonable range for your data. Inspect your data, figure out what the bounds are, and then let's match that up to a data type.

If we look at the data types that can support that, we've got SMALLINT, which goes up to 32,767. Interestingly, it goes almost that far on the negative side too: it runs from -32,768 to 32,767. A SMALLINT seems pretty good to me here.

Interestingly, if you're coming from MySQL, it may surprise you that I'm not typing unsigned in there, or back here, wherever it's supposed to go. Postgres does not have the concept of unsigned integers. If you're coming from SQLite, you expected that; if you're coming from MySQL, it's a surprise. That does not mean we cannot enforce that somebody must be greater than zero years old, or greater than or equal to zero, actually, because kids are zero. That's okay. We're gonna talk about check constraints later.
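As a quick preview of the check-constraint approach he mentions (covered in detail later in the course; the table name here is just illustrative):

```sql
-- Postgres has no UNSIGNED, but a check constraint can enforce
-- that age is non-negative. Hypothetical example table.
CREATE TABLE users_example (
    age SMALLINT CHECK (age >= 0)
);
```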

What we're looking at is how do we fit our data into a data type? If we run this and then hop over here, I'm just gonna do it on the command line. I just want to take a look at this table. I'm gonna do \d followed by the table name. You see we got that table created.

Interestingly, if we hop back to TablePlus and we drop that, there is this slightly esoteric alias, but I do like it for one reason. If we declare that as int2 and then we hop back over here and run that again, you'll see it is in fact an alias: even though we declared it as int2, it's actually a SMALLINT. Why do I like this little esoteric alias? Because it's informative. While it is a little bit opaque and not the friendliest name ever, I do like it because it's information-dense. It is an integer that takes two bytes, which leads us to believe that there is an int4 and an int8.
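The alias check he walks through looks roughly like this (psql output trimmed to the relevant part):

```sql
-- Declare the column with the int2 alias...
CREATE TABLE smallint_example (
    age int2
);
-- ...then in psql, \d smallint_example reports the column's type
-- as "smallint", confirming int2 is just an alias for the
-- two-byte integer type.
```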

First, we're gonna carry forward with, I'll leave that there, we're gonna carry forward with this int2. I'm going to insert, I'm not gonna make you watch me type it, I'm gonna insert some data into this table, and there we go. Does it work? Did it work? I'm gonna leave that there, and we'll do select * from smallint_example.

Our data is represented correctly, faithfully, with no errors and no loss of precision or anything. What happens if you have some sort of mutant that lives to be 40,000 years old? We can call this person a robot. If we run that, you'll see "smallint out of range". We get a little bit of data integrity and enforcement here. I don't think that alone is a good reason to pick the type, but I'm glad that we have it, because if somebody accidentally types in 40,000, they're gonna get an error. But frankly, if somebody types in 32,000, that is also an error, but we're just going to allow it.
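The out-of-range failure he demonstrates looks like this, using the smallint table from the example:

```sql
-- A value outside smallint's range is rejected outright.
INSERT INTO smallint_example (age) VALUES (40000);
-- ERROR:  smallint out of range
```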

The idea of, in this example, the idea of using this as some sort of business logic constraint doesn't move me. What does move me is using a check constraint or a domain to enforce that sort of like data validation business logic. What we're trying to do here is just get the most compact type that matches our data.

Let's move on to the int4. The int4 is likely the one that you are gonna be using most often. I think most frameworks, when they're creating their migrations, just kind of default to an integer. Side note on that: I have no problem whatsoever if you're using a web application framework or some other kind of framework to write your migrations. That doesn't bother me at all. If anybody ever tells you you're not a real developer because you didn't hand-roll your SQL, I don't think that is true. What I will say, however, is you're still responsible for the generated SQL. It is nice to use a tool that writes migrations for you. At the end of the day, you are still in charge and you need to know what that tool is doing. However you wanna do it, that's great. That's up to you. You're still the boss.

int4, this is the one that you'll likely be using. It is also just called integer, no qualifier. It doesn't make sense for age, so I'm gonna drop in a new table here. The new table is going to be item name and stock, some sort of back office warehouse kind of setup. With the extra two bytes, we get a much bigger range. For an integer, I am going to copy and paste this because that's a lot to type. It goes up to 2.1 billion. Also, it goes down to -2.1 billion. This range is enormous. This covers most of your needs, I would have to imagine. Here's one place I would say give your data way too much room to grow: and that is primary key IDs.
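A sketch of the warehouse-style table described above (column names follow the transcript; the exact table name is illustrative):

```sql
-- int4 / integer covers -2,147,483,648 to 2,147,483,647.
CREATE TABLE integer_example (
    item_name TEXT,
    stock     INTEGER  -- four bytes; plenty for warehouse stock counts
);
```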

For a primary key ID, we're just gonna default to bigint because we don't wanna run outta space. I think for everything else, integer is probably fine; if you know that it's super small, stick with a SMALLINT. We can run this. I'm not gonna run this.
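One common way to spell that bigint-primary-key default, using the identity columns covered later in the course (the table name is illustrative):

```sql
-- Defaulting the primary key to bigint leaves room to grow
-- far beyond int4's 2.1 billion ceiling.
CREATE TABLE users (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY
);
```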

We're gonna move straight on to bigint. bigint is eight bytes. We're getting a little bit of a picture here: we have two, we have four, and then we have eight down here at bigint. Boy, am I gonna copy this one, because it is nine quintillion. That's right. What is a quintillion? It's a very big number. It goes from negative nine quintillion to positive nine quintillion. That's a very, very big range. You can see it takes four times the space of a smallint. Does that matter? Depends on how many columns you're doing that on and how many rows you have. Across a billion rows? Ooh, it might matter. Then when we start talking about making our tables compact at kind of a lower level, eh, it kind of matters.

If you need a bigint, it can be something like a file size. We could change item_name to file_name and stock to file_size in bytes, and that would be a good big integer. We would just change the table name to biginteger_example.
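The bigint variant he describes would look something like this (names follow the transcript):

```sql
-- bigint / int8: eight bytes, roughly -9.2 to +9.2 quintillion.
-- File sizes in bytes can legitimately exceed int4's 2.1 billion limit.
CREATE TABLE biginteger_example (
    file_name TEXT,
    file_size BIGINT  -- size in bytes
);
```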

You can see you have a few options. Pick the one that is most compact, that fits your entire data, but do not mingle your data. If you need to enforce some sort of domain logic, we can have check constraints in domains, which we'll cover in a little bit. Remember that, by default, there's no way to get an unsigned integer in Postgres without using a check constraint, which is fine, because we'll do that in a future video.