Mastering Postgres
Introduction
Introduction to the course
Overview of course structure
Postgres vs. everyone
The psql CLI
Data Types
Introduction to schema
Integers
Numeric
Floating point
Storing money
NaNs and infinity
Casting types
Characters types
Check constraints
Domain types
Charsets and collations
Binary data
UUIDs
Boolean
Enums
Timestamps
Timezones
Dates and times
Advanced Data Types
Intervals
Serial type
Sequences
Identity
Network and mac addresses
JSON
Arrays
Generated columns
Text search types
Bit string
Ranges
Composite types
Nulls
Unique constraints
Exclusion constraints
Foreign key constraints
Indexing
Introduction to indexes
Heaps and CTIDs
B-Tree overview
Primary keys vs. secondary indexes
Primary key types
Where to add indexes
Index selectivity
Composite indexes
Composite range
Combining multiple indexes
Covering indexes
Partial indexes
Index ordering
Ordering nulls in indexes
Advanced Indexing
Functional indexes
Duplicate indexes
Hash indexes
Naming indexes
Understanding Query Plans
Introduction to explain
Explain structure
Scan nodes
Costs and rows
Explain analyze
Generating Results
Introduction to queries
Cross joins
Inner joins
Outer joins
Subqueries
Lateral joins
ROWS FROM
Filling gaps in sequences
Subquery elimination
Combining queries
Set generating functions
Indexing joins
Advanced SQL
Introduction to advanced SQL
Grouping
Grouping sets, rollups, cubes
Window functions
CTEs
CTEs with window functions
Recursive CTE
Hierarchical recursive CTE
Handling nulls
Row value syntax
Views
Materialized views
Removing duplicate rows
Upsert
Returning keyword
COALESCE + generated column
Full Text Search
Introduction to full text search
Searching with LIKE
Vectors, queries, and ranks
Websearch
Ranking
Indexing full text search
Highlighting
Advanced JSON
Intro to JSON
JSON vs JSONB
Validating JSON
Creating JSON objects + arrays
JSON extraction
JSON containment
JSON existence
JSON recordset
Updating JSON
Indexing JSON parts
GIN index
Vectors (pgvector)
Intro to pgvector
Vector embedding columns
Find related articles
Upsert vector embedding
Semantic search
Other operators
Vector indexes
Outro
Thank you
Bonus interviews
Heroku's glory days & Postgres vs the world (with Craig Kerstiens)
Creating a Postgres platform with Monica & Tudor from Xata.io
Bootstrapping an email service provider (with Jesse Hanley)
Locked video

Please purchase the course to watch this video.

Video thumbnail
Advanced Data Types
Generated columns

Full Course

$
349
$399
USD, one-time fee
Just finished Aaron Francis' Mastering Postgres course. Highly recommend 🙂
Savannah

PostgreSQL database platform

Shorten dev cycles with branching and zero-downtime schema migrations.

Test Data

I've made my test data available for you to use and follow along.

Download
or
Use on Xata

Summary

Generated columns in Postgres are a great way to keep data tidy and synchronized without the need for complex queries. These formula-based columns automatically calculate their values from other data, saving time and reducing errors. They’re particularly useful for simplifying data models and ensuring consistent data without manual input.

Video Transcript

Generated columns gotta be one of my favorite features of any database, Postgres included. They are an escape hatch. They're a get out of jail free card. They're a nice way to tidy up some gross queries. In the process of developing your schema, business requirements will change, databases will probably change, if you keep going long enough, deadlines can be tight, and your knowledge can change. You may look back on a database that's five years old and think, "Boy, what the heck was I doing?" That's okay. That totally happens. And you can adjust the data model as you go.

A nice middle ground or maybe a stop gap can be a generated column. It can paper over some of the gross stuff in your data model. They're also very, very useful for beautiful, pristine data models. Whether or not they're messy, it doesn't really matter. But generated columns can be an escape hatch.

So the question is, what the heck is a generated column? A generated column is in fact a column, but it's basically a reference to another column and then some sort of transformation, some sort of function, some sort of modification. And I'm gonna show you a few here, but you can think of it as maybe a formula-based column in Excel or Google Sheets. You have your one column with all the data, and then next to it you have a formula that references that column. That's all a generated column is.

Traditionally, there are two types, virtual and stored. Postgres only has... I need to put down this finger. Postgres only has stored, so there is no virtual generated column in Postgres, which is a little bit unfortunate, but it's fine, we'll get by. Let me show you a few examples of what these are. Then honestly, throughout the rest of the course, we might pick up generated columns again. So let me show you a few.

Here's a very basic example, a people table that has height in centimeters and height in inches. Now, these two things represent the same fundamental truth, right, a person's height. They're not really discreet pieces of information. They're the same piece of information fundamentally, just represented in different units. Instead of trying to keep these two things in sync, what we're gonna do is we're gonna say this is GENERATED ALWAYS AS, and this might ring a bell 'cause we've done GENERATED ALWAYS AS IDENTITY. But instead of doing that, what we're gonna do is we're gonna do open parentheses here, and then we're gonna say (height_cm / 2.54). And then at the end, you must write STORED. In some other databases, you can write VIRTUAL. And that just becomes kind of like a query macro almost, in that, it's always computed at runtime.

However, in Postgres, you must write STORED, which means it's computed at INSERT and UPDATE and then actually physically written to storage, which is totally fine. If we run that and we were to insert into people, we'll insert (height_cm), and honestly, I don't know how many centimeters tall I am. I don't really know what a centimeter is. I'm gonna insert (height_cm) values (100). Now if we read that back, select * from people, let's see how close I got. So 39 inches. That doesn't sound very tall. select 39/12, that's not very tall. Let's do 200. So if we did 200, so we need to run that, and then run that back, so 78 inches is how many feet? Six feet. That feels a lot better.

Okay, so what we've done is we've inserted a value into the centimeter column, but the inches column was calculated for us based on the centimeter column. Now, interestingly, you cannot muck this up because you do not have control. If we were to say, sure, that's, you know, six feet, which would be... What did we decide? 78-ish inches. You just can't do it. You're not allowed to do it. Which gives me great comfort because I don't want people accidentally inserting data in here if they didn't know that this was a generated column. This is nice. It keeps that data. It keeps that data in perfect sync. Another place, I haven't actually used it to do inches and centimeters, but a place that I have actually done it is create table users. And here we're gonna say, you know, imagine a bunch of stuff and then we say email is text. And then we can say email_domain. And this might be really, really useful because the domain is a part of that email address, right? And so if you wanna search on domain or group on domain, you're gonna have to do some extraction or some left hand wild cards or something like that to search on domain alone. And maybe you block certain domains or maybe you wanna group by corporate domains versus free domains.

Whatever you wanna do, you can extract the domain part by doing something like text GENERATED ALWAYS AS, and then we're gonna put our function in there, which we need to figure out, ALWAYS AS () STORED. And then let's figure out what our actual function is. Uf I did select my email, tryhardstudios. Please, I mean, I guess you can email me, but don't, you know, don't send me any spam or anything. Okay, so that's good there. And then we're gonna use split_part and the delimiter that we're gonna split on is the at sign and then 2 because it is 1 indexed. So if we did 1, it would give us Aaron. I want 2 for tryhardstudios. Then we can just yoink that and come back up here, remove the hard code and say, we'll take that and make that an email. So now, with any luck, we can say insert into users (email) values ('[email protected]'). Gotta do the right table name and select * from users. There we go. So that is actually a pretty useful use case for extracting pieces of text out. And now I have my email domain that I can filter on, I can group on, I can add an index on, all kinds of good stuff.

The restrictions on generated columns are, in fact, many, but you'll probably only run into a few of the restrictions. One is you have to reference the current row. You can't reference other rows, other tables, subqueries, anything like that. The other is your function, or the way in which the generated column is populated must be deterministic. You can't use any volatile functions, which means random, current time, random UUID, anything like that. Given an input, it must always produce the same output.

The other restriction that you might run into is you can't reference another generated column. Unlike in Excel, you can't just have formulas that just reference each other over and over and over, all the way back. It all has to be self-contained. A generated column can't reference another generated column. We're gonna use these... I am not ashamed of generated columns. We're gonna use these over and over and over again because they're super powerful, both for papering over a messy data model and for papering over messy data. It's not always in your control.

Sometimes the data is messy, sometimes the access patterns change and you need to extract a piece. This can be great for extracting parts of JSON, which we'll look at later. But I just wanted to give you an intro to generated columns because we'll be using these throughout the rest of the course.