Mastering Postgres
Introduction
Introduction to the course
Overview of course structure
Postgres vs. everyone
The psql CLI
Data Types
Introduction to schema
Integers
Numeric
Floating point
Storing money
NaNs and infinity
Casting types
Character types
Check constraints
Domain types
Charsets and collations
Binary data
UUIDs
Boolean
Enums
Timestamps
Timezones
Dates and times
Advanced Data Types
Intervals
Serial type
Sequences
Identity
Network and MAC addresses
JSON
Arrays
Generated columns
Text search types
Bit string
Ranges
Composite types
Nulls
Unique constraints
Exclusion constraints
Foreign key constraints
Indexing
Introduction to indexes
Heaps and CTIDs
B-Tree overview
Primary keys vs. secondary indexes
Primary key types
Where to add indexes
Index selectivity
Composite indexes
Composite range
Combining multiple indexes
Covering indexes
Partial indexes
Index ordering
Ordering nulls in indexes
Advanced Indexing
Functional indexes
Duplicate indexes
Hash indexes
Naming indexes
Understanding Query Plans
Introduction to explain
Explain structure
Scan nodes
Costs and rows
Explain analyze
Generating Results
Introduction to queries
Cross joins
Inner joins
Outer joins
Subqueries
Lateral joins
ROWS FROM
Filling gaps in sequences
Subquery elimination
Combining queries
Set generating functions
Indexing joins
Advanced SQL
Introduction to advanced SQL
Grouping
Grouping sets, rollups, cubes
Window functions
CTEs
CTEs with window functions
Recursive CTE
Hierarchical recursive CTE
Handling nulls
Row value syntax
Views
Materialized views
Removing duplicate rows
Upsert
Returning keyword
COALESCE + generated column
Full Text Search
Introduction to full text search
Searching with LIKE
Vectors, queries, and ranks
Websearch
Ranking
Indexing full text search
Highlighting
Advanced JSON
Intro to JSON
JSON vs JSONB
Validating JSON
Creating JSON objects + arrays
JSON extraction
JSON containment
JSON existence
JSON recordset
Updating JSON
Indexing JSON parts
GIN index
Vectors (pgvector)
Intro to pgvector
Vector embedding columns
Find related articles
Upsert vector embedding
Semantic search
Other operators
Vector indexes
Outro
Thank you
Bonus interviews
Heroku's glory days & Postgres vs the world (with Craig Kerstiens)
Creating a Postgres platform with Monica & Tudor from Xata.io
Bootstrapping an email service provider (with Jesse Hanley)
Summary

Learn the pros and cons of storing unstructured JSON in a relational database like Postgres, with a focus on the differences between the JSON and JSONB formats. While JSONB uses more space due to its binary storage format, it greatly improves query performance and manipulation capabilities, making it ideal for dynamic, non-fixed schemas. If preserving the original text characteristics, such as key order, duplicate keys, or spacing, is important, the plain JSON format may be the better choice.

Video Transcript

We are approaching a relatively controversial topic, and that's putting unstructured JSON or just JSON in general in your relational database. Some people, some stalwarts will say, "Don't you ever do that! The relational database is not made for that." And some people will say, "That's awesome. Let's treat Postgres like Mongo with a primary key and then a blob full of JSON."

I don't buy either argument, honestly. I'm gonna try to land somewhere in the middle. I love storing JSON in databases, especially in Postgres, which has a ton of JSON affordances; we'll go over querying and indexing JSON at a future point. For now, I just wanna talk about the type itself.

I do wanna give you a few rules of thumb as you're thinking about how you're gonna model your data. Should I store it in JSON or break it out into top-level columns? If you need to constantly query inside of JSON documents, that's a good hint that those pieces should potentially be broken out into top-level columns. I will show you a way to do that with generated columns, so you can keep your JSON blob intact but still pull stuff out to the top level. That can be a nice middle ground.

Postgres has lots of ways to get at JSON documents deep inside the nested structure, even to put indexes on 'em. This is not a hard and fast rule that if you need to query by some key in the JSON, then it must be a top-level column. That's not what I'm saying. I'm saying if you have a JSON blob and perhaps the user's email is in there, and that's the primary thing by which you look users up, let's break that out into a top-level column. Postgres is going to work best when the top-level columns have their correct data types and are discrete, individual pieces of information. That's one rule of thumb. Look at your access patterns. How are you going to query this document?
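One way to sketch that middle ground (the table and column names here are hypothetical): keep the whole JSONB blob, but promote the email key to a real column with a generated column, then index it like any other column.

```sql
-- Hypothetical users table: the full document stays in the JSONB blob,
-- but the frequently-queried email key is promoted to a real column.
CREATE TABLE users (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    data JSONB NOT NULL,
    email TEXT GENERATED ALWAYS AS (data ->> 'email') STORED
);

CREATE INDEX users_email_idx ON users (email);

-- Lookups now hit a plain text column with a plain B-tree index:
-- SELECT * FROM users WHERE email = 'aaron@example.com';
```

The generated column is recomputed automatically whenever the blob is updated, so the two can't drift apart.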

The other thing is, if it is a rigid, well-defined, well-understood schema, you might consider breaking those out into top-level columns, especially if the pieces are updated independently, key by key. A good reason to keep it all as a blob is if the entire blob is updated all at once. Neither of those are hard and fast rules either, because there is good JSON patching support in Postgres; you don't have to do it all at once. But that's something that can hint at which direction you should go. Finally, you wanna think about the size of the JSON document. A JSONB column can support up to 255 megabytes of JSON. But "can it support it" and "should I do it" are different questions, in all of life, but especially when storing JSON documents. If you start getting massive JSON documents, your performance is gonna take a hit, and it would be better, in my opinion, to break those up into individual JSON documents. So instead of one JSON column that holds 30 megabytes of JSON, maybe you end up with three, four, or five columns that you can access a little more discretely, scoped down a little bit more.

How are you going to access it? Are you gonna constantly be hitting internal keys? Meh, maybe that points you to discrete columns. Is it a rigid, well-defined schema? And are the pieces updated independently or all in one go? And directionally, what is the size of the JSON document? And could you break that down into a few smaller documents?

You're gonna have to decide if you store JSON in your database. I think there is absolutely nothing wrong with it. We'll have a whole module on how we can manipulate JSON documents, both querying and updating. So selecting and updating. But for now, let's look at JSON versus JSONB. There are two different types in Postgres where you can store JSON. Let's do a simple select here and we'll cast this one to JSON. Totally legit, that does work. If we look at pg_typeof, we see that it is of type JSON. This is not likely what you're after. It does have its uses, they are minimal, and I'll show you what they are in a second. But what you're likely looking for is JSONB.
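The cast shown on screen looks roughly like this: a literal cast to each type, with pg_typeof confirming what came back.

```sql
-- Both casts are legit; pg_typeof shows which type you actually got.
SELECT pg_typeof('{"a": 1}'::json);   -- json
SELECT pg_typeof('{"a": 1}'::jsonb);  -- jsonb
```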

What is the difference? Everybody says, "Use JSONB, not JSON." But what is the actual difference? We can start to intuit a difference here, not with pg_typeof, but with pg_column_size: we'll look at this one as JSON, and then come down here and say that this one is as JSONB. That is a huge difference. The JSON comes out to five bytes and the JSONB comes out to 20. And you might be expecting me to say, "Use JSON 'cause it's smaller and we love compact data types." Guilty as charged. We do love compact data types. However, the reason that this one is bigger is because it has been deconstructed and stored in a binary representation. It has parsed the JSON already, and it keeps around a few extra bytes of information that tell Postgres, "Hey, when it comes to querying, selecting, updating, whatever, here's a little bit of stuff that can help you do it faster."
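The size comparison from the video, as a single query (the byte counts are the ones quoted in the transcript):

```sql
-- The same tiny value stored both ways: json keeps the raw text,
-- jsonb stores a parsed binary representation with extra bookkeeping.
SELECT
    pg_column_size('1'::json)  AS json_size,   -- 5 bytes
    pg_column_size('1'::jsonb) AS jsonb_size;  -- 20 bytes
```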

So the JSONB format is not stored as text under the hood, whereas the JSON format is stored as text. Let's see what that actually means here. Instead of a literal number, let's open up an object; we'll just say that "a" is "hello world". And we're gonna copy that down here as well. I'll show you these column sizes. They start to get a little closer, 'cause the overhead starts to get amortized over the entire object, but that's not really what we're after. What I want to try to prove to you is that these things are fundamentally different. Anything that you throw at the JSON column is going to be retained. If you add crazy whitespace, it is retained in the JSON column.

However, in the JSONB column, it's not retained, because JSONB is not storing the literal text representation. It is storing the binary form of the JSON, which adheres to certain rules, one of those rules being no duplicate keys. We know that that's a JSON rule. You see that the key gets overwritten in the JSONB column, whereas in the JSON column it just remains. I don't know why I made two a string. It just remains. The JSON column will keep whatever you give to it, provided you give it valid JSON. That's a little bit of an assurance there: you can throw anything at it and it will tell you if it's not JSON. But it is not as strict as JSONB in terms of what it gives back to you. JSONB is going to give back to you a normalized version.
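The duplicate-key and whitespace behavior described above, side by side (the duplicate "a" with a string value mirrors the example in the video):

```sql
-- json keeps the text exactly as given: whitespace and the duplicate key survive.
SELECT '{"a": 1,       "a": "2"}'::json;
-- → {"a": 1,       "a": "2"}

-- jsonb parses it: whitespace is stripped and the last duplicate key wins.
SELECT '{"a": 1,       "a": "2"}'::jsonb;
-- → {"a": "2"}
```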

In fact, you shouldn't ever rely on key ordering in JSON; that's just not something the spec guarantees. But JSONB could potentially reorder the keys, because that's totally legal: JSON keys are not ordered, and you should never rely on their order. However, if you are relying on the order of the keys for whatever legacy reason (I don't recommend it), remember that the JSON column is just text under the hood, and so it will keep whatever you give it. But as we saw, it does do a little bit of JSON validation. We're gonna look at all the JSON operators, functions, and indexing in the future. There are many of them and they are totally awesome. You can check for the existence of keys, check for overlaps in structure and arrays.
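As a small preview of the operators covered in the later JSON module, existence and containment checks on JSONB look roughly like this:

```sql
-- ? checks whether a top-level key exists.
SELECT '{"a": 1, "b": 2}'::jsonb ? 'a';          -- true

-- @> checks whether the left value contains the right-hand structure.
SELECT '{"a": 1, "b": 2}'::jsonb @> '{"b": 2}';  -- true

-- Containment works on arrays too.
SELECT '[1, 2, 3]'::jsonb @> '[2]';              -- true
```

Note these operators work on jsonb, not on the plain json type, which is another practical reason to reach for JSONB.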

You can do all kinds of cool stuff. I think JSON's awesome. I'm gonna show you a few more things here just as a little teaser, and then after that, there's an entire module on JSON that we'll look at later. But let me show you this. This is pretty good looking JSON. This is wild SQL, seeing all those white spaces there, but that's okay. Coming back, it is compacted.

I want to just show you quickly that you can extract keys out of a JSON object. This is not the only way to extract keys; there are tons of ways to get data out, and we'll go over that later. But if you look down here at the column, you'll see it is in quotes, "hello world", because this is the JSONB representation of the value at the key string, and that is a quoted string of hello world. If you want to turn it into unquoted text, you can use the JSON unquoting operator to do that. You can also get at deeply nested keys. You could say first go to object, then go to key. In fact, let's unquote key. Or first go to array, and then go to 0, 1, 2. So you can address it by array index.
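The extraction walked through above, in rough form (the key names follow the ones mentioned in the video; -> returns jsonb, ->> unquotes to text):

```sql
-- -> keeps the jsonb representation, so strings stay quoted.
SELECT '{"string": "hello world"}'::jsonb -> 'string';   -- "hello world"

-- ->> unquotes, returning plain text.
SELECT '{"string": "hello world"}'::jsonb ->> 'string';  -- hello world

-- Deeply nested keys: first go to object, then unquote key.
SELECT '{"object": {"key": "value"}}'::jsonb -> 'object' ->> 'key';  -- value

-- Arrays are addressed by index.
SELECT '{"array": [0, 1, 2]}'::jsonb -> 'array' -> 2;    -- 2
```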

There's tons more about JSON that we need to cover, but for now, remember that the column you're probably looking for is JSONB. Unless you are somehow relying on quirks of the text form and you don't want it parsed: duplicate keys, whitespace for whatever reason, or the order of keys (which you should never rely on, but if you already are). In that case, put it in a JSON column and that will preserve everything. JSONB is a little bit larger, so it takes up more space, but it is so much faster when it comes to operations, because that text has already been parsed and the structure is represented in a binary format instead of a text format.