We are approaching a relatively controversial topic, and that's putting unstructured JSON or just JSON in general in your relational database. Some people, some stalwarts will say, "Don't you ever do that! The relational database is not made for that." And some people will say, "That's awesome. Let's treat Postgres like Mongo with a primary key and then a blob full of JSON."
I don't buy either argument, honestly. Of course, I think I'm gonna try to land somewhere in the middle. I love storing JSON in the database, especially in Postgres, which has a ton of JSON affordances. We will go over the querying and indexing of JSON at a future point. I just wanna talk about the type at this moment.
I do wanna give you a few rules of thumb as you're thinking about how you're gonna model your data. Should I store it in JSON or break it out into top level columns? If you need to constantly query inside of JSON documents, that is a good hint that potentially those should be broken out to top level columns. I will show you a way to do that with generated columns so you can keep your JSON blob intact, but still pull stuff out to the top level. That can be a nice middle ground.
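As a sketch of that middle ground, here's what a generated column pulling a key out of a JSONB blob might look like. The table and key names here are hypothetical, just for illustration:

```sql
-- Hypothetical users table: keep the raw JSON blob intact, but
-- surface the frequently queried email key as a real, typed column.
CREATE TABLE users (
  id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  data  jsonb NOT NULL,
  email text GENERATED ALWAYS AS (data ->> 'email') STORED
);

-- Queries and indexes can now target the top level column directly.
CREATE INDEX users_email_idx ON users (email);

SELECT * FROM users WHERE email = 'aaron@example.com';
```

The generated column stays in sync with the blob automatically, so you get a properly typed, indexable column without duplicating writes in your application.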
Postgres has lots of ways to get at JSON documents deep inside the nested structure, even to put indexes on 'em. This is not a hard and fast rule that if you need to query by some key in the JSON, then it must be a top level column. That's not what I'm saying. I'm saying if you have a JSON blob and perhaps user email is in there and that's the primary thing by which you look users up, let's break that out into a top level column. Postgres is going to work best when the top level columns have their correct data types and they are discrete, individual pieces of information. That's one rule of thumb. Look at your access patterns. How are you going to query this document?
The other thing is, if it is a rigid, well-defined, well-understood schema, you might consider breaking those out into top level columns, especially if the pieces are updated independently, each key on its own. A good reason to keep it all as a blob is if the entire blob is updated all at once. Neither of those are hard and fast rules either, because there is good JSON patching support in Postgres. You don't have to do it all at once, but that's just something that can maybe hint at which direction you should go. Finally, you wanna think about the size of the JSON document. Now, a JSONB column can support up to 255 megabytes of JSON. "Can it support it" and "should I do it" are different questions, in all of life, but especially in storing JSON documents here. If you start getting massive JSON documents, your performance is gonna take a hit and it would be better, in my opinion, to break those up into individual JSON documents. So, instead of one JSON column that holds 30 megabytes of JSON, maybe you end up with 3, 4, 5 that you can access a little more discretely, scoped down a little bit more.
How are you going to access it? Are you gonna constantly be hitting internal keys? Meh, maybe that points you to discrete columns. Is it a rigid, well-defined schema? And are the pieces updated independently or all in one go? And directionally, what is the size of the JSON document? And could you break that down into a few smaller documents?
You're gonna have to decide if you store JSON in your database. I think there is absolutely nothing wrong with it. We'll have a whole module on how we can manipulate JSON documents, both querying and updating. So selecting and updating. But for now, let's look at JSON versus JSONB. There are two different types in Postgres where you can store JSON. Let's do a simple select here and we'll cast this one to json. Totally legit, that does work. If we look at pg_typeof, we see that it is of type json. This is not likely what you're after. This does have its uses, they are minimal, and I'll show you what they are in a second. But what you're likely looking for is JSONB.
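The cast and type check described here look something like this:

```sql
-- Cast a literal to each JSON type and ask Postgres what it is.
SELECT pg_typeof('{"a": 1}'::json);   -- json
SELECT pg_typeof('{"a": 1}'::jsonb);  -- jsonb
```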
What is the difference? Everybody says, "Use JSONB, not JSON." But what is the actual difference? We can start to intuit a difference here, not with pg_typeof, but with pg_column_size: we'll look at one value as json and another as jsonb. That is a huge difference. The json comes out to five bytes and the jsonb comes out to 20. And you might be expecting me to say, "Use json 'cause it's smaller and we love compact data types." Guilty as charged. We do love compact data types. However, the reason that this one is bigger is because it has been deconstructed and stored in a binary representation. It has parsed the JSON already, and it keeps around a few extra bytes of information that tell Postgres, "Hey, when it comes to querying or selecting, updating, whatever, here's a little bit of stuff that can help you do it faster."
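The size comparison from this demo can be reproduced like so (the exact byte counts match what's quoted above; they can vary for larger values):

```sql
-- Compare on-disk size of the same tiny value in each type.
SELECT
  pg_column_size('1'::json)  AS json_size,   -- 5: the text "1" plus header
  pg_column_size('1'::jsonb) AS jsonb_size;  -- 20: parsed binary representation
```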
So the JSONB format is not stored as text under the hood, whereas the JSON format is stored as text. Let's see what that actually means here. Instead of a literal number, let's open up an object and say that a is "hello world". And we're gonna copy that down here as well. I'll just show you these column sizes. They start to get a little bit closer 'cause the overhead starts to get amortized over the entire object, but that's not really what we're after. What I want to try to prove to you is that these things are fundamentally different. Anything that you throw at the json column is going to be retained. If you were to add crazy white space, it is retained in the json column.
However, in the jsonb column, it's not retained, because jsonb is not storing the literal text representation. It is storing the binary form of the JSON, which adheres to certain rules, one of those rules being no duplicate keys. We know that that's a JSON rule. You see that key gets overwritten in jsonb, whereas in json this key will not get overwritten, it just remains. I don't know why I made two a string. It just remains. The json column will keep whatever you give to it, provided you give it valid JSON. That's a little bit of an assurance there: you can throw anything at it and it will tell you if it's not JSON. But it is not as strict as jsonb in terms of what it gives back to you. The jsonb column is going to give back to you a parsed, normalized version.
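You can see both behaviors, whitespace retention and duplicate key handling, with a pair of casts like this:

```sql
-- json keeps the text exactly as given; jsonb normalizes it.
SELECT '{"a": 1,      "a": "2"}'::json;
-- {"a": 1,      "a": "2"}    (whitespace and the duplicate key retained)

SELECT '{"a": 1,      "a": "2"}'::jsonb;
-- {"a": "2"}                 (last duplicate key wins, whitespace dropped)
```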
In fact, you shouldn't ever rely on key ordering in JSON. That's just not something the spec guarantees. And jsonb could potentially reorder the keys, because that's totally legal: JSON keys are not ordered. However, if you, for whatever legacy reason (I don't recommend it), are relying on the order of the keys, the json column is just text under the hood, and so it will keep whatever you give it. But as we saw, it does do a little bit of JSON validation. We're gonna look at all the JSON operators, functions, and indexing in the future. There are many of them and they are totally awesome. You can check for the existence of keys, check for containment and overlaps in objects and arrays.
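As a small teaser of those operators (covered properly in the later JSON module), existence and containment checks on jsonb look like this:

```sql
SELECT '{"a": 1, "b": 2}'::jsonb ? 'a';          -- true: the key "a" exists
SELECT '{"a": 1, "b": 2}'::jsonb @> '{"a": 1}';  -- true: contains that subdocument
```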
You can do all kinds of cool stuff. I think JSON's awesome. I'm gonna show you a few more things here just as a little teaser. And then after that, there's an entire module on JSON that we'll look at later. But let me show you this. This is pretty good looking JSON. This is wild SQL, seeing all those white spaces there, but that's okay. Coming back as jsonb, it is compacted.
I want to just show you quickly that you can extract keys out of a JSON object. This is not the only way to extract keys. There are tons of ways to get data out and we'll go over that later. But if you look down here at the column, you'll see it is in quotes, "hello world", because this is the jsonb representation of the value at the key string, and that is a quoted string of hello world. If you want to turn it into unquoted text, you can use the unquoting operator, ->>, to do that. You can also get at deeply nested keys. You could say first go to object, then go to key. In fact, let's unquote key. Or first go to array, and then let's go to 0, 1, 2. So you can address it by array index.
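The extractions narrated here, quoted versus unquoted, nested keys, and array indexes, can be written as:

```sql
-- -> returns jsonb (quoted for strings); ->> returns unquoted text.
SELECT '{"string": "hello world"}'::jsonb -> 'string';   -- "hello world"
SELECT '{"string": "hello world"}'::jsonb ->> 'string';  -- hello world

-- Deeply nested access: chain ->, then ->> to unquote the last step.
SELECT '{"object": {"key": "v"}}'::jsonb -> 'object' ->> 'key';  -- v

-- Arrays are addressed by zero-based index.
SELECT '{"array": [0, 1, 2]}'::jsonb -> 'array' -> 1;            -- 1
```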
There's tons more about JSON that we need to cover, but for now, remember that the column you're probably looking for is jsonb. Unless you are somehow relying on quirks of JSON and you don't want it parsed, such as duplicate keys, white space for whatever reason, or the order of keys, which you should never rely on. But if you already are, put it in a json column and that will preserve everything. JSONB is a little bit larger, so it takes up more space. But it is so much faster when it comes to operations, because that text has already been parsed and that structure is represented in a binary format instead of a text format.