Mastering Postgres
Introduction
Introduction to the course
Overview of course structure
Postgres vs. everyone
The psql CLI
Data Types
Introduction to schema
Integers
Numeric
Floating point
Storing money
NaNs and infinity
Casting types
Character types
Check constraints
Domain types
Charsets and collations
Binary data
UUIDs
Boolean
Enums
Timestamps
Timezones
Dates and times
Advanced Data Types
Intervals
Serial type
Sequences
Identity
Network and MAC addresses
JSON
Arrays
Generated columns
Text search types
Bit string
Ranges
Composite types
Nulls
Unique constraints
Exclusion constraints
Foreign key constraints
Indexing
Introduction to indexes
Heaps and CTIDs
B-Tree overview
Primary keys vs. secondary indexes
Primary key types
Where to add indexes
Index selectivity
Composite indexes
Composite range
Combining multiple indexes
Covering indexes
Partial indexes
Index ordering
Ordering nulls in indexes
Advanced Indexing
Functional indexes
Duplicate indexes
Hash indexes
Naming indexes
Understanding Query Plans
Introduction to explain
Explain structure
Scan nodes
Costs and rows
Explain analyze
Generating Results
Introduction to queries
Cross joins
Inner joins
Outer joins
Subqueries
Lateral joins
ROWS FROM
Filling gaps in sequences
Subquery elimination
Combining queries
Set generating functions
Indexing joins
Advanced SQL
Introduction to advanced SQL
Grouping
Grouping sets, rollups, cubes
Window functions
CTEs
CTEs with window functions
Recursive CTE
Hierarchical recursive CTE
Handling nulls
Row value syntax
Views
Materialized views
Removing duplicate rows
Upsert
Returning keyword
COALESCE + generated column
Full Text Search
Introduction to full text search
Searching with LIKE
Vectors, queries, and ranks
Websearch
Ranking
Indexing full text search
Highlighting
Advanced JSON
Intro to JSON
JSON vs JSONB
Validating JSON
Creating JSON objects + arrays
JSON extraction
JSON containment
JSON existence
JSON recordset
Updating JSON
Indexing JSON parts
GIN index
Vectors (pgvector)
Intro to pgvector
Vector embedding columns
Find related articles
Upsert vector embedding
Semantic search
Other operators
Vector indexes
Outro
Thank you
Bonus interviews
Heroku's glory days & Postgres vs the world (with Craig Kerstiens)
Creating a Postgres platform with Monica & Tudor from Xata.io
Bootstrapping an email service provider (with Jesse Hanley)
Vectors (pgvector)
Vector embedding columns


Summary

Learn how to install the pgvector extension in Postgres and create a simple table to store data. In this video, we create a table called `products_v` to store product names and vector embeddings, which are used to find similar products. By comparing embeddings with distance operators like L2, we can identify the products closest to a user's input, enabling features like related product recommendations.

Links

pgvector GitHub repository

Video Transcript

In this video, we're going to get the extension installed and then create a little example table before we move on to some more complicated stuff. Now, the application that I use to run Postgres locally is called Postgres.app. It's great. It's free somehow. I don't know who makes it, but it's awesome. It comes with pgvector already bundled with it. We just have to create the extension in our database. That part is very easy.

If you need to get pgvector yourself from source, I'll leave a link to the GitHub repo down below. There are maybe 10 different ways to get this thing installed depending on what kind of system you're on. If I had to do it from scratch here, I would probably use Homebrew because I am on a Mac, but fortunately I don't have to because it comes bundled already. Because there are so many different systems out there, the docs are just going to be better for that. I'll just leave a link down below. I will show you how to create the extension and then we're going to start by creating a little sample table. The first thing we're going to do is create extension if not exists and the name of the extension is vector. If you run that again, you'll see it's fine. It doesn't matter. It's just going to say, "Hey, that already exists. What are you doing?"
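The command from this step is short enough to show in full; it is safe to run more than once, as described above:

```sql
-- Enable the pgvector extension in the current database.
-- IF NOT EXISTS makes this a no-op if it's already installed.
CREATE EXTENSION IF NOT EXISTS vector;
```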

What we can do is we can create a table called products. I added a little V there for Products Vector. We have just a traditional auto incrementing integer and a name. Then the interesting thing is down here, this is the vector. The first thing you'll notice here is we do pass through some sort of dimensionality. This is the number of floating point dimensions that are inside of the vector. Now depending on who you use as your embedding provider, there are ways to say, "Hey, I want 1024 or 768 or 2048," whatever. You can decide how many dimensions you get back. You do need to declare in Postgres how many dimensions you're planning on putting into this particular column. Because I'm going to hand write the embeddings here, we're going to stick with four.
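A sketch of the table described above. The `products_v` name and the four-dimension `vector(4)` column come straight from the video; the exact definition of the auto-incrementing id column is an assumption:

```sql
-- products_v: product names plus a small hand-writable embedding.
-- vector(4) declares four floating point dimensions; with a real
-- embedding provider you'd use something like vector(768) or vector(1536).
CREATE TABLE products_v (
    id        serial PRIMARY KEY,
    name      text NOT NULL,
    embedding vector(4)
);
```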

Traditionally you'd have something more like 768, 1024 or something like that. We'll look at that as we go to future videos and get the providers actually giving us embeddings back. For now, I'm not going to hand write that many. Enough yapping, let's go ahead and create that table. We've got products_v here, and we can insert, insert into products_v and we'll just insert name and embedding and the values that we're going to insert here. Let's take a look at this. We're going to insert, we'll just say product A as the first name, and then we're going to insert an embedding. Now, this is the thing that you would get back from, we'll just say OpenAI. Whenever I say OpenAI, understand you could get these embeddings from a lot of different places. This is the thing you would get back from the model, and it's going to look like a bunch of floating point numbers that make no sense at all. We're assured that they certainly mean something. Traditionally you'll have literally hundreds of them.

For this example, I'm just going to write in some regular numbers. We'll insert product A and then we'll insert product B. Let's give product B a pretty different embedding. Let's say it starts at 9 and then does 10, 11, 12, and 13. How many dimensions? Four dimensions. Okay, we need to make sure that we only have four. We'll lop off the first one here. We've got 1, 2, 3, 4, and 10, 11, 12, 13. Running that and reading it back. Select * from products_v. Now we have our products and our embeddings, which in theory were provided by some outside model. Now these embeddings represent some underlying truth, some underlying reality about these products. Maybe it is you take the product description and all of the tags and the title and you hand that off to a model and get the embeddings back. Or maybe you take the images and hand those off to a model and get the embeddings back. Basically what we're doing here is we're saying, take this human-readable thing, text, image, whatever, turn it into a machine-readable thing. Then what we're going to do is we're going to say, based on this machine-readable representation of the underlying reality, let's find all the things that are similar to that.
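The inserts and the read-back from this step might look like the following; the literal embedding values are the hand-written ones from the video:

```sql
-- Hand-written embeddings standing in for real model output.
-- pgvector accepts vectors as bracketed string literals.
INSERT INTO products_v (name, embedding) VALUES
    ('Product A', '[1,2,3,4]'),
    ('Product B', '[10,11,12,13]');

-- Read it back: two products, each with a 4-dimension embedding.
SELECT * FROM products_v;
```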

Now with these products in the database, let's see if we can find similar products. Here we go. We've got these two products here. We've got product A and product B. We can get rid of that guy. Let's, yeah, we can get rid of that guy too. We just have those products. Imagine a scenario where you're given some input from the user and you're doing a sort of search. Let's say you're given some input from the user and then you turn that input into an embedding, either using a local model or one of the providers. Then the way that you can compare these things, you have a couple of different operators.

The first one that we're going to look at is this L2 operator. This is maybe the most common. This computes the distance between two vectors in Euclidean geometry. It's very complicated, it's math that I don't fully understand. There's an L2 operator, an L1 operator and a couple of others that we'll look at in a minute. This is maybe the best one for approximating nearest neighbor similarity. We'll get to that in a second. What you do is you say, all right, take the user input, run it through a model, get an embedding back, and let's just start here by just saying, wow, that embedding came back exactly the same, which is not going to be the case. You see that the L2 distance between this embedding and the embedding for product A, it's zero. It's the exact same embedding. Of course it's going to be the same. Now probably it's going to look something more like this. That's where it starts to be more useful, where we say, ah, this input is kind of shaped like this thing that we have stored, or it's closest to these couple of things that we have stored.
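Assuming the same hand-written vectors, the comparison described above uses pgvector's `<->` operator, which computes L2 (Euclidean) distance:

```sql
-- Distance from a hypothetical user-input embedding to each product.
-- '[1,2,3,4]' matches Product A exactly, so its distance is 0;
-- Product B's distance is much larger.
SELECT name, embedding <-> '[1,2,3,4]' AS distance
FROM products_v;
```

pgvector also provides `<=>` for cosine distance and `<#>` for negative inner product, which the video returns to later.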

You'll see that this distance for product A is much closer than the distance for product B. Meaning product A is most similar to this vector embedding right here. What you would typically do is you would say select from products_v order by the embedding's distance to some target vector. That target would be the user's search slash a different product's embedding. So user search we talked about, but a different product's embedding, this can drive a related products feature, right? If you're on the page for product A, you can take this embedding and say, hey, database, give me all the products that are like product A, because I want to show 'em to the user and hope that they buy a bunch more stuff. If they're looking at this, they might be interested in that. This is a way that you can create that feature. That's where we're going to leave our little demo for now.
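A sketch of the related-products query described above, ordering by L2 distance to Product A's embedding; the subquery, the self-exclusion, and the LIMIT are assumptions added for illustration:

```sql
-- Nearest neighbors to Product A's embedding, closest first.
-- Excluding Product A itself, since its distance to itself is 0.
SELECT name
FROM products_v
WHERE name <> 'Product A'
ORDER BY embedding <-> (
    SELECT embedding FROM products_v WHERE name = 'Product A'
)
LIMIT 5;
```

Without an index, this is an exact scan over every row; vector indexes for speeding this up are covered later in the course.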

We'll continue to explore some more advanced use cases, but it is just wild to me that embeddings are a thing. That you can take some portion of reality and represent it as floating point numbers, kind of blows my mind, but it's terribly awesome and so we're going to continue to explore it here.