Mastering Postgres
Introduction
Introduction to the course
Overview of course structure
Postgres vs. everyone
The psql CLI
Data Types
Introduction to schema
Integers
Numeric
Floating point
Storing money
NaNs and infinity
Casting types
Character types
Check constraints
Domain types
Charsets and collations
Binary data
UUIDs
Boolean
Enums
Timestamps
Timezones
Dates and times
Advanced Data Types
Intervals
Serial type
Sequences
Identity
Network and MAC addresses
JSON
Arrays
Generated columns
Text search types
Bit string
Ranges
Composite types
Nulls
Unique constraints
Exclusion constraints
Foreign key constraints
Indexing
Introduction to indexes
Heaps and CTIDs
B-Tree overview
Primary keys vs. secondary indexes
Primary key types
Where to add indexes
Index selectivity
Composite indexes
Composite range
Combining multiple indexes
Covering indexes
Partial indexes
Index ordering
Ordering nulls in indexes
Advanced Indexing
Functional indexes
Duplicate indexes
Hash indexes
Naming indexes
Understanding Query Plans
Introduction to explain
Explain structure
Scan nodes
Costs and rows
Explain analyze
Generating Results
Introduction to queries
Cross joins
Inner joins
Outer joins
Subqueries
Lateral joins
ROWS FROM
Filling gaps in sequences
Subquery elimination
Combining queries
Set generating functions
Indexing joins
Advanced SQL
Introduction to advanced SQL
Grouping
Grouping sets, rollups, cubes
Window functions
CTEs
CTEs with window functions
Recursive CTE
Hierarchical recursive CTE
Handling nulls
Row value syntax
Views
Materialized views
Removing duplicate rows
Upsert
Returning keyword
COALESCE + generated column
Full Text Search
Introduction to full text search
Searching with LIKE
Vectors, queries, and ranks
Websearch
Ranking
Indexing full text search
Highlighting
Advanced JSON
Intro to JSON
JSON vs JSONB
Validating JSON
Creating JSON objects + arrays
JSON extraction
JSON containment
JSON existence
JSON recordset
Updating JSON
Indexing JSON parts
GIN index
Vectors (pgvector)
Intro to pgvector
Vector embedding columns
Find related articles
Upsert vector embedding
Semantic search
Other operators
Vector indexes
Outro
Thank you
Bonus interviews
Heroku's glory days & Postgres vs the world (with Craig Kerstiens)
Creating a Postgres platform with Monica & Tudor from Xata.io
Bootstrapping an email service provider (with Jesse Hanley)
Vectors (pgvector): Other operators



Test Data

I've made my test data available for you to use and follow along.


Summary

Learn about the different operators for comparing vectors, including L2, cosine, L1, and inner product. The right choice depends on your data and your model, with differing recommendations from OpenAI and pgvector.

Video Transcript

We're getting to a point that is outside of my mathematical prowess. Everything we've searched on so far has used the L2 operator, the L2 comparison function, which is a Euclidean distance comparison. This is where, as someone who is not a mathematician, we start to get outside the bounds of my understanding, but I do want to talk to you about the differences between the four operators you can use and where you might use one versus another.

The L2 operator, written <->, is the one we have been using, and it's probably a totally fine default. There is also the cosine distance operator, written <=>. There is another operator called the L1, written <+>, with a plus sign in the middle. And then there's one more called the inner product, written <#>, with a pound sign in the middle. What do we even call that, a hashtag? It's been so long... a number sign. These are just different ways to compare the underlying embeddings.
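To make those concrete, here's a minimal sketch of all four in one query. The articles table and the three-dimensional embedding column are hypothetical, just to keep the literals readable; real embeddings are much wider (OpenAI's text-embedding-3-small, for example, produces 1,536 dimensions). Note that the L1 operator requires pgvector 0.7.0 or newer.

    -- Hypothetical table for illustration only.
    CREATE TABLE articles (
        id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        title text NOT NULL,
        embedding vector(3)
    );

    -- Each operator returns a distance, so smaller means "more similar."
    SELECT
        title,
        embedding <-> '[0.1, 0.2, 0.3]' AS l2_distance,      -- Euclidean
        embedding <=> '[0.1, 0.2, 0.3]' AS cosine_distance,  -- cosine
        embedding <+> '[0.1, 0.2, 0.3]' AS l1_distance,      -- taxicab (pgvector >= 0.7.0)
        embedding <#> '[0.1, 0.2, 0.3]' AS neg_inner_product -- negated inner product
    FROM articles;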

Now, which one should you choose? Great question: it kind of depends. If you read the pgvector docs, they say that when the embeddings are normalized, as OpenAI's are, the inner product is going to be the most performant. If you go read some of OpenAI's deeper technical material, it says the choice doesn't actually matter that much, but it recommends cosine similarity when the vectors have been normalized. Some of this depends on your data and some on the model you're using: L1 is very resilient to big outliers or anomalies, whereas L2 can be skewed by them. Personally, I'm just looking for the right-down-the-middle, 80, 90, 95% use case.
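To see how you'd actually use those recommendations, here's a sketch of the same nearest-neighbor query written with both operators, reusing the hypothetical articles table from above. One wrinkle worth knowing: <#> returns the negative inner product (Postgres index scans prefer ascending order), so a smaller result still means a closer match and both queries ORDER BY ascending.

    -- Cosine distance: OpenAI's general recommendation.
    SELECT id, title
    FROM articles
    ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
    LIMIT 5;

    -- Negative inner product: pgvector's recommendation for vectors
    -- normalized to length 1, which is what OpenAI embeddings are.
    SELECT id, title
    FROM articles
    ORDER BY embedding <#> '[0.1, 0.2, 0.3]'
    LIMIT 5;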

According to OpenAI, cosine similarity is that middle-of-the-road choice; according to pgvector, the inner product is. And L2 is a very well understood, very commonly used distance comparison for vectors. Between those three, you'll find one that works very well for your use case. Take into consideration the model and its recommendation for comparing the vectors it gives you, because that's pretty important. When we look at indexing in the next video, we will continue to talk about these operators, because some of these indexes take flags based on the operator you plan to use. Again, this is a little bit above my pay grade in the mathematical sense, so when you're deciding which operator to use, test it against your own data.
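As a small preview of that (a sketch against the same hypothetical articles table): when you build a pgvector index, you pick an operator class, and it has to match the operator your queries will use.

    -- An HNSW index built for cosine distance queries (embedding <=> ...).
    CREATE INDEX ON articles USING hnsw (embedding vector_cosine_ops);

    -- The equivalent for inner product queries (embedding <#> ...) would use
    -- vector_ip_ops; for L2 (<->), vector_l2_ops; for L1 (<+>), vector_l1_ops.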