Mastering Postgres
Introduction
Introduction to the course
Overview of course structure
Postgres vs. everyone
The psql CLI
Data Types
Introduction to schema
Integers
Numeric
Floating point
Storing money
NaNs and infinity
Casting types
Characters types
Check constraints
Domain types
Charsets and collations
Binary data
UUIDs
Boolean
Enums
Timestamps
Timezones
Dates and times
Advanced Data Types
Intervals
Serial type
Sequences
Identity
Network and mac addresses
JSON
Arrays
Generated columns
Text search types
Bit string
Ranges
Composite types
Nulls
Unique constraints
Exclusion constraints
Foreign key constraints
Indexing
Introduction to indexes
Heaps and CTIDs
B-Tree overview
Primary keys vs. secondary indexes
Primary key types
Where to add indexes
Index selectivity
Composite indexes
Composite range
Combining multiple indexes
Covering indexes
Partial indexes
Index ordering
Ordering nulls in indexes
Advanced Indexing
Functional indexes
Duplicate indexes
Hash indexes
Naming indexes
Understanding Query Plans
Introduction to explain
Explain structure
Scan nodes
Costs and rows
Explain analyze
Generating Results
Introduction to queries
Cross joins
Inner joins
Outer joins
Subqueries
Lateral joins
ROWS FROM
Filling gaps in sequences
Subquery elimination
Combining queries
Set generating functions
Indexing joins
Advanced SQL
Introduction to advanced SQL
Grouping
Grouping sets, rollups, cubes
Window functions
CTEs
CTEs with window functions
Recursive CTE
Hierarchical recursive CTE
Handling nulls
Row value syntax
Views
Materialized views
Removing duplicate rows
Upsert
Returning keyword
COALESCE + generated column
Full Text Search
Introduction to full text search
Searching with LIKE
Vectors, queries, and ranks
Websearch
Ranking
Indexing full text search
Highlighting
Advanced JSON
Intro to JSON
JSON vs JSONB
Validating JSON
Creating JSON objects + arrays
JSON extraction
JSON containment
JSON existence
JSON recordset
Updating JSON
Indexing JSON parts
GIN index
Vectors (pgvector)
Intro to pgvector
Vector embedding columns
Find related articles
Upsert vector embedding
Semantic search
Other operators
Vector indexes
Outro
Thank you
Bonus interviews
Heroku's glory days & Postgres vs the world (with Craig Kerstiens)
Creating a Postgres platform with Monica & Tudor from Xata.io
Bootstrapping an email service provider (with Jesse Hanley)
Locked video

Please purchase the course to watch this video.

Video thumbnail
Full Text Search
Vectors, queries, and ranks

Full Course

$
349
$399
USD, one-time fee
The course "Mastering Postgres" helped me practice query commands and understand why they're important. Inspiring me to explore the 'why' and 'how' behind each one. Truly a great course!
Nakolus
Nakolus

PostgreSQL database platform

Shorten dev cycles with branching and zero-downtime schema migrations.

Test Data

I've made my test data available for you to use and follow along.

Download
or
Use on Xata

Summary

Explore how to perform efficient full-text searches in PostgreSQL using features like `tsvector`, `tsquery`, and `ts_rank`. You'll learn how these tools break down text into searchable components, create a query language to match words, and rank results by relevance. By leveraging these functions, you can fine-tune search queries to deliver more accurate and meaningful results.

Video Transcript

We had a fun little detour with like, and I like, which are useful but not terribly useful for full text search. Now we're gonna look at tsquery, tsvector, tsrank, and these are all the ones that are specifically built for full text search. We did a little bit of this back in the data types videos, but now we're gonna go into it even further.

Let's start by looking at tsvector again. To get a tsvector, we're gonna say to_ tsvector on the title column. Let's go ahead and pull back the title column as well so we can see what we're working with. From movies limit, we'll just limit down to 50 and we can take a look at this. We get this wonky looking format here that you can kind of read through. "Kansas Saloon Smashers," which I'm going to have to watch, breaks down into 'kansa,' 'saloon,' and 'smasher.' Thinking back to the data types videos, you'll recall maybe that these are lexemes, which are basically like word parts, word roots, pieces of the word. It stores the location of where those things show up.

You can see down here, "Hemlock Hoax, the Detective." We have 'detect,' 'hemlock,' and 'hoax.' It does look like it's alphabetized, but we don't have the word the, because the word the is not terribly useful in search. We kind of just poof. We leave that out. Now there's another type, we can do select to_tsquery. What do you wanna search for it? Everybody seems to love "Star Wars." We'll search for "Star Wars" and we get this also not terribly helpful as it stands alone, we get this result, which is just in single quotes, 'star.' What we're gonna do is we're gonna use these two things together and we will get something useful out of it. Let's open this up and we'll say from movies where, I'm just gonna paste this here, we're gonna pull this guy out of the select and bring it down to the where and how are we gonna compare these two? With this special operator, this special matching operator for the vector and query data types.

Now if we run that, hey, star, star, star, star, star. That's pretty good. We're already off to a very good start. Now the next thing we probably wanna do is say 'star wars.' If we say 'star wars,' shoot, we get an error. We get a syntax error in tsquery. That's because the tsquery format is very specific. It's like a domain specific language. It's a DSL unto itself. We do have to write it in such a way that it is parsable so that it can be turned into the right format to compare against that tsvector. There are some functions that can help us get there more easily, which we'll look at soon.

First I wanna show you some of the syntax of the underlying tsquery and then we'll move on to the helper functions after that. The reason that this failed is we didn't specify how these two words should relate to each other. Should they both be there? Does it need to be either one or the other? That's where these operators come in. We wanna say 'star & wars.' The word star and the word wars must be there. We can also say star or wars. We get stuff like "Gang War" and "Lucky Star." We can say that star must be immediately followed by a word like wars. If we run that, we get these 10 records back and they all have star followed immediately by wars, a lot of "Star Wars" movies.

Now we can do logical grouping like this. Here, it doesn't make a difference. It'll just be read straight through 'cause there's only one thing in there. But what you can do is you can say wars or trek. We're in a spot where we have star immediately followed by one of wars or trek. Running that, we see "Star Wars," Wars, "Star Trek," Trek, Trek, Trek, Trek, Wars, Wars, Trek, Trek, and so on and so forth. This right here is equivalent to saying the very next word i.e. one word must be wars or trek. This format with the dash in the middle is syntactic. Sugar for this one in the middle right here. If you were to say 2, that means skip the next word and then look for wars or trek and no such luck there.

However, if we do find one, let's look for generations. We can say generations. Star must be immediately followed by generations. No such luck. Must be two words. There you go. star blank generations. That is that little operator right there. We can also negate words. We can say, yes, I want "Star Trek," but I actually don't want "Star Trek Generations." You can do that with the negating operator. We're gonna say & trek & not generations. If we run that, we get all of the "Star Treks" back but not generations. If we wanted to remove khan as well, we could either say & not K-H-A-N and not khan and khan is gone. Or we might be able to group these and say, not generations or khan and generations and khan are now gone. Pardon the rhyme. You can see that constructing these is a little bit of a pain, very powerful.

You do end up having to potentially take a user query and then do this kind of stuff to it at the application layer before you hand it off to PostgreS. That is not always true. There are some helpers. I wanna show you ranking results real quick 'cause this is the last kind of base function and then we'll move on to making this a little bit more ergonomic in future videos. I've gone ahead and switched this back to & wars or trek, so we have a little bit more data to work with. I wanna add this ranking function. I can use ts_rank up here.

The first thing that we're gonna pass through is the vector. Whatever we're searching on down below, that's what we need to put into the ranking function up above. We can have this to_tsquery up here. That's the second argument. We'll just call this as rank. We can hop down here and say order by rank descending. If we look at that, we see... Okay, great, this is a little bit interesting. A lot of exactly the same, which we can tweak so we can tweak this ranking function and we will as we go on.

I just wanna show you this here first. In my estimation, the reason that these two get ranked so highly is this one includes the search term twice, Star and Wars and then Star and Wars. This one includes the search term once, but then it does include Wars again. My guess is that's why it's getting ranked almost double the one below it, which is Star and Trek, Star and Trek, Star and Wars, Star and Wars and so on and down we go. The highest rank is the most relevant based on the vector and the query that we have given it. We can tune this because once we expand out into searching the title and the plot, which we're gonna do soon, we kind of wanna rank the title a little bit higher.

Not only do we wanna rank the title a little bit higher, we kind of wanna maybe rank exact matches a little bit higher. We will do both of those things. But now we have a a relatively firm footing that we can start with. We have tsvectors, tsqueries, and tsranks. Now we're gonna start looking at fine tuning some of these things.