Mastering Postgres
Introduction
Introduction to the course
Overview of course structure
Postgres vs. everyone
The psql CLI
Data Types
Introduction to schema
Integers
Numeric
Floating point
Storing money
NaNs and infinity
Casting types
Character types
Check constraints
Domain types
Charsets and collations
Binary data
UUIDs
Boolean
Enums
Timestamps
Timezones
Dates and times
Advanced Data Types
Intervals
Serial type
Sequences
Identity
Network and MAC addresses
JSON
Arrays
Generated columns
Text search types
Bit string
Ranges
Composite types
Nulls
Unique constraints
Exclusion constraints
Foreign key constraints
Indexing
Introduction to indexes
Heaps and CTIDs
B-Tree overview
Primary keys vs. secondary indexes
Primary key types
Where to add indexes
Index selectivity
Composite indexes
Composite range
Combining multiple indexes
Covering indexes
Partial indexes
Index ordering
Ordering nulls in indexes
Advanced Indexing
Functional indexes
Duplicate indexes
Hash indexes
Naming indexes
Understanding Query Plans
Introduction to explain
Explain structure
Scan nodes
Costs and rows
Explain analyze
Generating Results
Introduction to queries
Cross joins
Inner joins
Outer joins
Subqueries
Lateral joins
ROWS FROM
Filling gaps in sequences
Subquery elimination
Combining queries
Set generating functions
Indexing joins
Advanced SQL
Introduction to advanced SQL
Grouping
Grouping sets, rollups, cubes
Window functions
CTEs
CTEs with window functions
Recursive CTE
Hierarchical recursive CTE
Handling nulls
Row value syntax
Views
Materialized views
Removing duplicate rows
Upsert
Returning keyword
COALESCE + generated column
Full Text Search
Introduction to full text search
Searching with LIKE
Vectors, queries, and ranks
Websearch
Ranking
Indexing full text search
Highlighting
Advanced JSON
Intro to JSON
JSON vs JSONB
Validating JSON
Creating JSON objects + arrays
JSON extraction
JSON containment
JSON existence
JSON recordset
Updating JSON
Indexing JSON parts
GIN index
Vectors (pgvector)
Intro to pgvector
Vector embedding columns
Find related articles
Upsert vector embedding
Semantic search
Other operators
Vector indexes
Outro
Thank you
Bonus interviews
Heroku's glory days & Postgres vs the world (with Craig Kerstiens)
Creating a Postgres platform with Monica & Tudor from Xata.io
Bootstrapping an email service provider (with Jesse Hanley)
Advanced Data Types
Text search types



Test Data

I've made my test data available for you to use and follow along.


Summary

Learn how using full-text search directly in Postgres can simplify your tech stack by eliminating the need for additional search tools like Elasticsearch. We'll look at how to use the `tsvector` and `tsquery` data types for creating and searching text, transforming terms into lexemes for more accurate matches. We'll also cover how to keep search vectors updated efficiently using generated columns, ensuring your data stays synchronized.

Video Transcript

Having full text search right in your database can be a wonderful thing, because your stack remains very simple. You don't have to pull in Elasticsearch, Meilisearch, Typesense, or anything like that if you can run it all inside of Postgres. It's maybe an 80 or 90% solution, so it covers most people's needs. When you get to the upper end, something like faceted search or highlighting results, you might need to reach outside of vanilla Postgres.

If you're using one of the hosted providers like Xata, they do offer a way to replicate your Postgres out to Elasticsearch, and you can check the docs on that. That's a pretty awesome solution as well. What I'm gonna do here is show you the two types that can aid you in full text search, and later on we'll have a full module on configuring some of this, searching against it, stop words, booleans, all of that sort of stuff. For now, let's look at these two data types.

The first data type that I wanna show you here is a tsvector. We're gonna say to_tsvector, and then we're gonna put our dear old friend "quick brown fox" in there. If you look down here, we get this slightly wonky format, and I'll explain it to the best of my abilities in a second. But let's just check pg_typeof, and you see that the type is a tsvector. That is the type that comes out of to_tsvector. Now, a tsvector is a sorted list of distinct lexemes. What does any of that mean? Sorted, that part we can see: B-D-F-J-L-Q. Distinct, that part is easy too. If we were to put fox in here again, fox still shows up only once, but the positions change to four and five. Now, a lexeme is something that I historically did not know the meaning of, but it is a basic unit of language, an atomic linguistic part. What we see here is that lazy has been turned into L-A-Z-I. If we were to add laziness, you'd see that lazy and laziness get merged together into one single lexeme. You don't really need to know all of that, though I think it's a little bit interesting. The point is that to_tsvector takes a chunk of text called a document (that's the actual argument name).
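Sketched out in SQL, the walkthrough above looks roughly like this; the output comments assume default_text_search_config is set to english:

```sql
-- A tsvector is a sorted list of distinct lexemes with their positions.
SELECT to_tsvector('The quick brown fox jumps over the lazy dog');
-- 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

SELECT pg_typeof(to_tsvector('The quick brown fox'));
-- tsvector

-- "Distinct": repeating a word keeps one entry with multiple positions.
SELECT to_tsvector('The quick brown fox fox');
-- 'brown':3 'fox':4,5 'quick':2
```

Note that stop words like "the" and "over" are dropped entirely, which is why the positions skip numbers.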

It takes a document and turns it into this vector format, which we can then search against using a different data type. That second data type is a tsquery. If we say to_tsquery with lazy and run that, you'll see we get the lexemed version of the word lazy. If we do a pg_typeof, you'll see that this is a proper tsquery. The way that we search is to take our search string, turn it into a tsquery, and then apply that tsquery against a tsvector using the @@ operator. If we do that, you'll see we get back true. If we do brown, we get back true. If we do red, we get back false. And if we do laziness like we looked at earlier, it still matches, because both laziness and lazy get lexemed (I don't know if that's a real word, but I like saying it) down to L-A-Z-I.
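In SQL, the matching described here looks roughly like this (again assuming the english configuration):

```sql
-- The search string gets stemmed the same way the document does.
SELECT to_tsquery('laziness');
-- 'lazi'

-- @@ asks: does this tsvector match this tsquery?
SELECT to_tsvector('The quick brown fox jumps over the lazy dog')
       @@ to_tsquery('lazy');      -- true: both sides stem to 'lazi'

SELECT to_tsvector('The quick brown fox jumps over the lazy dog')
       @@ to_tsquery('red');       -- false: no matching lexeme
```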

You can see you do get a little bit of fuzziness straight out of the box: laziness will match lazy, and the other way around. It also doesn't matter what order you put the two sides of the @@ operator in; you could flip them, and it will still come back as true. There are many more operators, ands and ors and nots and weights and rankings and all kinds of stuff, but we'll save that for later. For now, I do wanna show you one interesting idea. Now that we've learned about tsvectors and generated columns, I wanna show you a way of keeping your vectors up to date with the root text, the root document.
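A quick sketch of the two points above: @@ works in either direction, and tsquery has boolean operators (which the later module covers in depth):

```sql
-- The operands of @@ can go in either order.
SELECT to_tsquery('lazy') @@ to_tsvector('the lazy dog');         -- true

-- tsquery supports & (and), | (or), and ! (not).
SELECT to_tsvector('the lazy dog') @@ to_tsquery('lazy & dog');   -- true
SELECT to_tsvector('the lazy dog') @@ to_tsquery('lazy & !cat');  -- true
```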

Here's a pretty basic table. We've got an id up there, content as text, and search_vector_en as a tsvector for English content. Interestingly, you can pass a different language into to_tsvector. We could say french, and oui (that's the only French word I know), and that will work just fine. By default, it is going to use the default_text_search_config setting, which in many cases will be set to english. But if you want to be explicit, you can of course pass in english there, which would likely mean you'd be saying yes instead. Now, this is gonna come back into play. Here we go. Let's say that we've got content as text and search_vector_en as a tsvector. That's a good start. However, these two things can get out of sync.
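The language-configuration part of this might look like the following; which configuration the one-argument form uses depends on your server's settings:

```sql
-- Explicitly choose a text search configuration per call.
SELECT to_tsvector('french', 'oui');
SELECT to_tsvector('english', 'The quick brown fox');

-- The one-argument form falls back to this setting.
SHOW default_text_search_config;   -- commonly pg_catalog.english
```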

They are discrete columns that could be updated independently. You could put a trigger on this table, but what did we just learn about that Aaron loves so much? You guessed it: generated columns. Let's say GENERATED ALWAYS AS, then to_tsvector passing in the content, and then STORED, because Postgres does not implement virtual generated columns. We have to do stored, which is fine. Now watch this. Wah-wah, it doesn't work: "generation expression is not immutable." Remember when I said what these expressions have to be? I didn't say they have to be immutable, I said they have to be deterministic. The values cannot change from run to run. Now, why would this one change?

Potentially, you could change your default_text_search_config. Instead, we can just throw english in there, and it becomes immutable. Now that is a deterministic generated column. If we do INSERT INTO ts_example with just the content, we only need to insert the content now, values of, and I'll just copy our quick brown fox: the quick brown fox jumps over the lazy dog. That looks good. We'll run that, and then SELECT * FROM ts_example.
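Putting the table together, a minimal sketch might look like this (the table and column names follow the transcript; the identity column is an assumption, since the transcript only mentions "an ID"):

```sql
CREATE TABLE ts_example (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  content text,
  -- to_tsvector(content) alone is rejected with "generation expression
  -- is not immutable": the one-argument form reads
  -- default_text_search_config, which could change between runs.
  -- Pinning the configuration makes the expression deterministic.
  search_vector_en tsvector
    GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);

-- Only the content needs to be inserted; the vector is derived.
INSERT INTO ts_example (content)
VALUES ('The quick brown fox jumps over the lazy dog');
```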

We have our vectorized version over here. We can say WHERE search_vector_en @@ to_tsquery, and let's search for lazy again. There we go. If we insert another row without lazy in it, say "over the cat" instead, and we search for lazy, we only get the first one back. If we search for cat, we only get the second one back. Like I said, we'll dive deeper into full text search as we get further along. I couldn't resist showing you another generated column, but now you understand the tsvector and tsquery data types, and they will come back into play when we get to the full text search module.
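The final queries, sketched against the ts_example table described above:

```sql
-- Add a second row that doesn't contain "lazy".
INSERT INTO ts_example (content)
VALUES ('The quick brown fox jumps over the cat');

-- Matches only the first row ('lazi' is in its vector).
SELECT id, content
FROM ts_example
WHERE search_vector_en @@ to_tsquery('lazy');

-- Matches only the second row.
SELECT id, content
FROM ts_example
WHERE search_vector_en @@ to_tsquery('cat');
```

Because the vector column is STORED, it is maintained automatically on every insert and update, with no trigger required.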