Introduction to full text search

Learn how to use Postgres for full-text search, which is great for most use cases before considering more advanced tools like Elasticsearch or Typesense. The video advises leveraging Postgres full-text search until your needs grow, especially if you need features like typo tolerance that Postgres can't handle. We explore working with movie data using `tsvectors` and GIN indexes to efficiently search and rank movie information.

In this module, we're gonna cover default vanilla out of the box, full text search with Postgres. And I know what you're saying, 'Aaron, Elasticsearch exists, Typesense exists, MeiliSearch exists.' I agree. And all of those things are great. And in fact, if you're using a platform like Xata, then it automatically replicates all your data out to Elasticsearch and gives you that incredibly powerful search. You can absolutely do that.

My encouragement to you is going to be, push the Postgres full-text search until it doesn't work for you anymore. The place where it'll start to break down is either scale, you just have too much text that you need to search over or features/functionality. You might end up in a spot where you need something like typo tolerance and faceted search, and a lot of stuff that Postgres maybe doesn't support out of the box. Even if you can like finagle Postgres to do it, it might be better just to go with one of the tools that is dedicated to search. Because Postgres, by default is not dedicated to search. There are tools that are better for it. It's just a matter of, do you actually need those tools.

If you're at a somewhat reasonable scale and the full-text search works for you, that greatly simplifies your life and your stack. If you're not on something like Xata that handles all of that for you, then just staying in vanilla Postgres is gonna simplify your stack.

We're gonna cover tsvectors, tsqueries, rankings, GIN indexes, all of that kind of stuff. First, let me show you the data that we're working with, so in the next video we can jump right in. What we're working with is select * from movies, and this dataset came from Kaggle. I downloaded it as a CSV and just imported it. I will leave a link down below the video somewhere so you can grab this dataset for yourself. I have literally just imported it into this table and have done nothing else yet.

Let's take a look at this. I do want to see some movies that I recognize. We're gonna sort by release year here and I don't recognize any of those. 'The Lego Batman'. I recognize that one. Return of Xander Cage. Okay so, sorry, I got distracted. Here is this schema that we're working with, release year, title, origin director, the whole cast, genre, wiki, and then the plot.

I think what we're probably gonna end up searching over is the title and the plot. We're gonna try to search for some keywords. We're gonna try to rank them, maybe we'll rank them where newer movies come up higher because they're more likely to be known than the 'Kansas Saloon Smashers' from 1901, although that does sound interesting. This is the dataset that we're going to work with. Let's get started.

Full Course

PostgreSQL database platform

Summary

Video Transcript