Mastering Postgres
Introduction
Introduction to the course
Overview of course structure
Postgres vs. everyone
The psql CLI
Data Types
Introduction to schema
Integers
Numeric
Floating point
Storing money
NaNs and infinity
Casting types
Characters types
Check constraints
Domain types
Charsets and collations
Binary data
UUIDs
Boolean
Enums
Timestamps
Timezones
Dates and times
Advanced Data Types
Intervals
Serial type
Sequences
Identity
Network and mac addresses
JSON
Arrays
Generated columns
Text search types
Bit string
Ranges
Composite types
Nulls
Unique constraints
Exclusion constraints
Foreign key constraints
Indexing
Introduction to indexes
Heaps and CTIDs
B-Tree overview
Primary keys vs. secondary indexes
Primary key types
Where to add indexes
Index selectivity
Composite indexes
Composite range
Combining multiple indexes
Covering indexes
Partial indexes
Index ordering
Ordering nulls in indexes
Advanced Indexing
Functional indexes
Duplicate indexes
Hash indexes
Naming indexes
Understanding Query Plans
Introduction to explain
Explain structure
Scan nodes
Costs and rows
Explain analyze
Generating Results
Introduction to queries
Cross joins
Inner joins
Outer joins
Subqueries
Lateral joins
ROWS FROM
Filling gaps in sequences
Subquery elimination
Combining queries
Set generating functions
Indexing joins
Advanced SQL
Introduction to advanced SQL
Grouping
Grouping sets, rollups, cubes
Window functions
CTEs
CTEs with window functions
Recursive CTE
Hierarchical recursive CTE
Handling nulls
Row value syntax
Views
Materialized views
Removing duplicate rows
Upsert
Returning keyword
COALESCE + generated column
Full Text Search
Introduction to full text search
Searching with LIKE
Vectors, queries, and ranks
Websearch
Ranking
Indexing full text search
Highlighting
Advanced JSON
Intro to JSON
JSON vs JSONB
Validating JSON
Creating JSON objects + arrays
JSON extraction
JSON containment
JSON existence
JSON recordset
Updating JSON
Indexing JSON parts
GIN index
Vectors (pgvector)
Intro to pgvector
Vector embedding columns
Find related articles
Upsert vector embedding
Semantic search
Other operators
Vector indexes
Outro
Thank you
Bonus interviews
Heroku's glory days & Postgres vs the world (with Craig Kerstiens)
Creating a Postgres platform with Monica & Tudor from Xata.io
Bootstrapping an email service provider (with Jesse Hanley)
Locked video

Please purchase the course to watch this video.

Video thumbnail
Advanced Indexing
Hash indexes

Full Course

$
349
$399
USD, one-time fee
I bought Mastering Postgres because I use Postgres a lot and wanted to support Aaron and his content. I already learned more than I expected and have been very impressed with the quantity and work Aaron put in.
Timo Strackfeldt
Timo Strackfeldt

PostgreSQL database platform

Shorten dev cycles with branching and zero-downtime schema migrations.

Test Data

I've made my test data available for you to use and follow along.

Download
or
Use on Xata

Summary

Hash indexes in Postgres are optimized for fast equality lookups, making them a strong choice for columns like URLs or email addresses where range queries and sorting aren't needed. You'll learn how to create and use hash indexes effectively, taking advantage of their consistent size and speed. While earlier versions of Postgres had reliability issues with hash indexes, those concerns have been resolved in version 10 and later, making them a practical option for targeted use cases.

Video Transcript

I'm not saying that we are done with B-tree indexes, but we are gonna look at a different type of index, and that's the hash index. A hash index is only useful for strict equality lookups, that's it. For strict equality lookups, it's pretty useful. It can't be used for ordering or sorting. It can't be used for wild card searches. It can't be used for range searches. It can't be used for any of that. No partial matches whatsoever because the value itself that's being indexed is being run through a hashing function. The semantic meaning, the actual value is lost on the other side.

That's why you can't do a range function. That's why you can't do a partial match because what exists in the index is not the value, it is a hashed version of that value.

Postgres handles all of this for you, so you don't have to handle any of that. This is very similar to creating basically a B-tree index on an MD5 hash of a column, except it's not a B-tree, it is a fundamentally different data structure that is optimized for storing these hashes. It is going to be faster. Another benefit of a hash index is that the value is a constant size. You can run a massive piece of text or a long blob of any type through that, and it comes out the same size on the other side. Your index structure itself stays quite small and quite compact.

Let me show you how to use one. We're gonna create two indexes here, and we're gonna say, "create index email_btree on users". We're gonna go back to the long form, which is "using btree(email)". That is not necessary, but I do want it to stand opposed to "using hash(email). That is coming back. We started using this format, and then we switched to the shorthand format. You see why this longer format exists.

We're gonna create email_hash on users using the hash instead of the B-tree. We'll do "select * from users where email = aaron.francis@example". With any luck, this is going to show us that the hash one was chosen. The hash one was chosen because it is faster for strict equality lookups. You'll notice if we change this to less than, it uses the B-tree. If we change this to like, and we say, "aaron.francis" at any domain, it still uses, well it uses it, it actually doesn't use the B-tree at all. It decides it's gonna do a sequence scan on users, but it definitely doesn't use the hash because that would be illegal. If you need a strict equality lookup that is very, very fast, consider a hash index.

I will give you one word of warning, and that is prior to Postgres 10, they were dangerous and should never be used. Hopefully at this point, we're on Postgres 17, so hopefully, nobody's left on nine and below, fingers crossed. If you were on a version of Postgres prior to 10, there was this massive warning. Don't use hash indexes. Which like, they're in there, but definitely don't use them. That was a good warning because it could cause crashes. They weren't written into the write ahead log, so they weren't getting into replication. It was this whole mess. That has all been solved.

When you need strict equality on potentially, let's say potentially a very large column, you can imagine a URL. I think the legal limit for URL is many thousands of characters after you add all the UTM spam. If you want to do a strict equality fast lookup on that hashing, it might be a great option. You might also consider hashing email addresses for quick lookups or API tokens or something like that.

You will have to test out both methods and see which one works better for your data, and works, which one works better for your use case. Beause the hash is very restrictive in terms of where it can be used and it's just strict equalities, but you do a lot of strict equalities in your applications. Consider the hash index for those.