The most comprehensive course on PostgreSQL

Hash indexes in Postgres are optimized for fast equality lookups, making them a strong choice for columns like URLs or email addresses where range queries and sorting aren't needed. You'll learn how to create and use hash indexes effectively, taking advantage of their consistent size and speed. While earlier versions of Postgres had reliability issues with hash indexes, those concerns have been resolved in version 10 and later, making them a practical option for targeted use cases.

I'm not saying that we are done with B-tree indexes, but we are gonna look at a different type of index, and that's the hash index. A hash index is only useful for strict equality lookups, that's it. For strict equality lookups, it's pretty useful. It can't be used for ordering or sorting. It can't be used for wild card searches. It can't be used for range searches. It can't be used for any of that. No partial matches whatsoever because the value itself that's being indexed is being run through a hashing function. The semantic meaning, the actual value is lost on the other side.

That's why you can't do a range function. That's why you can't do a partial match because what exists in the index is not the value, it is a hashed version of that value.

Postgres handles all of this for you, so you don't have to handle any of that. This is very similar to creating basically a B-tree index on an MD5 hash of a column, except it's not a B-tree, it is a fundamentally different data structure that is optimized for storing these hashes. It is going to be faster. Another benefit of a hash index is that the value is a constant size. You can run a massive piece of text or a long blob of any type through that, and it comes out the same size on the other side. Your index structure itself stays quite small and quite compact.

Let me show you how to use one. We're gonna create two indexes here, and we're gonna say, "create index email_btree on users". We're gonna go back to the long form, which is "using btree(email)". That is not necessary, but I do want it to stand opposed to "using hash(email). That is coming back. We started using this format, and then we switched to the shorthand format. You see why this longer format exists.

We're gonna create email_hash on users using the hash instead of the B-tree. We'll do "select * from users where email = aaron.francis@example". With any luck, this is going to show us that the hash one was chosen. The hash one was chosen because it is faster for strict equality lookups. You'll notice if we change this to less than, it uses the B-tree. If we change this to like, and we say, "aaron.francis" at any domain, it still uses, well it uses it, it actually doesn't use the B-tree at all. It decides it's gonna do a sequence scan on users, but it definitely doesn't use the hash because that would be illegal. If you need a strict equality lookup that is very, very fast, consider a hash index.

I will give you one word of warning, and that is prior to Postgres 10, they were dangerous and should never be used. Hopefully at this point, we're on Postgres 17, so hopefully, nobody's left on nine and below, fingers crossed. If you were on a version of Postgres prior to 10, there was this massive warning. Don't use hash indexes. Which like, they're in there, but definitely don't use them. That was a good warning because it could cause crashes. They weren't written into the write ahead log, so they weren't getting into replication. It was this whole mess. That has all been solved.

When you need strict equality on potentially, let's say potentially a very large column, you can imagine a URL. I think the legal limit for URL is many thousands of characters after you add all the UTM spam. If you want to do a strict equality fast lookup on that hashing, it might be a great option. You might also consider hashing email addresses for quick lookups or API tokens or something like that.

You will have to test out both methods and see which one works better for your data, and works, which one works better for your use case. Beause the hash is very restrictive in terms of where it can be used and it's just strict equalities, but you do a lot of strict equalities in your applications. Consider the hash index for those.

Full Course

PostgreSQL database platform

Summary

Video Transcript