Aaron is a natural teacher and this course is the best introduction to Postgres I have come across. Lessons are easy to follow and he recommends some great tools for working with Postgres.Joel Drumgoole
Shorten dev cycles with branching and zero-downtime schema migrations.
I'm not saying that we are done with B-tree indexes, but we are gonna look at a different type of index, and that's the hash index. A hash index is only useful for strict equality lookups, that's it. For strict equality lookups, it's pretty useful. It can't be used for ordering or sorting. It can't be used for wild card searches. It can't be used for range searches. It can't be used for any of that. No partial matches whatsoever because the value itself that's being indexed is being run through a hashing function. The semantic meaning, the actual value is lost on the other side.
That's why you can't do a range function. That's why you can't do a partial match because what exists in the index is not the value, it is a hashed version of that value.
Postgres handles all of this for you, so you don't have to handle any of that. This is very similar to creating basically a B-tree index on an MD5 hash of a column, except it's not a B-tree, it is a fundamentally different data structure that is optimized for storing these hashes. It is going to be faster. Another benefit of a hash index is that the value is a constant size. You can run a massive piece of text or a long blob of any type through that, and it comes out the same size on the other side. Your index structure itself stays quite small and quite compact.
Let me show you how to use one. We're gonna create two indexes here, and we're gonna say, "create index email_btree on users". We're gonna go back to the long form, which is "using btree(email)". That is not necessary, but I do want it to stand opposed to "using hash(email). That is coming back. We started using this format, and then we switched to the shorthand format. You see why this longer format exists.
We're gonna create email_hash on users using the hash instead of the B-tree. We'll do "select * from users where email = aaron.francis@example". With any luck, this is going to show us that the hash one was chosen. The hash one was chosen because it is faster for strict equality lookups. You'll notice if we change this to less than, it uses the B-tree. If we change this to like, and we say, "aaron.francis" at any domain, it still uses, well it uses it, it actually doesn't use the B-tree at all. It decides it's gonna do a sequence scan on users, but it definitely doesn't use the hash because that would be illegal. If you need a strict equality lookup that is very, very fast, consider a hash index.
I will give you one word of warning, and that is prior to Postgres 10, they were dangerous and should never be used. Hopefully at this point, we're on Postgres 17, so hopefully, nobody's left on nine and below, fingers crossed. If you were on a version of Postgres prior to 10, there was this massive warning. Don't use hash indexes. Which like, they're in there, but definitely don't use them. That was a good warning because it could cause crashes. They weren't written into the write ahead log, so they weren't getting into replication. It was this whole mess. That has all been solved.
When you need strict equality on potentially, let's say potentially a very large column, you can imagine a URL. I think the legal limit for URL is many thousands of characters after you add all the UTM spam. If you want to do a strict equality fast lookup on that hashing, it might be a great option. You might also consider hashing email addresses for quick lookups or API tokens or something like that.
You will have to test out both methods and see which one works better for your data, and works, which one works better for your use case. Beause the hash is very restrictive in terms of where it can be used and it's just strict equalities, but you do a lot of strict equalities in your applications. Consider the hash index for those.