What is a Hash Function
A hash function takes an input of any size — a string, a number, a file — and produces a fixed-size integer called a hash code. In the context of hash maps, this integer determines which bucket a key-value pair is stored in.
How it works
A hash function processes the bytes of its input through a series of mathematical operations (multiplication, XOR, bit shifting) to produce an integer. The same input always produces the same output — this is called determinism and is essential for retrieval to work.
hash("alice") → 4829173
hash("bob") → 7391042
hash("alice") → 4829173 (always the same)
A good hash function has three properties:
- Deterministic — the same key always produces the same hash.
- Uniform distribution — different keys should spread evenly across the output range. If most keys hash to the same few values, collisions pile up and performance degrades from O(1) to O(n).
- Fast to compute — hashing happens on every insert and every lookup, so it must be cheap.
The hash code is typically larger than the number of buckets, so it gets reduced: bucket = hash_code % array_length. This is why even a perfect hash function still produces collisions — many possible hash codes map to the same bucket index.
Cryptographic hash functions (SHA-256, BLAKE3) are a different category. They add properties like collision resistance (infeasible to find two inputs with the same hash) and irreversibility (can't recover the input from the hash). These are used for security — passwords, digital signatures, integrity checks — not for hash map indexing. Hash map hash functions prioritize speed over security.
Why it matters
The hash function is the mechanism that makes O(1) lookup possible. A poor hash function that clusters keys into the same buckets turns a hash map into a slow linked list. A good one distributes keys evenly and keeps every bucket short.
See How Hash Maps Work for how hash functions, arrays, and collision resolution combine into the complete data structure.