How Hash Maps Work — O(1) Lookup From First Principles

2026-03-22

A hash map (also called hash table, dictionary, or associative array) stores key-value pairs and finds any value by its key in O(1) average time. Not O(log n). Not O(n). Constant time, regardless of whether the map has 10 entries or 10 million.

This sounds impossible — how can you find something without looking through the data? The trick is a hash function that converts the key directly into an array index.

The Core Idea

A hash map is an array where the index is computed from the key:

  1. Hash the key — run the key through a hash function to get an integer.
  2. Compute the index — index = hash(key) % array_size.
  3. Store or retrieve — use that index to access the array directly.
hash("name") = 7892341
index = 7892341 % 8 = 5
array[5] = ("name", "Alice")

To look up "name" later, compute the same hash, get the same index, and read the array at that position. One hash computation, one array access. O(1).
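The three steps can be sketched in a few lines of Python. This is a deliberately naive version — it ignores collisions entirely (two keys landing on the same index would overwrite each other), which the next sections address:

```python
# Minimal sketch: hash the key, map it to a slot, store/retrieve directly.
# Collisions are ignored here on purpose — this only shows the core idea.
SIZE = 8
slots = [None] * SIZE

def put(key, value):
    index = hash(key) % SIZE      # step 1 + 2: hash, then modulo into range
    slots[index] = (key, value)   # step 3: direct array access

def get(key):
    index = hash(key) % SIZE      # same key -> same hash -> same index
    entry = slots[index]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None

put("name", "Alice")
print(get("name"))  # Alice
```

One hash, one array access — no scanning, no matter how many other entries exist.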

What Is a Hash Function?

A hash function takes an input of any size and produces a fixed-size integer. Good hash functions have three properties:

Deterministic — the same input always produces the same output. hash("name") must always return the same number.

Uniform distribution — outputs should spread evenly across the output range. If most keys hash to the same few values, the map degrades to a linked list.

Fast — hashing should be cheaper than the work it saves. Common hash functions (SipHash, FNV, xxHash) process data at gigabytes per second.

A hash function is NOT encryption. It doesn't need to be reversible or cryptographically secure for hash map purposes (though cryptographic hashes like SHA-256 are used elsewhere). It just needs to spread keys evenly and quickly.
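FNV-1a, mentioned above, is simple enough to write out in full — the entire function is an xor and a multiply per byte (constants shown are the standard 32-bit FNV parameters):

```python
# FNV-1a (32-bit): xor each byte into the state, then multiply by the
# FNV prime. Deterministic, fast, and spreads typical keys well.
def fnv1a_32(data: bytes) -> int:
    h = 0x811C9DC5                     # FNV offset basis (32-bit)
    for byte in data:
        h ^= byte                      # xor first — the "1a" variant
        h = (h * 0x01000193) % 2**32   # FNV prime, kept to 32 bits
    return h

print(fnv1a_32(b"name"))  # same number on every run, every machine
```

Note that Python's built-in hash() for strings is deliberately randomized per process (a defense against collision-flooding attacks), so it is deterministic within a run but not across runs — FNV-1a, by contrast, is stable everywhere.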

What Happens When Two Keys Hash to the Same Index?

This is called a collision. With a finite array and potentially infinite keys, collisions are inevitable. There are two main strategies:

Chaining

Each array slot holds a linked list (or small array) of all entries that hash to that index. On collision, the new entry is appended to the list.

Lookup: hash the key, go to the array slot, scan the list for the matching key. If the hash function distributes well, each list has ≈ 1 entry, so lookup is still O(1) average. If the hash function is bad and everything lands in one slot, it degrades to O(n).
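A sketch of chaining, using a Python list per slot as the "chain" (class and method names here are illustrative, not any library's API):

```python
# Chaining: each slot holds a list of (key, value) pairs.
# Colliding entries simply share a bucket.
class ChainedMap:
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: update it
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key (or collision): append

    def get(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for k, v in bucket:              # scan the (usually ~1 entry) chain
            if k == key:
                return v
        return None
```

With a good hash function the inner scan touches about one entry on average, which is where the O(1) claim comes from.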

Open Addressing

On collision, probe the next slot: if slot 5 is taken, try 6, then 7, and so on (linear probing). Or try slots at quadratic intervals (quadratic probing). Or hash again with a different function (double hashing).

Open addressing keeps everything in the array — no linked lists, better cache performance. But it's sensitive to the load factor (entries / array_size). As the array fills past ~70%, clusters of occupied slots form and lookups slow down.
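Linear probing can be sketched like this (again illustrative names; real implementations also handle deletion with tombstones and resize before the array fills):

```python
# Open addressing with linear probing: on collision, step to the next slot.
# Assumes the array never completely fills — otherwise put() would loop.
class ProbingMap:
    def __init__(self, size=8):
        self.slots = [None] * size

    def put(self, key, value):
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)   # probe the next slot, wrapping
        self.slots[i] = (key, value)

    def get(self, key):
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None:
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % len(self.slots)   # keep probing past other keys
        return None                          # empty slot reached: key absent
```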

What Is the Load Factor?

The load factor = number of entries / array size. It measures how full the hash map is.

Load factor | Expected probe length | Performance
------------|-----------------------|------------
0.25        | ~1.2                  | Excellent
0.50        | ~1.5                  | Good
0.75        | ~2.5                  | Acceptable
0.90        | ~5.5                  | Degrading
0.99        | ~50                   | Terrible

When the load factor exceeds a threshold (typically 0.75), the hash map resizes: allocates a new array (usually 2x the size), rehashes every entry, and inserts them into the new array. This is O(n) but happens infrequently — amortized over many inserts, each insert is still O(1).
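The resize logic, added to the chaining design above (a sketch — the 0.75 threshold and 2x growth factor are the typical choices, not universal constants):

```python
# Chained map that doubles its array and rehashes when the load
# factor passes 0.75. The O(n) rehash is rare, so inserts are
# O(1) amortized.
class GrowingMap:
    LOAD_LIMIT = 0.75

    def __init__(self):
        self.buckets = [[] for _ in range(8)]
        self.count = 0

    def put(self, key, value):
        if self.count / len(self.buckets) > self.LOAD_LIMIT:
            self._grow()
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.count += 1

    def _grow(self):
        old = self.buckets
        self.buckets = [[] for _ in range(len(old) * 2)]  # 2x the size
        for bucket in old:
            for key, value in bucket:
                # Rehash: the modulus changed, so every index may change.
                self.buckets[hash(key) % len(self.buckets)].append((key, value))

    def get(self, key):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        return None
```

Every entry must be rehashed because the index depends on the array size: hash(key) % 8 and hash(key) % 16 generally give different slots.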

What Can Be a Key?

Any value that can be hashed and compared for equality. In practice:

  • Strings — the most common key type. Hash the bytes.
  • Integers — hash the integer value (sometimes the identity function works).
  • Composite keys — hash a combination of fields (e.g., (user_id, project_id)).

Keys must be immutable while in the map. If you modify a key after insertion, its hash changes and the map can no longer find it. This is why Python requires dictionary keys to be hashable (immutable types like strings, numbers, tuples) and why Java's HashMap contract requires that equals() and hashCode() are consistent.
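Python enforces this at insertion time: mutable built-in types don't define a hash at all, so they are rejected as keys outright rather than silently corrupting the map later.

```python
# Immutable keys work; mutable ones are refused with a TypeError.
d = {}
d[("user", 42)] = "ok"        # tuple: immutable, hashable — fine

try:
    d[["user", 42]] = "boom"  # list: mutable, defines no hash
except TypeError as e:
    print(e)                  # unhashable type: 'list'
```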

Where Do Hash Maps Appear?

Hash maps are everywhere:

Language runtimes — Python's dict, JavaScript's objects, Ruby's Hash, Rust's HashMap, Go's map, Java's HashMap. Most dynamic languages are built on hash maps internally — object attribute lookup is a hash map operation.

Inverted indexes — a search engine's index maps each term to a list of documents. The term lookup is a hash map (or sorted array with binary search).

Databases — hash indexes provide O(1) lookup for exact-match queries. Different from B-tree indexes, which support range queries.

Caching — LRU caches use a hash map for O(1) lookup and a doubly-linked list for O(1) eviction. Memcached and Redis are essentially distributed hash maps.

DNS resolution — your operating system's DNS cache is a hash map from domain names to IP addresses.

Deduplication — checking if you've seen an item before: if (item in seen_set). A set is a hash map with only keys, no values.

Counting — frequency counting, histograms, word counting. counts[word] += 1 is a hash map operation.
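The last two use cases fit in a few lines each — one hash map operation per item:

```python
words = ["to", "be", "or", "not", "to", "be"]

# Deduplication: a set is a hash map with only keys.
seen = set()
unique = []
for w in words:
    if w not in seen:        # O(1) membership check
        seen.add(w)
        unique.append(w)

# Counting: each update is one hash map lookup + store.
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1

print(unique)  # ['to', 'be', 'or', 'not']
print(counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```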

Hash Maps vs Trees

Hash maps and trees (covered in the next lesson) both store key-value pairs, but with different tradeoffs:

                  | Hash Map           | Tree (BST/B-tree)
------------------|--------------------|-------------------
Lookup            | O(1) average       | O(log n)
Insert            | O(1) average       | O(log n)
Range query       | Not supported      | O(log n + k)
Ordered iteration | No                 | Yes
Worst case        | O(n) with bad hash | O(log n) balanced
Memory            | Array + overhead   | Nodes + pointers

Use hash maps when you need fast exact-match lookup. Use trees when you need ordered data or range queries. Databases support both: hash indexes for WHERE id = 5 and B-tree indexes for WHERE age BETWEEN 18 AND 30.

Next Steps