
How Hashing Works — One-Way Functions and Digital Fingerprints
A hash function takes an input of any size (a password, a file, an entire database) and produces a fixed-size output called a hash (or digest, fingerprint, checksum). The same input always produces the same hash. Different inputs produce different hashes in practice: collisions exist in principle, but for a good cryptographic hash they are infeasible to find. And critically: you cannot reverse the process. Given a hash, you cannot feasibly recover the original input.
This one-way property is what makes hashing useful for security. You can verify data without storing or transmitting the original.
What Makes a Cryptographic Hash Function?
Not all hash functions are cryptographic. The hash function in a hash map only needs to distribute keys evenly. A cryptographic hash function needs three additional properties:
Pre-image resistance — given a hash h, it's computationally infeasible to find any input m where hash(m) = h. You can't reverse the function.
Second pre-image resistance — given an input m1, it's infeasible to find a different input m2 where hash(m1) = hash(m2). You can't find another input that produces the same hash.
Collision resistance — it's infeasible to find any two different inputs that produce the same hash. Not just hard to find a collision for a specific input — hard to find any collision at all.
What Does a Hash Look Like?
SHA-256, one of the most widely used cryptographic hashes, produces a 256-bit (32-byte) output, typically written as 64 hexadecimal characters:
SHA-256("hello") = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
SHA-256("hello!") = ce06092fb948d9ffac7d1a376f404199d2b8f55ca68f4fc6e1a40e3b27e56ba2
SHA-256("hello!!") = b74f5ee9e1e67950cc2d7dbb59feed34e3717d0b5c8e6449e3b4c0e17f6b0c4a
Notice: adding a single character completely changes the output. There's no visible pattern, and "hello" and "hello!" have hashes that look completely unrelated. This is the avalanche effect: a tiny change in input flips roughly half of the output bits.
The output is always 256 bits, whether the input is 5 bytes or 5 gigabytes.
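The fixed output size and the avalanche effect can be checked locally. The following is a minimal sketch using Python's standard hashlib module; the helper names sha256_hex and differing_bits are made up for this illustration. It computes digests for the same three strings so the values can be verified on your own machine.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

for message in ("hello", "hello!", "hello!!"):
    print(f"{message!r:10} -> {sha256_hex(message)}")

# For a well-behaved hash, about half of the 256 output bits should differ.
print(differing_bits(sha256_hex("hello"), sha256_hex("hello!")))

# The digest length does not depend on the input length.
print(len(sha256_hex("x" * 1_000_000)))  # 64 hex characters
```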
Where Is Hashing Used?
Password storage — never store passwords in plaintext. Store hash(password). When the user logs in, hash what they typed and compare. If the hashes match, the password is correct. If the database is stolen, the attacker gets hashes, not passwords.
But simple hashing isn't enough. Attackers precompute hashes of common passwords (rainbow tables) and look stolen hashes up directly. The defense is salting: prepend a random string (the salt) to each password before hashing, giving hash(salt + password). Each user gets a unique salt, stored alongside the hash, so a precomputed table would have to be rebuilt for every user.
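Salting can be sketched in a few lines. This example uses plain SHA-256 only to make the salt's effect visible; it is not a recommended password scheme, and the function names make_record and check_password are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

def make_record(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) where digest = SHA-256(salt + password)."""
    if salt is None:
        salt = secrets.token_bytes(16)  # fresh random salt per user
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, digest

def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = make_record(password, salt)
    return hmac.compare_digest(digest, expected)

# Two users with the same password end up with different stored digests.
salt_a, digest_a = make_record("hunter2")
salt_b, digest_b = make_record("hunter2")
print(digest_a != digest_b)
print(check_password("hunter2", salt_a, digest_a))
```

The salt is not secret: it is stored next to the digest so the same computation can be repeated at login.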
Modern password hashing uses deliberately slow algorithms (bcrypt, scrypt, or Argon2) tuned to take on the order of 100 ms per hash. That is barely noticeable for a user logging in once, and prohibitively slow for an attacker trying billions of guesses.
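Of the slow algorithms named above, scrypt is available in Python's standard library as hashlib.scrypt (when Python is built against a sufficiently recent OpenSSL). The sketch below assumes that availability; the cost parameters are illustrative placeholders rather than tuned recommendations, and hash_password and verify_password are hypothetical helper names.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Illustrative cost parameters (assumptions, not a tuned recommendation).
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "maxmem": 64 * 1024 * 1024, "dklen": 32}

def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, key) derived with scrypt, a deliberately slow function."""
    salt = secrets.token_bytes(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected_key)
```

Raising n increases both the time and the memory each guess costs, which is the property that makes large-scale guessing expensive.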
Git — every commit, tree, and blob in git is identified by its SHA-1 hash. git commit hashes the content, and the hash becomes the commit ID. This means: if two commits have the same hash, they have the same content. If you modify history, all downstream hashes change. This is how git detects corruption and tampering.
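For blobs in a SHA-1 repository, the object ID is the SHA-1 of a short header ("blob", the content length, and a NUL byte) followed by the file content. A minimal sketch of that computation, with git_blob_id as a hypothetical helper name:

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the ID git assigns to a blob in a SHA-1 repository."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# Should match the output of: echo "hello" | git hash-object --stdin
print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

Commits and trees are hashed the same way with their own headers, and a commit's content includes the IDs of its tree and parents, which is why rewriting history changes every downstream ID.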
File integrity — download a file and check its hash against the published hash. If they match, the file wasn't corrupted or tampered with during transfer. Package managers (apt, brew, cargo) verify downloads this way.
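A download check along these lines can be sketched as follows. The file is read in chunks so that large downloads do not need to fit in memory; file_sha256 and matches_published are hypothetical names, and the published digest is assumed to be a hex string.

```python
import hashlib
import hmac

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file incrementally, reading chunk_size bytes at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """Compare the file's digest with a published hex digest."""
    return hmac.compare_digest(file_sha256(path), published_hex.strip().lower())
```

A match only rules out tampering if the published hash itself was obtained over a trusted channel; otherwise an attacker could replace both the file and its hash.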
Blockchains — each block contains the hash of the previous block. Changing any block changes its hash, which breaks the chain. This is how blockchains achieve immutability.
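The chaining idea can be shown with a toy example. This is a sketch of a hash chain only, not of any real blockchain: the block layout, the JSON encoding, and the names block_hash, build_chain, and is_valid are all assumptions made for illustration.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(prev_hash: str, data: str) -> str:
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps({"prev": prev_hash, "data": data}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_chain(items: List[str]) -> List[Dict[str, str]]:
    """Link each item to its predecessor through the predecessor's hash."""
    chain, prev = [], GENESIS_PREV
    for data in items:
        current = block_hash(prev, data)
        chain.append({"prev": prev, "data": data, "hash": current})
        prev = current
    return chain

def is_valid(chain: List[Dict[str, str]]) -> bool:
    """Recompute every hash and check that each block points at the one before."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev"] != prev:
            return False
        if block["hash"] != block_hash(block["prev"], block["data"]):
            return False
        prev = block["hash"]
    return True

chain = build_chain(["alice pays bob", "bob pays carol"])
chain[0]["data"] = "alice pays mallory"
print(is_valid(chain))  # False: the stored hash no longer matches the data
```

The hashes make tampering detectable; real systems add a consensus mechanism so that an attacker cannot simply recompute every later hash.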
Data deduplication — hash each chunk of data. If two chunks have the same hash, they're identical. Store only one copy. Used by backup systems, content-addressable storage, and CDNs.
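A content-addressed chunk store makes the deduplication idea concrete. The sketch below uses fixed-size chunks for simplicity; the ChunkStore class and its method names are hypothetical.

```python
import hashlib
from typing import Dict, List

class ChunkStore:
    """Store each distinct chunk once, keyed by its SHA-256 digest."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}  # hex digest -> chunk bytes

    def put(self, data: bytes) -> List[str]:
        """Split data into chunks and return the list of chunk digests."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            key = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(key, chunk)  # stored only if not seen before
            recipe.append(key)
        return recipe

    def get(self, recipe: List[str]) -> bytes:
        """Reassemble the original data from its chunk digests."""
        return b"".join(self.chunks[key] for key in recipe)

store = ChunkStore(chunk_size=4)
recipe = store.put(b"AAAABBBBAAAA")
print(len(recipe), len(store.chunks))  # 3 chunks referenced, 2 stored
```

Treating equal digests as equal data relies on collision resistance, which is why a cryptographic hash is used here rather than a fast non-cryptographic one.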
Hash maps — non-cryptographic hash functions power the O(1) lookup in hash maps. The hash converts a key to an array index. Different use case, same fundamental idea.
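The key-to-index step can be sketched with FNV-1a, a simple non-cryptographic hash chosen here only as an example. Real hash map implementations typically use other functions and handle bucket collisions, which this sketch leaves out; bucket_index is a hypothetical helper.

```python
FNV_OFFSET_BASIS = 0x811C9DC5  # 32-bit FNV offset basis
FNV_PRIME = 0x01000193         # 32-bit FNV prime

def fnv1a_32(data: bytes) -> int:
    """Compute the 32-bit FNV-1a hash of a byte string."""
    value = FNV_OFFSET_BASIS
    for byte in data:
        value ^= byte
        value = (value * FNV_PRIME) & 0xFFFFFFFF  # keep the result in 32 bits
    return value

def bucket_index(key: str, bucket_count: int) -> int:
    """Map a key to a slot in an array of bucket_count buckets."""
    return fnv1a_32(key.encode("utf-8")) % bucket_count

for key in ("apple", "banana", "cherry"):
    print(key, bucket_index(key, 16))
```

FNV-1a is fast and spreads keys reasonably well, but it offers none of the three resistance properties and must not be used for security.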
Which Hash Function Should You Use?
| Function | Output size | Speed | Use case |
|---|---|---|---|
| SHA-256 | 256 bits | Fast | General integrity, certificates, signatures |
| SHA-384/512 | 384/512 bits | Fast | Higher security margin |
| SHA-1 | 160 bits | Fast | Legacy only (git). Collision attacks exist. Don't use for security. |
| MD5 | 128 bits | Very fast | Legacy only. Broken. Collisions are trivial. |
| BLAKE3 | 256 bits | Very fast | Modern alternative to SHA-256, parallelizable |
| bcrypt | 184 bits | Deliberately slow | Password hashing |
| Argon2 | Configurable | Deliberately slow | Password hashing (winner of PHC) |
For general purposes: SHA-256. For passwords: Argon2 or bcrypt. Never MD5 or SHA-1 for anything security-related.
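The output sizes in the table can be confirmed for the functions that ship with Python's hashlib. BLAKE3, bcrypt, and Argon2 are not part of the standard library and are left out of this sketch; digest_bits is a hypothetical helper name.

```python
import hashlib

def digest_bits(name: str) -> int:
    """Return the output size in bits of a hashlib algorithm."""
    return hashlib.new(name).digest_size * 8

for name in ("sha256", "sha384", "sha512", "sha1", "md5"):
    try:
        print(f"{name:7} {digest_bits(name)} bits")
    except ValueError:
        # Some restricted builds disable legacy algorithms such as MD5.
        print(f"{name:7} unavailable in this build")
```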
What Is a Hash Collision?
A collision occurs when two different inputs produce the same hash. Since hash outputs are fixed-size (256 bits for SHA-256) and inputs are unlimited, collisions must exist mathematically — there are more possible inputs than possible outputs.
For SHA-256, finding a collision requires approximately 2^128 operations (the birthday bound for a 256-bit output). At 10 billion hashes per second, this would take roughly 10^21 years. This is why SHA-256 is considered collision-resistant: not because collisions don't exist, but because finding one is computationally infeasible with current technology.
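Both the time estimate and the birthday effect can be reproduced at small scale. The sketch below checks the arithmetic and then searches for a collision on a SHA-256 digest truncated to a few bits, where the search finishes quickly. The truncation is purely for demonstration, and the function names are hypothetical.

```python
import hashlib
from itertools import count
from typing import Tuple

def years_for(operations: float, rate_per_second: float) -> float:
    """Convert an operation count at a given rate into years."""
    seconds_per_year = 365.25 * 24 * 3600
    return operations / rate_per_second / seconds_per_year

def find_truncated_collision(bits: int = 24) -> Tuple[bytes, bytes, int]:
    """Find two inputs whose SHA-256 digests share their first `bits` bits.

    Returns both inputs and the number of hashes computed.
    """
    seen = {}  # truncated digest -> input that produced it
    for i in count():
        message = str(i).encode("ascii")
        full = int.from_bytes(hashlib.sha256(message).digest(), "big")
        prefix = full >> (256 - bits)
        if prefix in seen:
            return seen[prefix], message, i + 1
        seen[prefix] = message

print(f"{years_for(2**128, 10e9):.2e} years")  # on the order of 10^21

first, second, tries = find_truncated_collision(24)
# Expect a few thousand tries (about 2^12), far fewer than 2^24.
print(first, second, tries)
```

Each extra output bit multiplies the search space by two but the expected collision work only by the square root of two, which is why a 256-bit digest gives about 128 bits of collision resistance.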
MD5 and SHA-1 don't have this property anymore. Researchers have found practical collisions — two different files with the same hash. This is why they're deprecated for security.
Next Steps
Hashing verifies integrity but doesn't hide data; for that, you need encryption. Related topics:
- How Symmetric Encryption Works — encrypting data with a shared key.
- How Hash Maps Work — non-cryptographic hashing for O(1) lookup.
- How Certificates Work — how hashes are used in the certificate chain of trust.