
How Caching Works — Speed vs Freshness at Every Layer
A database query takes 50 ms. Running it once is fine. Running it 10,000 times per second for the same data demands 500 seconds of database work every second, all repeating the same computation. Caching stores the result the first time and serves it from memory for subsequent requests, turning a 50 ms query into a sub-millisecond lookup.
Caching is the simplest way to make a system faster. It is also one of the hardest things to get right, because every cache creates the same fundamental problem: the cached copy can become stale.
Cache Layers
A request passes through multiple layers, each of which can cache:
Browser cache — the user's browser stores responses locally. Controlled by HTTP headers (Cache-Control, ETag, Last-Modified). A cached stylesheet loads instantly — no network request at all. TTL is typically minutes to days.
CDN (Content Delivery Network) — edge servers distributed globally. When a user in Tokyo requests an image, the CDN serves it from a server in Tokyo instead of routing to an origin server in Virginia. Reduces latency from hundreds of milliseconds to single digits. Cloudflare, CloudFront, and Fastly operate CDNs.
Reverse proxy — nginx, Varnish, or a cloud load balancer caches responses from the application. Multiple users requesting the same page hit the proxy cache instead of the application server. Sits between the internet and your application.
Application cache — the application caches data in memory (in-process) or in a shared cache like Redis or Memcached. A REST endpoint caches database results in Redis with a 60-second TTL. All application instances share the cache.
Database cache — the database itself caches pages in a buffer pool. PostgreSQL's shared_buffers keeps hot pages in memory. The query planner caches plans. This is automatic and mostly invisible to the application.
Cache Patterns
Cache-aside (lazy loading) — the most common pattern. The application checks the cache. On a miss, it queries the database, stores the result in the cache, and returns it. On a hit, it returns the cached value directly. The application manages both the cache and the database.
def get_cached(cache, db, key, ttl=60):
    result = cache.get(key)
    if result is None:
        result = db.query(key)           # miss: fetch from the source of truth
        cache.set(key, result, ttl=ttl)  # populate for the next caller
    return result
Read-through — the cache sits in front of the database. The application always reads from the cache. On a miss, the cache itself queries the database, stores the result, and returns it. The application never talks to the database directly. Simpler application code, but the cache must understand how to fetch from the origin.
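The distinction from cache-aside is who fetches on a miss. A minimal in-process sketch, assuming a `loader` callback and a plain dict as the store (both illustrative, not any specific library's API):

```python
import time

class ReadThroughCache:
    """On a miss, the cache itself calls the origin; the application never does."""

    def __init__(self, loader, ttl=60):
        self.loader = loader  # callback the cache uses to fetch from the origin
        self.ttl = ttl
        self.store = {}       # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None and entry[1] > time.time():
            return entry[0]               # hit: serve from memory
        value = self.loader(key)          # miss: the cache queries the origin
        self.store[key] = (value, time.time() + self.ttl)
        return value

# The application only ever calls cache.get(); it never sees the database.
cache = ReadThroughCache(loader=lambda key: f"row-for-{key}")
first = cache.get("user:42")   # miss: loader runs
second = cache.get("user:42")  # hit: served from memory
```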
Write-through — writes go to the cache first, and the cache synchronously writes to the database. The cache is always up to date, but writes are slower because they wait for both the cache and the database.
Write-behind (write-back) — writes go to the cache, and the cache asynchronously writes to the database later. Writes are fast, but data can be lost if the cache fails before flushing to the database. Used when write performance is critical and occasional data loss is acceptable.
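Both write patterns can be sketched against a dict standing in for the database; the class names and the queue-based flusher are illustrative assumptions, not a real library:

```python
import queue
import threading

class WriteThroughCache:
    """Writes update the cache and the database synchronously."""

    def __init__(self, db):
        self.db = db       # stand-in for a real database: a plain dict
        self.store = {}

    def set(self, key, value):
        self.store[key] = value
        self.db[key] = value  # the write blocks until the database confirms

class WriteBehindCache:
    """Writes update the cache; a background worker flushes to the database."""

    def __init__(self, db):
        self.db = db
        self.store = {}
        self.pending = queue.Queue()
        threading.Thread(target=self._flush, daemon=True).start()

    def set(self, key, value):
        self.store[key] = value         # fast path: no database round trip
        self.pending.put((key, value))  # lost if the process dies before flushing

    def _flush(self):
        while True:
            key, value = self.pending.get()
            self.db[key] = value
            self.pending.task_done()
```

The trade-off is visible in `set`: write-through pays the database latency on every call, while write-behind returns immediately and accepts that the pending queue is a window of potential data loss.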
Cache Invalidation
There are two hard problems in computer science: naming things, cache invalidation, and off-by-one errors.
Cache invalidation is the process of removing or updating stale data from the cache. Get it wrong, and users see outdated information. Get it right, and the cache is effectively invisible.
TTL (Time To Live) — the simplest strategy. Every cached entry has an expiration time. After 60 seconds, the entry is evicted and the next request fetches fresh data. Simple, predictable, but the data is stale for up to TTL seconds.
Event-based invalidation — when the data changes, the service that owns the data publishes an event, and the cache subscriber invalidates the relevant keys. More complex, but the cache is stale only for the time it takes the event to propagate (typically milliseconds). This pairs naturally with event-driven architecture.
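The mechanics can be sketched with a toy in-process bus; the hypothetical `EventBus` here stands in for Redis pub/sub, Kafka, or whatever messaging layer the system actually uses:

```python
class EventBus:
    """Toy in-process pub/sub; real systems use Redis pub/sub, Kafka, etc."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, event):
        for handler in self.subscribers:
            handler(event)

cache = {"user:42": {"name": "stale"}}
bus = EventBus()

# The cache side subscribes and drops whatever key the event names.
bus.subscribe(lambda event: cache.pop(event["key"], None))

# The service that owns the data publishes on every write.
bus.publish({"type": "user.updated", "key": "user:42"})
# "user:42" is now evicted; the next read repopulates it with fresh data.
```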
Manual purge — an admin or deployment script explicitly clears the cache. Used after deployments when the data format changes, or when fixing a bug that caused bad data to be cached.
Versioned keys — include a version number in the cache key (user:42:v3). When the data schema changes, increment the version. Old entries age out naturally. No explicit invalidation needed.
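For example (the helper name and key format here are illustrative):

```python
SCHEMA_VERSION = 3  # bump whenever the cached data format changes

def user_cache_key(user_id):
    # Hypothetical helper: after a version bump, old entries are never
    # read again and simply age out via their TTL.
    return f"user:{user_id}:v{SCHEMA_VERSION}"

key = user_cache_key(42)  # "user:42:v3"
```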
Cache Stampede
When a popular cache entry expires, hundreds of concurrent requests may all miss the cache and hit the database simultaneously. This is a cache stampede (or thundering herd).
Mitigations:
- Locking — the first request acquires a lock, fetches the data, and populates the cache. Other requests wait for the lock and then read from the cache.
- Early refresh — refresh the cache entry before it expires. A background job or a request that sees the entry is "almost expired" triggers a refresh while still serving the stale value.
- Jitter — add randomness to TTL values so entries don't all expire at the same time.
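The locking and jitter mitigations can be sketched together. The per-key `threading.Lock` only coordinates callers within one process; a distributed system would use something like a Redis lock instead:

```python
import random
import threading

cache = {}
key_locks = {}  # per-key locks; assumes a single process
key_locks_guard = threading.Lock()

def ttl_with_jitter(base=60, spread=10):
    # Randomize expiry so hot keys don't all expire in the same instant.
    return base + random.uniform(0, spread)

def get_or_load(key, load):
    value = cache.get(key)
    if value is not None:
        return value
    with key_locks_guard:
        lock = key_locks.setdefault(key, threading.Lock())
    with lock:                  # only one caller loads; the others block here
        value = cache.get(key)  # re-check: the winner may have filled it already
        if value is None:
            value = load(key)
            cache[key] = value  # real code would expire this via ttl_with_jitter()
        return value
```

The double-check inside the lock is the important part: every caller that waited on the lock finds the cache already populated and skips the database entirely.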
Implementations
Redis — in-memory key-value store. Sub-millisecond reads. Supports strings, hashes, lists, sets, and sorted sets. TTL per key. Persistence options (RDB snapshots, AOF log). The standard choice for application caching.
Memcached — simpler than Redis. Key-value only. No data structures, no persistence. Slightly faster for pure caching workloads. Used when you need a distributed memory cache and nothing else.
CDN (Cloudflare, CloudFront, Fastly) — cache HTTP responses at the edge. Configured via HTTP headers or CDN-specific rules. Best for static assets, but increasingly used for dynamic content with short TTLs.
HTTP Cache Headers — Cache-Control: max-age=3600 tells the browser and intermediate caches to store the response for an hour. ETag enables conditional requests — the client sends If-None-Match, and the server returns 304 Not Modified if the content hasn't changed. Zero bytes transferred.
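The ETag exchange can be sketched with a hypothetical handler that derives the tag from a content hash (the function and header dict are illustrative, not a framework API):

```python
import hashlib

def handle_get(body, if_none_match=None):
    # Derive an ETag from the content; identical content -> identical tag.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {"Cache-Control": "max-age=3600", "ETag": etag}
    if if_none_match == etag:
        return 304, headers, b""  # unchanged: zero body bytes transferred
    return 200, headers, body

status, headers, body = handle_get(b"<html>hello</html>")
# A later conditional request echoes the ETag back as If-None-Match:
status, headers, body = handle_get(b"<html>hello</html>",
                                   if_none_match=headers["ETag"])
# status is now 304 and body is empty.
```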
When Caching Hurts
Caching is not free. A cache that is never hit wastes memory. A cache with the wrong TTL serves stale data. A cache that is not invalidated correctly causes bugs that are extremely hard to reproduce ("it works if I clear the cache").
Do not cache:
- Data that changes on every request (session nonces, CSRF tokens).
- Responses that vary per user unless you include the user in the cache key.
- Write-heavy data where the invalidation cost exceeds the cache benefit.
Next Steps
- How Load Balancing Works — distributing traffic across servers, often combined with caching.
- How HTTP Works — understanding the cache headers that control browser and CDN behavior.
- How DNS Works — DNS caching is one of the most impactful caches on the internet.