What is Rate Limiting
Rate limiting restricts how many requests a client can make to an API within a given time window. A client that exceeds the limit receives an HTTP 429 Too Many Requests response and must wait for the window to reset before trying again.
How it works
The server tracks request counts per client (identified by API key, user ID, or IP address). When a request arrives, the server checks that client's count against the limit. If the count is under the limit, the request proceeds and the count increments. If the count has reached the limit, the server rejects the request with 429, usually with a Retry-After header indicating how long the client should wait before trying again.
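The check described above can be sketched as a simple fixed-window counter. This is a minimal illustration, not a production design: the class name FixedWindowLimiter, the limit of 100 requests per 60 seconds, and the choice to start each client's window at its first request are all assumptions made for the example, and the state lives in a single in-memory dict.

```python
import math
import time
from dataclasses import dataclass, field


@dataclass
class FixedWindowLimiter:
    """Hypothetical in-memory limiter; all names and defaults are illustrative."""
    limit: int = 100          # assumed maximum requests per window
    window_seconds: int = 60  # assumed window length
    # client id -> (window start time, requests counted in that window)
    _windows: dict = field(default_factory=dict)

    def check(self, client_id, now=None):
        """Return (status_code, headers) for one incoming request."""
        now = time.time() if now is None else now
        start, count = self._windows.get(client_id, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired, so start a fresh one.
            start, count = now, 0
        if count < self.limit:
            # Under the limit: let the request through and count it.
            self._windows[client_id] = (start, count + 1)
            return 200, {}
        # At the limit: reject and report the seconds until the window resets.
        retry_after = math.ceil(start + self.window_seconds - now)
        return 429, {"Retry-After": str(retry_after)}
```

The client_id would be whichever identifier the server uses (API key, user ID, or IP address). Passing now explicitly is only there to make the behavior easy to test; a real server would rely on the clock.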
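Many APIs also communicate the current state in response headers. A common, though not standardized, convention is X-RateLimit-Limit (maximum requests allowed in the window), X-RateLimit-Remaining (requests left in the current window), and X-RateLimit-Reset (when the window resets, given as a Unix timestamp or as a number of seconds, depending on the API).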
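On the client side, these headers can be used to decide how long to pause before the next request. The sketch below is one possible approach under stated assumptions: the function name seconds_to_wait and the one-second fallback are hypothetical, headers is assumed to be a plain dict with exactly these key spellings, Retry-After is assumed to be in its delay-in-seconds form, and X-RateLimit-Reset is assumed to be a Unix timestamp in seconds. Check the API's documentation before relying on any of these.

```python
import time


def seconds_to_wait(status, headers, now=None, fallback=1.0):
    """Estimate how long to pause before sending the next request.

    Hypothetical helper: header formats vary between APIs.
    """
    now = time.time() if now is None else now
    if "Retry-After" in headers:
        # Assumes delay-in-seconds; Retry-After may also be an HTTP date.
        return max(0.0, float(headers["Retry-After"]))
    if status != 429 and int(headers.get("X-RateLimit-Remaining", 1)) > 0:
        # Requests are still available in this window: no need to wait.
        return 0.0
    if "X-RateLimit-Reset" in headers:
        # Assumes a Unix timestamp; some APIs send seconds-until-reset instead.
        return max(0.0, float(headers["X-RateLimit-Reset"]) - now)
    # Limited (or out of requests) with no timing hint: use a default pause.
    return fallback
```

A caller would typically sleep for the returned number of seconds and then retry, ideally with some upper bound on the total number of attempts.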
Why it matters
Without rate limiting, a single client — buggy, malicious, or simply aggressive — can consume all server resources, degrading service for everyone. Rate limiting ensures fairness (each client gets a share), protection (no single client can overwhelm the system), and cost control (each request costs compute and bandwidth). It's a fundamental requirement for any API exposed to external clients.
See "How Rate Limiting Works" for token bucket, sliding window, and distributed rate limiting.