What is Rate Limiting

Rate limiting restricts how many requests a client can make to an API within a given time window. Exceed the limit and the server responds with HTTP 429 Too Many Requests. Wait for the window to reset, then try again.

How it works

The server tracks request counts per client (identified by API key, user ID, or IP address). When a request arrives, the server checks the count against the limit. If under the limit, the request proceeds and the count increments. If at the limit, the server returns 429 with a Retry-After header indicating when the client can try again.
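The check described above can be sketched as a fixed-window counter. This is a minimal illustration, not a production design: the class name `FixedWindowLimiter`, the default limit of 5 requests per 60 seconds, and the in-memory dictionary are all assumptions made for the example, and a real deployment would typically keep counts in a shared store.

```python
import math
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FixedWindowLimiter:
    limit: int = 5             # max requests per window (hypothetical value)
    window_seconds: int = 60   # window length in seconds (hypothetical value)
    # Maps client_id -> (window_start_time, request_count)
    _counters: Dict[str, Tuple[float, int]] = field(default_factory=dict)

    def check(self, client_id: str, now: Optional[float] = None):
        """Return (status_code, headers) for one incoming request."""
        now = time.time() if now is None else now
        window_start, count = self._counters.get(client_id, (now, 0))

        # Start a fresh window once the previous one has expired.
        if now - window_start >= self.window_seconds:
            window_start, count = now, 0

        if count < self.limit:
            # Under the limit: let the request through and count it.
            self._counters[client_id] = (window_start, count + 1)
            return 200, {}

        # At the limit: reject and say how long until the window resets.
        retry_after = math.ceil(window_start + self.window_seconds - now)
        return 429, {"Retry-After": str(retry_after)}
```

Passing `now` explicitly makes the behaviour easy to try out: with `FixedWindowLimiter(limit=2, window_seconds=10)`, two calls for the same client at times 0 and 1 are allowed, a third at time 4 is rejected with `Retry-After: 6`, and a call at time 10 starts a new window.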

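Response headers communicate the current state: X-RateLimit-Limit (maximum allowed), X-RateLimit-Remaining (requests left), and X-RateLimit-Reset (when the window resets). These names are a widely used convention rather than a formal standard, and the format of the reset value (Unix timestamp or seconds remaining) varies between APIs, so check the documentation of the API you are calling.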
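On the client side, these headers can be turned into a wait time before retrying. The sketch below is one possible approach under stated assumptions: the function name `seconds_until_retry` is hypothetical, header names are matched exactly as written here, X-RateLimit-Reset is assumed to be a Unix timestamp in seconds, and the one-second fallback is arbitrary. Retry-After is handled in both forms HTTP allows, a number of seconds or an HTTP date.

```python
from email.utils import parsedate_to_datetime


def seconds_until_retry(headers: dict, now: float) -> float:
    """Estimate how long to wait after a 429, given response headers.

    `now` is the current Unix time in seconds.
    """
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        value = retry_after.strip()
        if value.isdigit():
            # Delay-seconds form, e.g. "Retry-After: 30"
            return float(value)
        # HTTP-date form, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT"
        return max(0.0, parsedate_to_datetime(value).timestamp() - now)

    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        # Assumption: the value is a Unix timestamp, not seconds remaining.
        return max(0.0, float(reset) - now)

    # No hint from the server: fall back to a short fixed delay.
    return 1.0
```

A caller would typically sleep for the returned number of seconds and then resend the request, ideally with a cap on the total number of retries.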

Why it matters

Without rate limiting, a single client — buggy, malicious, or simply aggressive — can consume all server resources, degrading service for everyone. Rate limiting ensures fairness (each client gets a share), protection (no single client can overwhelm the system), and cost control (each request costs compute and bandwidth). It's a fundamental requirement for any API exposed to external clients.

See How Rate Limiting Works for token bucket, sliding window, and distributed rate limiting.

This concept appears in

How Rate Limiting Works — Protecting APIs from Overload

Referenced by

APIs FAQ
