What is a Token Bucket
The token bucket is an algorithm for rate limiting. Each client has a bucket that holds tokens. Tokens are added at a fixed rate (e.g., 10 per second) until the bucket reaches its maximum size. Each API request consumes one token. If the bucket is empty, the request is rejected.
How it works
Two parameters define the behavior: bucket size (maximum tokens) and refill rate (tokens added per second). A bucket of size 100 with a refill rate of 10/second allows sustained traffic at 10 requests/second, with bursts up to 100 requests if the client has been idle and the bucket is full.
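As a rough illustration of how the two parameters interact, the arithmetic can be sketched in a few lines of Python. The function names here are hypothetical and chosen only for this example. The sketch assumes an empty bucket refills continuously, and that the most a client can get through in any window is one full bucket plus whatever refills during that window.

```python
def seconds_to_refill(bucket_size: float, refill_rate: float) -> float:
    """Time for an empty bucket to fill completely if no requests arrive."""
    return bucket_size / refill_rate


def max_requests_in_window(bucket_size: float, refill_rate: float,
                           window_seconds: float) -> float:
    """Upper bound on admitted requests in a window: a full bucket at the
    start plus the tokens that refill while the window is open."""
    return bucket_size + refill_rate * window_seconds


# The example from the text: size 100, refill rate 10/second.
print(seconds_to_refill(100, 10))           # 10.0
print(max_requests_in_window(100, 10, 60))  # 700
```

So after a full burst of 100 requests, the client would need about 10 idle seconds before it could burst at full size again.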
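The algorithm is simple: on each request, calculate how many tokens have accrued since the last request (elapsed time multiplied by the refill rate) and add them to the bucket, capped at the bucket size. If at least one token is then available, subtract one and allow the request. Otherwise, reject the request and leave the balance unchanged, so that rejected requests do not push the bucket below zero.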
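The steps above can be sketched as a small Python class. This is a minimal single-process illustration, not a production implementation: the class name `TokenBucket`, the method `allow`, and the injectable `clock` parameter are assumptions made for this example. It is not thread-safe and tracks only one bucket, so a real service would keep one instance per client and add locking or shared storage.

```python
import time
from typing import Callable


class TokenBucket:
    """Illustrative token bucket for a single client (hypothetical names)."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if capacity <= 0 or refill_rate <= 0:
            raise ValueError("capacity and refill_rate must be positive")
        self.capacity = capacity        # bucket size (maximum tokens)
        self.refill_rate = refill_rate  # tokens added per second
        self._clock = clock
        self._tokens = capacity         # assumption: the bucket starts full
        self._last = clock()

    def allow(self) -> bool:
        """Return True if the request is allowed, False if it is rejected."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Add the tokens accrued since the last call, capped at the bucket size.
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_rate)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        # Not enough tokens: reject without changing the balance.
        return False


# Usage: a bucket of size 100 refilled at 10 tokens/second.
bucket = TokenBucket(capacity=100, refill_rate=10)
if bucket.allow():
    print("request allowed")
else:
    print("request rejected")
```

The sketch reads time from `time.monotonic` so that wall-clock adjustments cannot produce a negative elapsed time. The clock is passed in as a parameter mainly so the behavior can be tested with a fake clock.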
Why it matters
Token bucket is one of the most widely used rate limiting algorithms because it naturally handles bursty traffic. A client that sends requests at or below the refill rate is always allowed through. A client that was idle can burst, up to the bucket size. A client that exceeds the rate is throttled once its bucket is empty. This matches real-world client behavior better than fixed-window algorithms, which reset their counters abruptly at each window boundary.
See How Rate Limiting Works for token bucket, sliding window, and distributed enforcement.