
How Message Queues Work — Buffering Work Between Services
A user uploads a video. Transcoding that video takes minutes. Do you make the user wait? No. You drop a message on a queue — "transcode video #4827" — and return immediately. A worker picks up the message, transcodes the video, and updates the status when finished.
A message queue is a buffer between a producer (the component that creates work) and a consumer (the component that does the work). The producer writes messages to the queue. The consumer reads messages from the queue and processes them. The queue decouples the two in time — the producer does not wait for the consumer.
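This decoupling can be sketched in a few lines using Python's thread-safe `queue.Queue` as an in-process stand-in for a real broker:

```python
import queue
import threading

# A minimal sketch of producer/consumer decoupling: queue.Queue plays
# the role of the broker. The names and payload shape are illustrative.
q = queue.Queue()
results = []

def producer():
    # The producer enqueues work and returns immediately -- it never
    # waits for the consumer to finish processing.
    for video_id in (4827, 4828, 4829):
        q.put({"task": "transcode", "video_id": video_id})

def consumer():
    # The consumer pulls messages and processes them at its own pace.
    for _ in range(3):
        msg = q.get()
        results.append(f"transcoded video #{msg['video_id']}")
        q.task_done()

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()

print(results)  # all three videos transcoded, in order
```

A real broker adds persistence, acknowledgments, and network transport on top of this shape, but the contract is the same: `put` returns immediately, and the consumer works through the backlog independently.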
FIFO Ordering
Most queues aim for FIFO (First In, First Out) — messages are consumed in the order they were produced. The first message enqueued is the first message dequeued.
In practice, strict FIFO is hard to guarantee in distributed systems. Amazon SQS has a standard queue (best-effort ordering) and a FIFO queue (strict ordering, lower throughput). Kafka guarantees ordering within a partition. RabbitMQ guarantees ordering within a single queue with a single consumer.
When ordering matters (processing bank transactions for a single account), use a FIFO queue or partition by entity key. When ordering doesn't matter (sending notification emails), a standard queue with higher throughput is the better choice.
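Partitioning by entity key can be sketched as follows — hash the key to pick a partition, so all messages for one account land in the same partition and keep their relative order (the hash function and partition count here are illustrative; real brokers use their own partitioners):

```python
import zlib

# Sketch of key-based partitioning, Kafka-style: same key -> same
# partition -> per-entity ordering, while different keys spread out.
NUM_PARTITIONS = 4
partitions = {i: [] for i in range(NUM_PARTITIONS)}

def partition_for(key: str) -> int:
    # crc32 is a stand-in for the broker's partitioner hash.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

def publish(key: str, message: str):
    partitions[partition_for(key)].append(message)

# Transactions for account "acct-1" interleaved with another account.
publish("acct-1", "deposit $100")
publish("acct-2", "withdraw $50")
publish("acct-1", "withdraw $30")

# Within acct-1's partition, its two messages keep their order.
p = partitions[partition_for("acct-1")]
```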
Acknowledgments
What happens if a worker crashes halfway through processing a message? The message would be lost if the broker deleted it on delivery. To prevent this, queues use acknowledgments (acks).
The flow:
- The worker pulls a message from the queue.
- The message becomes invisible to other workers for a period called the visibility timeout.
- The worker processes the message.
- The worker sends an acknowledgment to the broker.
- The broker deletes the message.
If the worker crashes before acknowledging, the visibility timeout expires and the message reappears in the queue for another worker to pick up. This guarantees at-least-once delivery — the message is processed at least once, possibly more than once if the worker crashes after completing the work but before sending the ack.
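The visibility-timeout mechanics can be simulated with a toy in-memory broker (real brokers such as SQS implement this server-side; the class and timings below are illustrative):

```python
import time

# Toy broker sketch: delivered messages become invisible for a
# visibility timeout; if no ack arrives in time, they reappear.
class ToyBroker:
    def __init__(self, visibility_timeout=1.0):
        self.ready = []          # visible messages
        self.in_flight = {}      # msg_id -> (body, redelivery deadline)
        self.timeout = visibility_timeout
        self._next_id = 0

    def send(self, body):
        self.ready.append((self._next_id, body))
        self._next_id += 1

    def receive(self):
        self._requeue_expired()
        if not self.ready:
            return None
        msg_id, body = self.ready.pop(0)
        self.in_flight[msg_id] = (body, time.monotonic() + self.timeout)
        return msg_id, body

    def ack(self, msg_id):
        # Acknowledged: the broker deletes the message for good.
        self.in_flight.pop(msg_id, None)

    def _requeue_expired(self):
        now = time.monotonic()
        for msg_id, (body, deadline) in list(self.in_flight.items()):
            if now >= deadline:          # worker presumed crashed
                del self.in_flight[msg_id]
                self.ready.append((msg_id, body))

broker = ToyBroker(visibility_timeout=0.05)
broker.send("transcode video #4827")

msg_id, body = broker.receive()   # worker 1 takes the message...
time.sleep(0.06)                  # ...and "crashes" (never acks)

redelivered = broker.receive()    # timeout expired: redelivered
broker.ack(redelivered[0])        # worker 2 finishes and acks
```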
This is why consumers must be idempotent — processing the same message twice must produce the same result. Use idempotency keys, deduplication tables, or database constraints to handle duplicates.
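An idempotent consumer can be sketched with a deduplication table keyed by a per-message idempotency key (here a plain set; in production this would be a database table with a unique constraint, ideally updated in the same transaction as the work):

```python
# Sketch of an idempotent consumer: redelivery of the same message
# becomes a no-op, so at-least-once delivery is safe.
processed = set()   # stand-in dedup table
balance = 0

def handle(message):
    global balance
    key = message["idempotency_key"]
    if key in processed:        # duplicate delivery: skip
        return
    balance += message["amount"]
    processed.add(key)          # in production: same transaction as the work

msg = {"idempotency_key": "txn-123", "amount": 100}
handle(msg)
handle(msg)   # at-least-once delivery: same message arrives twice

print(balance)  # credited once, not twice
```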
Competing Consumers
A single queue can have multiple workers consuming from it. Each message goes to exactly one worker. This is the competing consumers pattern — workers compete for messages.
This is how you scale processing. If one worker handles 100 messages per second and you need 500 per second, run five workers. The queue distributes messages among them automatically. No code changes required.
This is the key difference from pub/sub. In pub/sub, every subscriber gets every message (fan-out). In a queue, each message goes to one consumer (load distribution).
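The competing consumers pattern can be demonstrated with several worker threads pulling from one shared queue (thread names and message contents are illustrative):

```python
import queue
import threading

# Sketch of competing consumers: three workers pull from one queue;
# each message is delivered to exactly one of them.
q = queue.Queue()
handled_by = {}            # message -> worker that processed it
lock = threading.Lock()

def worker(name):
    while True:
        msg = q.get()
        if msg is None:    # sentinel: shut down
            q.task_done()
            return
        with lock:
            handled_by[msg] = name
        q.task_done()

workers = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(3)]
for t in workers:
    t.start()

for i in range(30):
    q.put(f"msg-{i}")
for _ in workers:          # one sentinel per worker
    q.put(None)
q.join()
for t in workers:
    t.join()

# Every message was processed exactly once, spread across the workers.
print(len(handled_by))
```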
Dead Letter Queues
Some messages cannot be processed. The payload is malformed. The downstream dependency is permanently unavailable. The processing logic has a bug. If the worker rejects the message and it returns to the queue, it will be retried forever.
A dead letter queue (DLQ) catches these messages. After a configurable number of delivery attempts (e.g., 3 retries), the broker moves the message to the DLQ instead of requeueing it. The DLQ holds poison messages for inspection. An engineer looks at the DLQ, figures out what went wrong, fixes the issue, and replays the messages.
Every production queue needs a DLQ. Without one, a single bad message blocks the queue or spins in a retry loop consuming resources.
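The retry-then-dead-letter flow can be sketched like this (the attempt limit and failure condition are illustrative; brokers like SQS and RabbitMQ track delivery counts for you):

```python
from collections import deque

# Sketch of dead-lettering: after MAX_ATTEMPTS failed deliveries the
# message moves to a dead letter queue instead of being requeued.
MAX_ATTEMPTS = 3
main_queue = deque()
dead_letter_queue = []

def enqueue(body):
    main_queue.append({"body": body, "attempts": 0})

def process(msg):
    # A "poison message": this handler always fails on bad payloads.
    if msg["body"] == "malformed":
        raise ValueError("cannot parse payload")

def drain():
    while main_queue:
        msg = main_queue.popleft()
        try:
            process(msg)
        except ValueError:
            msg["attempts"] += 1
            if msg["attempts"] >= MAX_ATTEMPTS:
                dead_letter_queue.append(msg)   # park for inspection
            else:
                main_queue.append(msg)          # retry later

enqueue("ok")
enqueue("malformed")
drain()

print(len(dead_letter_queue))  # the poison message, after 3 attempts
```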
Backpressure
When producers generate messages faster than consumers can process them, the queue grows. The system is under more load than it can handle, and it needs backpressure — a way to push back on producers or shed load.
Queues absorb temporary bursts. A spike of 1000 messages per second into a queue consumed at 500 per second causes the queue to grow for a while, then drain when the spike ends. This is healthy — the queue acts as a shock absorber.
Sustained overload is different. If producers consistently outpace consumers, the queue grows without bound until it hits memory limits, disk limits, or configured maximums. At that point the system must respond:
- Scale consumers — add more workers. The simplest response.
- Reject new messages — return an error to the producer when the queue is full. The producer can retry with exponential backoff. This is explicit backpressure.
- Drop messages — accept messages but discard the oldest when the queue is full. Acceptable for metrics and analytics, not for orders.
- Rate limit producers — throttle the producer's publish rate. See How Rate Limiting Works.
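The "reject new messages" response can be sketched with a bounded in-process queue — the producer sees an error at capacity and can back off (the capacity here is illustrative; brokers expose equivalent limits via configuration):

```python
import queue

# Sketch of explicit backpressure: a bounded queue rejects new
# messages when full, surfacing the overload to the producer.
q = queue.Queue(maxsize=3)

accepted, rejected = 0, 0
for i in range(5):
    try:
        q.put_nowait(f"msg-{i}")   # raises queue.Full at capacity
        accepted += 1
    except queue.Full:
        rejected += 1              # producer would retry with backoff

print(accepted, rejected)  # 3 accepted, 2 rejected
```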
Implementations
RabbitMQ — full-featured message broker. Supports acknowledgments, dead letter exchanges, priority queues, routing, and multiple protocols (AMQP, MQTT, STOMP). Good for complex routing and when you need per-message control.
Amazon SQS — fully managed. Standard queues (best-effort ordering, at-least-once delivery) and FIFO queues (strict ordering, exactly-once processing). Dead letter queues built in. No infrastructure to manage.
Redis Lists — LPUSH to enqueue, BRPOP to dequeue with blocking. Simple and fast. No built-in acknowledgments or dead letter queues — you implement those yourself. Good for lightweight task queues.
Kafka Consumer Groups — Kafka is primarily a log, but consumer groups provide queue semantics. Messages within a partition go to one consumer in the group. Ordering is per-partition. Replay is possible. Higher operational complexity, but combines queue and pub/sub in one system.
When to Use a Message Queue
Good fits:
- Async task processing — image transcoding, PDF generation, email sending.
- Rate smoothing — absorb traffic spikes that would overwhelm downstream services.
- Resilience — if a downstream service is temporarily down, messages wait in the queue.
- Work distribution — scale by adding workers.
Poor fits:
- Real-time responses — if the caller needs the result immediately, a queue adds latency.
- Simple in-process workloads — don't add a message broker when a thread pool or channel will do.
- Tiny payloads at extreme frequency — the overhead of serialization, network, and acknowledgment may dominate.
Next Steps
- How Caching Works — another pattern for handling load by avoiding redundant work.
- How Rate Limiting Works — controlling the rate of incoming requests.
- How Event-Driven Architecture Works — the broader pattern that queues and pub/sub support.