What is a Saga
A saga is a pattern for managing distributed transactions across multiple services. Instead of one big atomic transaction spanning multiple databases (which requires coordination protocols like two-phase commit), a saga breaks the work into a sequence of local transactions — each in its own service's database. If any step fails, previously completed steps are reversed by compensating actions.
How it works
Consider an e-commerce order that involves three services: Payment, Inventory, and Shipping. The saga executes one local transaction per service, in order:
- Payment — charge the customer's card.
- Inventory — reserve the items.
- Shipping — schedule delivery.
If Shipping fails (for example, the item cannot be delivered to the customer's address), the saga runs compensating actions in reverse order: release the inventory reservation, then refund the payment. Each compensating action is a new local transaction that undoes the effect of the original step.
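The forward steps and reverse-order compensation described above can be sketched in a few lines. This is a minimal in-process illustration, not a production implementation: the names `Step`, `run_saga`, and `fail_shipping` are hypothetical, and the service calls are simulated by appending to a list rather than talking to real Payment, Inventory, or Shipping services.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One saga step (hypothetical type for this sketch)."""
    name: str
    action: Callable[[], None]      # the local transaction
    compensate: Callable[[], None]  # undoes the action's effect


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # The failed step did not complete, so only earlier steps are undone.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


log: List[str] = []  # stands in for the side effects on each service


def fail_shipping() -> None:
    # Simulated failure: the address cannot be served.
    raise RuntimeError("address not deliverable")


order_saga = [
    Step("payment", lambda: log.append("charge card"),
         lambda: log.append("refund payment")),
    Step("inventory", lambda: log.append("reserve items"),
         lambda: log.append("release reservation")),
    Step("shipping", fail_shipping,
         lambda: log.append("cancel delivery")),
]

succeeded = run_saga(order_saga)
```

After this run, `succeeded` is `False` and `log` holds the two forward steps followed by their compensations in reverse order. The sketch assumes compensating actions themselves never fail; a real system would need to retry them.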
There are two coordination strategies:
Choreography. Each service publishes an event when its step completes, and the next service listens for that event and runs its own step. When a step fails, the failing service publishes a failure event that triggers compensating actions in the services upstream. There is no central coordinator. This is simple for small flows but becomes hard to trace and debug as the number of services grows.
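As a rough illustration of choreography, the sketch below wires the three services together with a synchronous in-memory event bus. Everything here is an assumption made for the example: the `EventBus` class, the event names such as `PaymentCompleted` and `ShippingFailed`, and the `deliverable` flag on the order. A real deployment would use a message broker and each handler would live in a separate service.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBus:
    """Synchronous in-memory stand-in for a message broker (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []  # every event published, in order

    def subscribe(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self.history.append(event)
        for handler in list(self._handlers[event]):
            handler(payload)


bus = EventBus()


# Each function represents one service reacting to an event and
# publishing its own event when its local transaction is done.
def charge_card(order: dict) -> None:
    bus.publish("PaymentCompleted", order)


def reserve_items(order: dict) -> None:
    bus.publish("InventoryReserved", order)


def schedule_delivery(order: dict) -> None:
    if order["deliverable"]:
        bus.publish("ShippingScheduled", order)
    else:
        bus.publish("ShippingFailed", order)


def release_items(order: dict) -> None:  # compensation in Inventory
    bus.publish("InventoryReleased", order)


def refund_payment(order: dict) -> None:  # compensation in Payment
    bus.publish("PaymentRefunded", order)


bus.subscribe("OrderPlaced", charge_card)
bus.subscribe("PaymentCompleted", reserve_items)
bus.subscribe("InventoryReserved", schedule_delivery)
bus.subscribe("ShippingFailed", release_items)
bus.subscribe("InventoryReleased", refund_payment)

bus.publish("OrderPlaced", {"order_id": 1, "deliverable": False})
```

No function in this sketch knows the whole flow; the sequence exists only in the subscriptions. Reading `bus.history` afterwards is the only way to see what happened end to end, which hints at why tracing gets harder as services are added.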
Orchestration. A central saga orchestrator tells each service what to do and when. The orchestrator maintains the saga's state machine and decides whether to proceed forward or trigger compensation. This is easier to understand and monitor, but the orchestrator is a single point of coordination that every step depends on.
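One way to picture the orchestrator's state machine is as a transition table keyed by the current state and whether the current step succeeded. The sketch below is illustrative only: the `State` names, the `TRANSITIONS` table, and the `run` function are hypothetical, and the `execute` callback stands in for the orchestrator sending a command to a service and waiting for the result. It also assumes compensating steps always succeed, so the table has no failure entries for them.

```python
from enum import Enum, auto
from typing import Callable, List


class State(Enum):
    CHARGING = auto()     # Payment: charge the card
    RESERVING = auto()    # Inventory: reserve the items
    SHIPPING = auto()     # Shipping: schedule delivery
    RELEASING = auto()    # compensation: release the reservation
    REFUNDING = auto()    # compensation: refund the payment
    COMPLETED = auto()
    COMPENSATED = auto()


# (current state, did the step succeed?) -> next state
TRANSITIONS = {
    (State.CHARGING, True): State.RESERVING,
    (State.CHARGING, False): State.COMPENSATED,  # nothing to undo yet
    (State.RESERVING, True): State.SHIPPING,
    (State.RESERVING, False): State.REFUNDING,
    (State.SHIPPING, True): State.COMPLETED,
    (State.SHIPPING, False): State.RELEASING,
    (State.RELEASING, True): State.REFUNDING,
    (State.REFUNDING, True): State.COMPENSATED,
}

TERMINAL = {State.COMPLETED, State.COMPENSATED}


def run(execute: Callable[[State], bool]) -> List[State]:
    """Drive the saga to a terminal state and return the states visited."""
    state = State.CHARGING
    trace = [state]
    while state not in TERMINAL:
        succeeded = execute(state)  # ask the relevant service to do its step
        state = TRANSITIONS[(state, succeeded)]
        trace.append(state)
    return trace


# Simulate a run in which only the Shipping step fails.
trace = run(lambda state: state is not State.SHIPPING)
```

Because the whole flow lives in one table, the current position of any saga is a single value that can be persisted and inspected, which is the monitoring advantage the paragraph above describes.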
Sagas provide eventual consistency, not the atomicity and isolation of a single ACID transaction. Between steps, the system is in an intermediate state that other requests can observe: for example, the Payment service has charged the card but Inventory has not yet reserved the items. Designing compensating actions that correctly undo partial work is the hardest part.
Why it matters
Sagas are the standard pattern for distributed transactions in microservices architectures, where a single ACID transaction spanning multiple databases is usually not practical. Understanding sagas, including their limitations around intermediate states and compensation, is essential for building correct distributed workflows.
See How Distributed Transactions Work for the full comparison of sagas, two-phase commit, and outbox patterns.