Software Architecture FAQ

Common questions about microservices, event-driven architecture, pub/sub, message queues, caching, load balancers, and CQRS. Each answer is short. Links go to the full explanation.

When should I use microservices vs a monolith?

Start with a monolith. It is simpler to develop, deploy, and debug. Split when you have concrete reasons:

  • Multiple teams stepping on each other in the same codebase.
  • Different deploy cadences — one team ships daily, another weekly.
  • Independent scaling — one component needs 20 instances, another needs one.
  • Failure isolation — a bug in reporting should not crash checkout.

Microservices solve organizational scaling problems but create technical ones — distributed transactions, network latency, and operational complexity. Most teams that start with microservices on day one end up fighting the architecture instead of building the product.

See How Microservices Work for the full tradeoff analysis.

What is the difference between pub/sub and a message queue?

Pub/sub — one message, many consumers. Every subscriber gets every message. Use it for event notifications, broadcasting, and fan-out (inventory, email, and analytics all react to the same OrderPlaced event).

Message queue — one message, one consumer. Messages are distributed among competing workers. Use it for task distribution and load balancing (10 workers processing video transcoding jobs from a shared queue).

Many systems combine both. AWS SNS (pub/sub) fans out to multiple SQS queues (point-to-point), and each queue has competing consumers.
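The two delivery models can be sketched in a few lines. This is a minimal in-memory illustration, not a real broker — the class and topic names are made up for the example:

```python
from collections import defaultdict, deque

class PubSub:
    """Fan-out: every subscriber receives every published message."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self.subscribers[topic]:
            handler(message)  # every subscriber sees the same message

class MessageQueue:
    """Point-to-point: each message is consumed by exactly one worker."""
    def __init__(self):
        self.queue = deque()

    def enqueue(self, message):
        self.queue.append(message)

    def dequeue(self):
        # Competing consumers each call dequeue; a message is delivered once.
        return self.queue.popleft() if self.queue else None
```

With PubSub, inventory and email both react to the same OrderPlaced event; with MessageQueue, two transcoding workers each pull distinct jobs.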

See How Pub/Sub Works and How Message Queues Work for the full comparison.

How does cache invalidation work?

Four strategies:

  • TTL — entries expire after a fixed time. Simple, predictable, stale for up to TTL seconds.
  • Event-based — when data changes, an event triggers cache deletion. Most accurate, requires event infrastructure.
  • Manual purge — explicitly clear the cache (during deployments, bug fixes).
  • Versioned keys — user:42:v3. Increment the version on changes. Old entries age out.

Most production systems use TTL as a safety net combined with event-based invalidation for fast updates. The hardest bugs come from caches that are never invalidated — the data looks correct until someone notices a stale value that has been wrong for hours.
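The TTL-plus-versioned-keys combination can be sketched as a small in-memory cache. This is an illustrative toy, not Redis — the class shape and 60-second TTL are assumptions for the example:

```python
import time

class Cache:
    """TTL cache with versioned keys.

    set/get use the current version of an entity's key. Invalidation bumps
    the version, making old entries unreachable; the TTL is the safety net
    that eventually evicts them.
    """
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}     # versioned key -> (value, expires_at)
        self.versions = {}  # entity -> current version number

    def _key(self, entity):
        return f"{entity}:v{self.versions.get(entity, 1)}"

    def set(self, entity, value):
        self.store[self._key(entity)] = (value, time.monotonic() + self.ttl)

    def get(self, entity):
        entry = self.store.get(self._key(entity))
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:  # TTL safety net
            return None
        return value

    def invalidate(self, entity):
        # Event-based invalidation: bump the version on data change.
        self.versions[entity] = self.versions.get(entity, 1) + 1
```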

See How Caching Works for all cache patterns and invalidation strategies.

What is event sourcing?

Event sourcing stores every change as an immutable event instead of overwriting the current state. An order is not a row — it is a sequence of events: OrderCreated, PaymentReceived, OrderShipped. The current state is derived by replaying events.

Benefits: complete audit trail, temporal queries ("what was the state at 3 PM?"), ability to rebuild read models by replaying history. Often paired with CQRS — events are the write side, projections are the read side.

Tradeoffs: harder to query than a relational database, event schemas must evolve carefully, replaying millions of events is slow without snapshots.
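Deriving state by replaying events is just a fold over the history. A minimal sketch using the order events above (the event payloads and state fields are invented for illustration):

```python
def apply(state, event):
    """Fold one event into the current order state."""
    kind, payload = event
    if kind == "OrderCreated":
        return {"status": "created", "items": payload}
    if kind == "PaymentReceived":
        return {**state, "status": "paid", "amount": payload}
    if kind == "OrderShipped":
        return {**state, "status": "shipped"}
    return state  # unknown events are ignored

def replay(events):
    """Current state = replay of the full immutable event history."""
    state = {}
    for event in events:
        state = apply(state, event)
    return state
```

A temporal query ("what was the state at 3 PM?") is just a replay of the prefix of events up to that point.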

See How CQRS Works for how event sourcing and CQRS work together.

What is a circuit breaker in software?

A circuit breaker prevents cascading failures when a dependency is down. Three states:

  1. Closed — requests flow normally. Failure rate is monitored.
  2. Open — too many failures. Requests fail immediately (fast fail). No calls to the broken dependency.
  3. Half-open — after a timeout, test requests are allowed. Success closes the breaker. Failure reopens it.

Without a circuit breaker, a slow dependency causes threads to pile up waiting for timeouts, consuming resources and eventually bringing down the calling service. With a circuit breaker, the failure is contained — the calling service returns a degraded response or error immediately.
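The three-state machine can be sketched in a small class. Production libraries (Resilience4j, Polly) track failure rates over windows; this toy version counts consecutive failures, and the threshold and timeout values are arbitrary:

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, func):
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # allow a test request through
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"  # trip the breaker
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        self.state = "closed"  # success closes the breaker
        return result
```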

See How Microservices Work for resilience patterns in microservice architectures.

How does a load balancer decide where to send traffic?

Common algorithms:

  • Round-robin — sequential rotation through servers. Default. Fair when servers are equal.
  • Least connections — server with fewest active connections. Adapts when request costs vary.
  • Weighted — servers with higher capacity get more traffic.
  • Consistent hashing — same key always maps to the same server. Good for cache locality.
  • Random (power of two choices) — pick two servers at random, choose the less loaded one. Near-optimal with minimal overhead.

Round-robin is the starting point. Switch to least connections when request processing times vary significantly. Use consistent hashing when cache locality matters.
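Two of these algorithms fit in a few lines each. A sketch, assuming the load balancer tracks active connection counts per server (the server names are placeholders):

```python
import random
from itertools import cycle

def round_robin(servers):
    """Sequential rotation: returns a picker yielding the next server."""
    it = cycle(servers)
    return lambda: next(it)

def power_of_two_choices(loads):
    """Pick two servers at random, route to the less loaded one.

    `loads` maps server name -> active connection count.
    """
    a, b = random.sample(list(loads), 2)
    return a if loads[a] <= loads[b] else b
```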

See How Load Balancing Works for Layer 4 vs Layer 7, health checks, and session affinity.

What is CQRS and when should I use it?

CQRS separates the read model from the write model. Commands go to a normalized write database. Queries go to a denormalized read database (or multiple — Elasticsearch for search, Redis for lookups, ClickHouse for analytics).

Use CQRS when:

  • Read-to-write ratio is 100:1 or higher.
  • Reads and writes need different scaling (add read replicas without touching the write path).
  • Multiple query patterns need different data shapes.
  • You need an audit trail (event sourcing).

Do not use CQRS for simple CRUD applications where one database with proper indexes serves both reads and writes efficiently. The overhead of maintaining two models, an event pipeline, and eventual consistency is not justified.
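The read/write split can be illustrated with one class. This toy projects synchronously; in production the projection runs asynchronously off an event stream, which is where eventual consistency comes from. The catalog domain and field names are invented for the example:

```python
class Catalog:
    """CQRS sketch: commands mutate a normalized write model; a projection
    maintains a denormalized read model tuned for lookups."""
    def __init__(self):
        self.write_db = {}  # normalized: product_id -> row
        self.read_db = {}   # denormalized: name -> display string

    def handle_command(self, product_id, name, price_cents):
        self.write_db[product_id] = {"name": name, "price_cents": price_cents}
        self._project(product_id)  # async in production (eventual consistency)

    def _project(self, product_id):
        row = self.write_db[product_id]
        self.read_db[row["name"]] = f'{row["name"]} (${row["price_cents"] / 100:.2f})'

    def query(self, name):
        # Reads never touch the write model.
        return self.read_db.get(name)
```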

See How CQRS Works for the full architecture.

What is a saga pattern?

A saga is a sequence of local transactions across multiple services, each with a compensating action that undoes it if a later step fails. It replaces distributed ACID transactions in microservice architectures.

Example: Create order -> Charge payment -> Reserve inventory. If inventory reservation fails, the saga runs compensating actions in reverse: refund payment, cancel order.

Two coordination patterns: choreography (services react to events autonomously) and orchestration (a central coordinator manages the flow). Orchestration is easier to understand and debug. Choreography is more decoupled.
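An orchestrated saga is essentially a loop with a compensation stack. A minimal sketch (the step functions here stand in for real service calls):

```python
def run_saga(steps):
    """Run each (action, compensate) pair in order. On failure, run the
    compensations for all completed steps in reverse order."""
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            for comp in reversed(completed):
                comp()  # undo in reverse: refund payment, then cancel order
            return False
    return True
```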

See How Distributed Transactions Work for the full distributed transaction landscape.

What is the difference between event-driven and request-driven architecture?

Request-driven — Service A calls Service B and waits for a response. A knows about B. They are coupled in time (A waits), availability (B must be up), and knowledge (A imports B's API).

Event-driven — Service A emits an event and moves on. Service B processes the event asynchronously. A does not know B exists. They are decoupled in all three dimensions.

Most production systems use both. Synchronous calls (REST, gRPC) for operations that need an immediate response. Asynchronous events for operations that can tolerate delay — notifications, analytics, indexing, cross-service data propagation.

See How Event-Driven Architecture Works for the full pattern.

What is a service mesh?

A service mesh is infrastructure for managing service-to-service communication. Sidecar proxies (typically Envoy) are deployed alongside each service, intercepting all traffic. They handle:

  • Mutual TLS — encrypted, authenticated communication between services.
  • Retries and timeouts — automatic retry with backoff.
  • Circuit breaking — stop calling failing dependencies.
  • Observability — metrics, traces, and logs for every request.
  • Traffic management — canary deployments, A/B testing, rate limiting.

A control plane (Istio, Linkerd) configures all the proxies. The application code is unchanged.

The tradeoff: added latency (two proxy hops per call), resource overhead (a sidecar per pod), and operational complexity. Service meshes are most valuable in large deployments (50+ services) with strict security and observability requirements.

How do microservices communicate?

Two patterns:

Synchronous — REST or gRPC. One service calls another and waits for the response. Simple, easy to reason about. Creates runtime dependencies — if the called service is down, the caller fails.

Asynchronous — events and message queues. One service publishes an event or enqueues a message. Other services process it later. Decoupled in time and availability. Introduces eventual consistency.

gRPC is the standard for synchronous service-to-service calls (efficient binary protocol, strong typing, streaming). Kafka is the standard for asynchronous event streaming. REST is common for external APIs and simpler internal calls.

See How Microservices Work for communication patterns and tradeoffs.