How Partitioning Works — Splitting Data Across Nodes

2026-03-23

A single database machine has limits: disk capacity, memory, and CPU. When your data or traffic exceeds what one machine can handle, you need to split it across multiple machines. This is partitioning (also called sharding).

Each partition holds a subset of the data. Queries for a specific record go to the partition that holds it. The system scales horizontally — add more machines, hold more data, handle more queries.

How Is Data Partitioned?

Two main strategies:

Hash Partitioning

Hash the partition key (e.g., user_id), then assign to a partition based on the hash value:

partition = hash(user_id) % num_partitions

Advantages: distributes data evenly (assuming a good hash function). Skewed key ranges are spread across partitions, though a single extremely hot key still maps to one partition. Simple to implement.

Disadvantages: range queries are inefficient. WHERE user_id BETWEEN 100 AND 200 must be sent to every partition because hash values aren't ordered; adjacent keys land on different partitions.

Used by: DynamoDB, Cassandra (with vnodes), Redis Cluster.
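A minimal sketch of modulo hash partitioning (partition count and key names are illustrative):

```python
import hashlib

NUM_PARTITIONS = 4  # illustrative cluster size

def partition_for(key: str) -> int:
    """Map a key to a partition via a stable hash, mod the partition count."""
    # Use a stable hash (not Python's process-randomized hash()) so routing
    # is consistent across application servers.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# Adjacent keys scatter across partitions — this is why range queries
# must hit every partition.
for user_id in ("user_100", "user_101", "user_102"):
    print(user_id, "->", partition_for(user_id))
```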

Range Partitioning

Assign contiguous ranges of the partition key to each partition:

Partition 1: user_id 1–1,000,000
Partition 2: user_id 1,000,001–2,000,000
Partition 3: user_id 2,000,001–3,000,000

Advantages: range queries work efficiently. WHERE user_id BETWEEN 100 AND 200 goes to a single partition. Adjacent data is co-located.

Disadvantages: uneven distribution. If user IDs are assigned sequentially, all new writes go to the last partition (hot partition). Some ranges may have much more data than others.

Used by: HBase, Bigtable, CockroachDB (with automatic splitting).
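Range lookup is just a binary search over the split points. A sketch using the ranges above (split points are illustrative):

```python
import bisect

# Upper bound of each partition's range, in order.
SPLIT_POINTS = [1_000_000, 2_000_000, 3_000_000]

def partition_for(user_id: int) -> int:
    """Find the partition whose range contains user_id via binary search."""
    return bisect.bisect_left(SPLIT_POINTS, user_id)

def partitions_for_range(lo: int, hi: int) -> range:
    """A range query touches only the partitions covering [lo, hi]."""
    return range(partition_for(lo), partition_for(hi) + 1)

print(partition_for(100))                    # -> 0
print(list(partitions_for_range(100, 200)))  # -> [0], a single partition
```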

What Is Consistent Hashing?

Standard hash partitioning (hash(key) % N) breaks when you add or remove nodes — almost every key maps to a different partition, requiring massive data movement.

Consistent hashing minimizes this disruption. Nodes and keys are both mapped onto a ring of hash values (0 to 2^32 − 1). Each key is assigned to the next node clockwise on the ring.

When a node is added, only the keys between the new node and its predecessor need to move — roughly 1/N of all keys. When a node is removed, its keys move to the next node clockwise. The rest of the data stays put.

Virtual nodes improve balance: each physical node is assigned multiple positions on the ring. This prevents uneven distribution caused by nodes hashing to adjacent positions.

Used by: DynamoDB, Cassandra, Memcached, CDN routing.
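A compact sketch of a ring with virtual nodes. Node names, vnode count, and key names are illustrative; the demo at the end shows that adding a fourth node moves only about a quarter of the keys:

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    """Stable hash onto the ring [0, 2**32)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2 ** 32)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes       # positions per physical node
        self._ring = []            # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the next node."""
        pos = _hash(key)
        idx = bisect.bisect_right(self._ring, (pos, chr(0x10FFFF)))
        return self._ring[idx % len(self._ring)][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in (f"key-{i}" for i in range(1000))}
ring.add_node("node-d")
moved = sum(1 for k, v in before.items() if ring.node_for(k) != v)
print(f"{moved} of 1000 keys moved")  # roughly 1/4 with a fourth node
```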

How Do You Query Partitioned Data?

Single-partition queries — the application knows which partition holds the data (from the partition key). Route the query directly. Fast.

Scatter-gather queries — the query doesn't include the partition key, so it must go to all partitions. Each partition returns its results, and the coordinator merges them. Slow — overall latency is that of the slowest partition.

Cross-partition joins — joining data across partitions requires network round trips. Much slower than local joins. This is why partition key choice is critical — co-locate data that's queried together.
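A toy coordinator makes the routing difference concrete. Here each "partition" is a local dict standing in for a remote node, with user_id as the partition key (data and names are illustrative):

```python
# Keys are placed so that user_id % 3 picks the holding partition.
partitions = [
    {3: "carol", 6: "frank"},
    {1: "alice", 4: "dave"},
    {2: "bob", 5: "erin"},
]

def route(user_id: int) -> int:
    return user_id % len(partitions)

def get_user(user_id: int):
    """Single-partition query: the key tells us exactly where to go."""
    return partitions[route(user_id)].get(user_id)

def find_by_name(name: str):
    """Scatter-gather: no partition key, so ask every partition and merge."""
    hits = []
    for shard in partitions:
        hits.extend(uid for uid, n in shard.items() if n == name)
    return hits

print(get_user(3))          # -> carol (one partition contacted)
print(find_by_name("bob"))  # -> [2]   (all partitions contacted)
```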

How Do You Choose a Partition Key?

The partition key determines data distribution and query efficiency. The right key:

  • Distributes evenly — no single partition holds a disproportionate amount of data or traffic.
  • Co-locates related data — data queried together should be on the same partition.
  • Supports common query patterns — if most queries filter by user_id, partition by user_id.

Good keys: user_id (for user-centric applications), tenant_id (for multi-tenant SaaS), region (for geographic data).

Bad keys: timestamp (all recent writes go to one partition), boolean fields (only 2 partitions), auto-increment ID with range partitioning (sequential writes hit one partition).

Compound keys: Cassandra supports (user_id, created_at) where user_id determines the partition and created_at sorts within the partition. This co-locates a user's data and supports range queries on time within a user.
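An in-memory sketch of that compound-key layout, not Cassandra's actual storage engine: user_id selects a "partition" (here a dict entry) and rows inside it stay sorted by created_at, so a time-range scan touches one partition only. Names and timestamps are illustrative:

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# user_id -> [(created_at, event), ...] kept sorted by created_at
tables = defaultdict(list)

def insert(user_id, created_at, event):
    row = (created_at, event)
    tables[user_id].insert(bisect_left(tables[user_id], row), row)

def events_between(user_id, start, end):
    """Time-range scan within one user's partition — no scatter-gather."""
    rows = tables[user_id]
    lo = bisect_left(rows, (start,))
    hi = bisect_right(rows, (end, chr(0x10FFFF)))
    return [event for _, event in rows[lo:hi]]

insert(42, 100, "login")
insert(42, 200, "purchase")
insert(42, 300, "logout")
print(events_between(42, 150, 350))  # -> ['purchase', 'logout']
```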

What Happens When Partitions Become Unbalanced?

Hotspots — one partition receives disproportionate traffic. Causes: a celebrity user generates millions of writes, a popular product gets all the reads, sequential IDs create a write hotspot.

Solutions:

  • Salting — append a random suffix to the key to spread hot keys across partitions. user_123_0, user_123_1, user_123_2 go to different partitions. Reads must query all suffixed keys and merge.
  • Splitting — some systems (CockroachDB, HBase) automatically split a hot partition into two when it exceeds a size or traffic threshold.
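Salting can be sketched in a few lines; the suffix count is an illustrative tuning knob, and note the read-side cost:

```python
import random

NUM_SALTS = 4  # number of suffixes to spread a hot key across

def salted_write_key(key: str) -> str:
    """Writes pick a random salt, spreading one hot key over NUM_SALTS keys."""
    return f"{key}_{random.randrange(NUM_SALTS)}"

def salted_read_keys(key: str) -> list:
    """Reads must fan out to every salted variant and merge the results."""
    return [f"{key}_{i}" for i in range(NUM_SALTS)]

print(salted_write_key("user_123"))  # e.g. user_123_2
print(salted_read_keys("user_123"))
```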

Rebalancing — when you add or remove nodes, data must be redistributed. With consistent hashing, only about 1/N of the data moves. Without it, nearly all keys remap and most of the data must be reshuffled.

Rebalancing should be gradual — moving too much data at once overwhelms the network and degrades performance. Most systems throttle rebalancing to limit its impact.

Partitioning and Replication Combined

In practice, partitioning and replication are used together. Each partition is replicated across multiple nodes for fault tolerance.

A system with 3 partitions and a replication factor of 3 has 9 partition replicas — 3 full copies of the data — spread across up to 9 nodes (fewer if nodes hold multiple partition replicas). Each partition has a leader and followers, managed by consensus or leader election.
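One simple placement scheme (a sketch, not any particular system's algorithm) puts each partition's replicas on consecutive nodes, with the first replica acting as leader:

```python
NUM_NODES = 9
NUM_PARTITIONS = 3
REPLICATION_FACTOR = 3

def replica_nodes(partition: int) -> list:
    """Place a partition's replicas on consecutive nodes; first is the leader."""
    start = partition * REPLICATION_FACTOR  # stagger leaders across the cluster
    return [(start + i) % NUM_NODES for i in range(REPLICATION_FACTOR)]

for p in range(NUM_PARTITIONS):
    leader, *followers = replica_nodes(p)
    print(f"partition {p}: leader=node{leader}, followers={followers}")
```

With 3 partitions and 3 replicas, the 9 replicas land on 9 distinct nodes, matching the example above.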

This is how Cassandra, CockroachDB, DynamoDB, and Kafka operate: partitioned for scale, replicated for durability and availability.

Next Steps