What is Partitioning

Partitioning (also called sharding) splits data across multiple nodes so that no single node stores the entire dataset. Each node holds a partition — a subset of the data. This is how databases scale beyond the capacity of a single machine. Replication copies the same data to multiple nodes for fault tolerance. Partitioning distributes different data to different nodes for capacity.

How it works

There are two main partitioning strategies:

Hash partitioning. Apply a hash function to the partition key (e.g., user_id) and assign the result to a node. This distributes data evenly — no node gets disproportionately more data than others. The downside: range queries (WHERE created_at > '2025-01-01') cannot be routed to a single partition because adjacent keys hash to different nodes. Cassandra, DynamoDB, and CockroachDB use hash partitioning.

Range partitioning. Divide the key space into contiguous ranges and assign each range to a node. Node 1 handles keys A-F, node 2 handles G-M, and so on. Range queries on the partition key are efficient — they hit one partition. The downside: hot spots. If most writes go to keys in one range, that node becomes a bottleneck. HBase and Spanner use range partitioning.

Both strategies face the rebalancing problem. When you add or remove nodes, data must move. Consistent hashing minimizes data movement by only reassigning keys near the changed node. Without it, adding a node reshuffles most of the data.

Cross-partition queries are expensive. A query that spans all partitions (e.g., a JOIN across two differently-partitioned tables) must contact every node, gather results, and merge them. This is called a scatter-gather query. Effective partitioning puts related data on the same partition to avoid this.

Why it matters

Partitioning is how databases handle datasets that exceed a single machine's storage or throughput. Understanding your partition key choice — and the trade-offs between hash and range strategies — determines whether your system scales smoothly or hits hot spots under load.

See How Partitioning Works for the full walkthrough of partitioning strategies, rebalancing, and secondary indexes.

This concept appears in

How Partitioning Works — Splitting Data Across Nodes

What is Partitioning

How it works

Why it matters

Related

This concept appears in

Referenced by