What is an LSM Tree
A Log-Structured Merge tree (LSM tree) is a data structure designed for workloads where writes vastly outnumber reads. Instead of updating data in place on disk like a B-tree, an LSM tree absorbs writes in memory and periodically flushes them to disk as immutable sorted files. RocksDB, LevelDB, Cassandra, and ScyllaDB all use LSM trees as their storage engine.
How it works
Writes go to an in-memory sorted structure called the memtable (typically a red-black tree or skip list). Every write is also appended to a WAL for durability. When the memtable reaches a size threshold, it is frozen and flushed to disk as an immutable SSTable (Sorted String Table) -- a file of key-value pairs sorted by key.
Over time, SSTables accumulate. Reading a key may require checking the memtable and then multiple SSTables from newest to oldest. To prevent reads from degrading, the storage engine runs compaction -- a background process that merges multiple SSTables into fewer, larger ones, discarding deleted keys and outdated versions along the way.
There are two main compaction strategies. Size-tiered compaction merges similarly-sized SSTables together, which is simple and write-friendly but can temporarily double disk usage during compaction. Leveled compaction organizes SSTables into levels of increasing size, keeping each level sorted and non-overlapping, which provides more predictable read performance at the cost of higher write amplification.
Bloom filters are commonly placed in front of SSTables to avoid unnecessary disk reads. A Bloom filter can quickly confirm that a key is definitely not in a given SSTable, skipping it entirely.
Why it matters
LSM trees trade read performance for dramatically higher write throughput. Any workload that is write-heavy -- time series data, event logs, message queues -- benefits from this trade-off. Understanding when to choose an LSM-based engine over a B-tree engine is one of the most consequential decisions in database architecture.
See How Storage Engines Work for the full comparison of LSM trees and B-trees.