Databases FAQ
Common questions about storage engines, indexes, transactions, ACID, query plans, and write-ahead logging. Each answer is short. Links go to the full explanation.
What is the difference between a B-tree and an LSM tree?
A B-tree stores data in sorted pages and updates them in place. Reads are fast — traverse 3-4 levels to find any row. Writes are slower because updating a page requires reading it, modifying it, and writing it back (random I/O). PostgreSQL and MySQL (InnoDB) use B-trees.
An LSM tree (Log-Structured Merge tree) buffers writes in an in-memory sorted structure (the memtable), then flushes it to disk as sorted, immutable runs (SSTables). Background compaction merges runs to keep reads efficient. Writes are fast (sequential I/O only). Reads are slower because they may check the memtable plus several runs. RocksDB, LevelDB, and Cassandra use LSM trees.
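The write path above can be sketched in a few lines of Python. This is a toy model, not any engine's real design: real LSM trees add a WAL, bloom filters, binary search within runs, and background compaction.

```python
# Toy LSM tree: a mutable in-memory memtable that flushes to immutable
# sorted runs. Illustrative only.

MEMTABLE_LIMIT = 3  # flush threshold (tiny, for demonstration)

class ToyLSM:
    def __init__(self):
        self.memtable = {}   # in-memory, mutable
        self.runs = []       # stand-ins for on-disk immutable sorted files

    def put(self, key, value):
        self.memtable[key] = value   # no in-place page rewrite, ever
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self._flush()

    def _flush(self):
        # The memtable becomes one immutable sorted run (an "SSTable").
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Reads may touch several structures: memtable first, then runs
        # newest-to-oldest. This is why LSM reads lag B-tree reads.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            for k, v in run:   # real engines binary-search and use bloom filters
                if k == key:
                    return v
        return None

db = ToyLSM()
for i in range(7):
    db.put(f"k{i}", i)
print(len(db.runs))    # two runs flushed; k6 still in the memtable
print(db.get("k0"))    # found in the oldest run
```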
See How Storage Engines Work for the full architectural comparison.
When should I add a database index?
Add an index when EXPLAIN ANALYZE shows a sequential scan on a large table for a query you run frequently. Good candidates: columns in WHERE clauses, JOIN conditions, and ORDER BY clauses.
Do not index everything. Each index slows down INSERT, UPDATE, and DELETE because the database must maintain the index alongside the table. Indexes on low-cardinality columns (boolean, status with 3 values) rarely help — the planner often prefers a full scan. On write-heavy tables with rare reads, indexes cost more than they save.
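The before/after effect is easy to see with Python's built-in sqlite3, whose EXPLAIN QUERY PLAN is a compact analogue of PostgreSQL's EXPLAIN. The table and column names here are made up for illustration.

```python
# Sketch: watch a full scan turn into an index search once an index exists.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, total REAL)")
con.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                [(i % 100, i * 1.5) for i in range(1000)])

def plan(sql):
    # The 4th field of each EXPLAIN QUERY PLAN row describes one plan step.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT total FROM orders WHERE customer_id = 42"
before = plan(query)
con.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)
print(before)  # a SCAN of orders: every row is read
print(after)   # a SEARCH using idx_orders_customer: only matching rows
```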
See How Indexes Work for composite indexes, covering indexes, and partial indexes.
What is ACID?
ACID is four properties that guarantee reliable transactions:
- Atomicity — all operations in a transaction succeed or none do. Enforced by the WAL.
- Consistency — constraints (foreign keys, unique, check) are enforced at commit.
- Isolation — concurrent transactions don't see each other's uncommitted changes. Enforced by MVCC.
- Durability — committed data survives crashes. Enforced by fsyncing the WAL to disk.
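Atomicity and consistency can be seen in miniature with stdlib sqlite3: two updates run in one transaction, the second violates a CHECK constraint, and the whole transaction is undone. The account schema is invented for the example.

```python
# Sketch: all-or-nothing transaction behavior.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INT CHECK (balance >= 0))")
con.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
con.commit()

try:
    with con:  # one transaction: commits on success, rolls back on exception
        con.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")  # succeeds
        con.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'bob'")    # violates CHECK
except sqlite3.IntegrityError:
    pass  # Consistency: the constraint fired. Atomicity: both updates undone.

balances = dict(con.execute("SELECT name, balance FROM accounts"))
print(balances)  # alice still 100, bob still 0
```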
See How Transactions Work for isolation levels and how each property is implemented.
What is the difference between row stores and column stores?
A row store keeps all columns of a row together on the same page. Reading a full row is one I/O. This is ideal for OLTP — point lookups, single-row updates, transactional workloads. PostgreSQL, MySQL, and SQLite are row stores.
A column store keeps all values of a single column together. Aggregating one column across millions of rows reads only that column's data, not the entire row. Compression is better because adjacent values in a column are often similar. This is ideal for analytics — SUM, AVG, COUNT over large datasets. ClickHouse, DuckDB, and Apache Parquet are columnar.
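The two layouts, and why columnar data compresses well, can be sketched with plain Python lists. The dataset is invented; run-length encoding stands in for the fancier schemes (delta, dictionary) real column stores use.

```python
# Row layout vs. column layout in miniature.
rows = [("alice", "US", 10), ("bob", "US", 20), ("carol", "DE", 30)]  # row store

columns = {                       # column store: one array per column
    "name":    ["alice", "bob", "carol"],
    "country": ["US", "US", "DE"],
    "amount":  [10, 20, 30],
}

# OLAP-style aggregate: the column store touches only the "amount" array,
# while the row store would read every full row.
total = sum(columns["amount"])

# Adjacent values in a column are often similar, so they compress well,
# e.g. run-length encoding of the country column:
def run_length(values):
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

print(total)                           # 60
print(run_length(columns["country"]))  # [('US', 2), ('DE', 1)]
```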
See How Storage Engines Work for how the storage layout affects I/O patterns.
Why is VACUUM important in PostgreSQL?
PostgreSQL implements MVCC by storing old row versions in the table itself. When you UPDATE a row, the old version remains until VACUUM removes it. Without VACUUM:
- Table bloat — tables grow with dead rows, wasting disk space and slowing sequential scans.
- Index bloat — indexes point to dead rows, making lookups less efficient.
- Transaction ID wraparound — PostgreSQL uses 32-bit transaction IDs. After about 2 billion transactions without VACUUM, the database must halt to prevent data corruption.
Autovacuum runs in the background but may lag behind on write-heavy tables. Monitor pg_stat_user_tables.n_dead_tup and tune autovacuum thresholds if dead rows accumulate.
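SQLite's VACUUM is a simpler cousin of PostgreSQL's (it rewrites the whole database file rather than pruning MVCC row versions), but it demonstrates the same effect: deleted rows leave dead space until a vacuum reclaims it. The schema is invented.

```python
# Sketch: dead space before VACUUM, reclaimed after.
import sqlite3

con = sqlite3.connect("")  # "" = a temporary on-disk database
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
con.executemany("INSERT INTO t (payload) VALUES (?)",
                [("x" * 500,) for _ in range(1000)])
con.execute("DELETE FROM t WHERE id % 2 = 0")  # half the rows become dead space
con.commit()

pages_before = con.execute("PRAGMA page_count").fetchone()[0]
con.execute("VACUUM")
pages_after = con.execute("PRAGMA page_count").fetchone()[0]
print(pages_before, pages_after)  # the file shrinks after VACUUM
```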
See How Transactions Work for the full MVCC lifecycle in PostgreSQL.
What does EXPLAIN ANALYZE show?
EXPLAIN shows the query plan the database chose — the tree of operations (scan, join, sort, filter), estimated row counts, and estimated costs. EXPLAIN ANALYZE actually executes the query and adds real numbers: actual time per operation, actual row counts, and loop iterations.
Look for: sequential scans on large tables (missing index), large gaps between estimated and actual row counts (stale statistics — run ANALYZE), and nested loop joins on large tables (might need a hash join).
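Reading a join plan can be sketched with sqlite3's EXPLAIN QUERY PLAN, a compact analogue of PostgreSQL's EXPLAIN (it lacks ANALYZE's actual-vs-estimated timings, but shows the chosen scan and join strategy). The schema is made up.

```python
# Sketch: each output line is one step of the chosen plan.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT REFERENCES users(id))")
con.execute("CREATE INDEX idx_orders_user ON orders (user_id)")

sql = """SELECT users.name, count(*)
         FROM users JOIN orders ON orders.user_id = users.id
         GROUP BY users.id"""
details = [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]
for step in details:
    print(step)
# Typically: an outer scan of one table and, per row, an indexed search of
# the other -- i.e. a nested loop join driven by an index.
```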
See How Query Execution Works for reading and interpreting query plans.
What is a deadlock in databases?
A deadlock occurs when two transactions each hold a lock the other needs. Transaction A locks row 1 and requests row 2. Transaction B locks row 2 and requests row 1. Neither can proceed — they wait forever.
The database detects the cycle (via a wait-for graph or a timeout) and aborts one transaction, the deadlock victim. The victim's work is rolled back and the transaction can be retried.
Prevention: always acquire locks in the same order. If every transaction locks rows by primary key ascending, circular waits cannot form. Keep transactions short to minimize the window for conflicts.
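The ordering rule can be sketched with threading locks standing in for row locks, keyed by primary key. Both workers name the rows in opposite order, but both acquire the locks in sorted order, so a circular wait cannot form.

```python
# Sketch: deadlock prevention via a global lock-acquisition order.
import threading

locks = {1: threading.Lock(), 2: threading.Lock()}  # "row locks" by primary key
done = []

def transfer(a, b, label):
    first, second = sorted((a, b))        # always lock in primary-key order
    with locks[first], locks[second]:
        done.append(label)                # critical section

t1 = threading.Thread(target=transfer, args=(1, 2, "A"))
t2 = threading.Thread(target=transfer, args=(2, 1, "B"))  # opposite arguments, same lock order
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(done))  # ['A', 'B'] -- both finish; without sorting, this could hang
```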
See How Transactions Work for lock types, deadlock detection, and prevention strategies.
What is write-ahead logging?
Write-ahead logging (WAL) is the durability mechanism in virtually every serious database. The rule: write the log record to disk before the data change is considered committed.
On every write, the database appends a log entry describing the change. On commit, the log is fsynced. The actual data pages are updated in memory and flushed to disk lazily in the background. On crash, the database replays the WAL from the last checkpoint — redo committed changes, undo uncommitted ones.
WAL also converts random writes (scattered page updates) into sequential writes (appends to the log), which is significantly faster on both spinning disks and SSDs.
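The commit-then-replay cycle can be sketched in Python. This is a toy log format invented for the example, not any engine's real one; recovery keeps only changes from transactions whose commit record made it to disk.

```python
# Toy write-ahead log: append + fsync before acknowledging a commit,
# then rebuild state after a "crash" by replaying committed records.
import json, os, tempfile

class ToyWAL:
    def __init__(self, path):
        self.f = open(path, "a")

    def commit(self, txid, changes):
        for key, value in changes.items():
            self.f.write(json.dumps({"tx": txid, "set": [key, value]}) + "\n")
        self.f.write(json.dumps({"tx": txid, "commit": True}) + "\n")
        self.f.flush()
        os.fsync(self.f.fileno())   # the WAL rule: durable before acknowledged

def replay(path):
    # Crash recovery: redo, in log order, only committed transactions' changes.
    records = [json.loads(line) for line in open(path)]
    committed = {r["tx"] for r in records if r.get("commit")}
    state = {}
    for r in records:
        if "set" in r and r["tx"] in committed:
            key, value = r["set"]
            state[key] = value
    return state

path = tempfile.mkstemp()[1]
wal = ToyWAL(path)
wal.commit(1, {"x": 1})
wal.commit(2, {"x": 2, "y": 3})
# Simulate a crash mid-transaction: tx 3 wrote a record but never committed.
wal.f.write(json.dumps({"tx": 3, "set": ["x", 99]}) + "\n")
wal.f.flush()

print(replay(path))  # {'x': 2, 'y': 3} -- tx 3's change is discarded
```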
See How WAL Works for checkpoints, WAL segmentation, and configuration tuning.