
How Trees Work — Hierarchies, Search Trees, and B-Trees
A tree is a hierarchical data structure: a root node connected to child nodes, each of which may have its own children. No cycles, no node with two parents. This structure appears everywhere — file systems, ASTs, the DOM, database indexes, DNS hierarchy, decision trees.
The key insight: trees give you O(log n) operations. Not as fast as a hash map's O(1), but trees maintain order — they support range queries, sorted iteration, and finding minimums and maximums. Hash maps can't do any of that.
What Is a Binary Search Tree?
A binary search tree (BST) is a tree where each node has at most two children, and values are ordered: everything in the left subtree is smaller, everything in the right subtree is larger.
To search for 25: start at root (20), go right (25 > 20), arrive at 30, go left (25 < 30), found 25. Three comparisons for 7 nodes. In general: O(log n) for a balanced tree.
Search: O(log n) — each comparison eliminates half the tree. Insert: O(log n) — search for the position, add the node. Delete: O(log n) — find the node, replace with successor/predecessor. Range query: O(log n + k) — find the start, then walk in order. This is what hash maps can't do.
The Balance Problem
A BST's performance depends on balance. If you insert sorted data (1, 2, 3, 4, 5), the tree degenerates into a linked list — every node has only a right child. O(log n) becomes O(n).
Self-balancing trees fix this:
- AVL trees — maintain a strict height balance. After each insert/delete, rotate nodes to rebalance. Height never exceeds 1.44 × log₂(n).
- Red-black trees — relax the balance constraint slightly. Fewer rotations than AVL, still O(log n). Used by Java's TreeMap, C++'s std::map, and Linux's scheduler.
What Is a B-Tree?
B-trees are the tree structure that powers databases. Unlike binary trees (2 children per node), B-tree nodes hold many keys and have many children — often 100-1000 per node.
Why? Because B-trees are designed for disk, not memory. A disk read takes ~10 milliseconds — 100,000x slower than a memory access. But disk reads a full block (4 KB) at once. A B-tree node fills an entire disk block with keys, so one disk read gives you hundreds of keys to binary-search through.
A B-tree with branching factor 100 and 100 million keys has height ≈ 4. Four disk reads to find any key in 100 million. That's why database index lookups are fast.
| Property | Binary Search Tree | B-Tree |
|---|---|---|
| Keys per node | 1 | 100-1000 |
| Children per node | 2 | 101-1001 |
| Height (1M keys) | ~20 | ~3 |
| Optimized for | Memory | Disk |
| Used in | In-memory collections | Databases, filesystems |
PostgreSQL, MySQL, SQLite, and MongoDB all use B-tree variants (B+ trees, specifically) for their default indexes. File systems like ext4 and Btrfs use B-trees for directory and extent lookups.
B+ Trees — The Database Variant
Most databases use B+ trees, a variant where:
- Internal nodes only store keys and child pointers — no data.
- Leaf nodes store both keys and data (or pointers to data).
- Leaf nodes are linked — a linked list connecting all leaves allows efficient sequential scan.
The leaf linking is critical for range queries. To find all rows where age BETWEEN 18 AND 30, the database finds 18 in the tree (O(log n)), then walks the leaf linked list to 30 — no more tree traversals.
Where Do Trees Appear?
Database indexes — B+ trees store sorted keys with pointers to table rows. CREATE INDEX ON users(email) builds a B+ tree.
File systems — ext4 uses extents organized in a tree. Btrfs and ZFS are entirely tree-structured (copy-on-write trees). Inodes are found through directory trees.
ASTs — Abstract Syntax Trees represent code structure. Every compiler and code search tool parses code into trees.
DOM — HTML is a tree. <html> is the root, <body> and <head> are children, and so on. CSS selectors and JavaScript's querySelector traverse this tree.
DNS — the domain name hierarchy is a tree. . → .io → 8vast.io → docs.8vast.io.
Decision trees — machine learning models that split data by features. Random forests are collections of decision trees.
Tries — trees where each edge is a character. Used for autocomplete, IP routing tables, and spell checking. Looking up a word of length k is O(k), independent of how many words are stored.
Trees vs Hash Maps vs Arrays
| Operation | Sorted Array | Hash Map | Balanced BST | B-Tree |
|---|---|---|---|---|
| Lookup | O(log n) | O(1) | O(log n) | O(log n) |
| Insert | O(n) | O(1) | O(log n) | O(log n) |
| Range query | O(log n + k) | O(n) | O(log n + k) | O(log n + k) |
| Min/max | O(1) | O(n) | O(log n) | O(log n) |
| Sorted iteration | O(n) | O(n log n) | O(n) | O(n) |
| Optimized for | Read-heavy, small | Exact match | General purpose | Disk-based |
The choice depends on the access pattern. If you only do exact lookups, use a hash map. If you need order, use a tree. If the data is on disk, use a B-tree.
Next Steps
- How Graphs Work — when data isn't hierarchical but has arbitrary connections.
- How Indexing Works — how inverted indexes and vector indexes build on these structures.
- How File Systems Work — B-trees in ext4, Btrfs, and ZFS.
Prerequisites
Next
Referenced by
- How Arrays and Linked Lists Work — Contiguous vs Connected
- How Graphs Work — Networks, Dependencies, and Connections
- How Hash Maps Work — O(1) Lookup From First Principles
- How Sorting Works — Ordering Data Efficiently
- How Recursion Works — Solving Problems by Breaking Them Down
- How Searching Works — Finding Data Fast
- How Graph Algorithms Work — BFS, DFS, Shortest Path, and Topological Sort