
How Indexes Work — Making Queries Fast
Without an index, finding a row means scanning every page in the table — O(n). With an index, the same lookup is O(log n). On a table with 10 million rows, that's the difference between reading 10 million rows and reading 24.
An index is a separate data structure — usually a B-tree — that maps column values to row locations. It's like the index at the back of a book: look up the term, get the page number, go directly to the page.
How Does a B-Tree Index Work?
A B-tree index stores keys (column values) in sorted order across a tree of pages. Each leaf page contains keys and pointers to the actual rows in the table.
To look up WHERE email = '[email protected]':
- Start at the root page. Binary search for the range containing "alice@...".
- Follow the child pointer to the next level. Binary search again.
- At the leaf page, find the key and follow the pointer to the heap row.
Total: 3-4 page reads for any table size. Each level of the tree narrows the search by the branching factor (typically 100-500 keys per page).
B-tree indexes support:
- Exact match:
WHERE id = 42 - Range queries:
WHERE age BETWEEN 18 AND 30 - Prefix queries:
WHERE name LIKE 'Ali%'(but NOTLIKE '%ice') - Sorting:
ORDER BY created_at— the index is already sorted - Min/max:
SELECT MIN(price)— first or last key in the index
What Is a Hash Index?
A hash index maps keys to row locations using a hash function. O(1) lookup — faster than B-tree for exact matches.
But hash indexes can't do range queries, sorting, or prefix matching. WHERE id = 42 works. WHERE id > 42 requires a full scan. This limits their use.
PostgreSQL supports hash indexes. MySQL's MEMORY engine uses them. Most databases default to B-tree because it handles all query patterns.
What Is a Composite Index?
A composite index covers multiple columns. CREATE INDEX ON orders(user_id, created_at) creates a B-tree sorted first by user_id, then by created_at within each user.
This index accelerates:
WHERE user_id = 5— uses the first columnWHERE user_id = 5 AND created_at > '2026-01-01'— uses both columnsWHERE user_id = 5 ORDER BY created_at— index is already sorted
But NOT:
WHERE created_at > '2026-01-01'— the first column isn't constrained, so the index can't narrow the search. This is the leftmost prefix rule: a composite index can only be used if the query constrains the columns from left to right.
Column order in composite indexes matters enormously. Put the most selective (most filtering) column first, or the column you most frequently query on.
What Is a Covering Index?
Normally, an index lookup returns a row pointer, and the database must fetch the actual row from the heap — an additional disk read. A covering index includes all the columns the query needs, so the row fetch is skipped.
CREATE INDEX ON orders(user_id, status, total);
-- This query is "covered" — all needed columns are in the index
SELECT status, total FROM orders WHERE user_id = 5;
The database reads only the index, never touching the table. This can be 2-10x faster for queries that hit many rows.
PostgreSQL calls this an "index-only scan." The INCLUDE clause lets you add columns to the index without affecting sort order: CREATE INDEX ON orders(user_id) INCLUDE (status, total).
When Should You Add an Index?
Add an index when:
- A query scans too many rows (check
EXPLAINoutput) - A
WHEREclause filters on a column frequently - A
JOINcondition matches on a column - An
ORDER BYclause sorts on a column without an index (forces a sort operation)
Don't add an index when:
- The table is small (< 1,000 rows) — a full scan is fast enough
- The column has low cardinality (e.g., a boolean) — the index doesn't narrow the search much
- Write performance matters more than read — each index slows down
INSERT,UPDATE, andDELETEbecause the index must be maintained
Every index costs: disk space (the index structure), write overhead (maintaining the tree on every modification), and memory (index pages in the buffer pool compete with data pages).
How Do You Know If an Index Is Being Used?
EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL) shows the query plan:
EXPLAIN ANALYZE SELECT * FROM users WHERE email = '[email protected]';
-- Without index:
-- Seq Scan on users (cost=0.00..1234.00 rows=1 width=64)
-- Filter: (email = '[email protected]')
-- Rows Removed by Filter: 99999
-- Execution Time: 45.123 ms
-- With index:
-- Index Scan using users_email_idx on users (cost=0.29..8.31 rows=1 width=64)
-- Index Cond: (email = '[email protected]')
-- Execution Time: 0.045 ms
Seq Scan = sequential scan = reading every page. O(n). Index Scan = using the index. O(log n).
The difference: 45ms vs 0.045ms — 1,000x. This is the entire point of indexes.
What Is an Index Scan vs Index-Only Scan vs Bitmap Scan?
| Scan type | How it works | When used |
|---|---|---|
| Seq Scan | Read every page in the table | No useful index, or query reads most rows |
| Index Scan | Traverse index, fetch each row from heap | Few rows match, index is selective |
| Index-Only Scan | Read only the index, skip heap | All needed columns are in the index |
| Bitmap Index Scan | Build a bitmap of matching pages, then fetch | Moderate number of rows (too many for index scan, too few for seq scan) |
The query planner chooses automatically based on statistics about the data (row count, value distribution, correlation).
Indexes and Search Engines
Database indexes and search engine indexes solve the same problem: avoid scanning everything. A database B-tree index maps column values to rows. A search engine inverted index maps terms to documents. A vector index (HNSW) maps vectors to nearest neighbors.
The principle is identical: build a structure once, look up many times. The tradeoff is always: slower writes (maintaining the structure) for faster reads (avoiding scans).
Next Steps
- How Transactions Work — ACID guarantees and how databases maintain correctness.
- How Trees Work — the B-tree data structure behind most indexes.
- How Complexity Works — O(log n) vs O(n) and why it matters.
Prerequisites
Referenced by
- How Partitioning Works — Splitting Data Across Nodes
- Databases FAQ
- How Transactions Work — ACID and Database Correctness
- How Query Execution Works — From SQL to Results
- What is a Primary Key
- What is an Index
- How Storage Engines Work — Organizing Data on Disk
- How Searching Works — Finding Data Fast
- B-Tree vs LSM Tree — When to Use Which
- Reading EXPLAIN ANALYZE — A Practical Guide to Query Plans