How Indexes Work — Making Queries Fast

2026-03-23

Without an index, finding a row means scanning every page in the table — O(n). With an index, the same lookup is O(log n). On a table with 10 million rows, that's the difference between reading 10 million rows and reading 24.

An index is a separate data structure — usually a B-tree — that maps column values to row locations. It's like the index at the back of a book: look up the term, get the page number, go directly to the page.

How Does a B-Tree Index Work?

A B-tree index stores keys (column values) in sorted order across a tree of pages. Each leaf page contains keys and pointers to the actual rows in the table.

To look up WHERE email = '[email protected]':

Start at the root page. Binary search for the range containing "alice@...".
Follow the child pointer to the next level. Binary search again.
At the leaf page, find the key and follow the pointer to the heap row.

Total: 3-4 page reads for any table size. Each level of the tree narrows the search by the branching factor (typically 100-500 keys per page).

B-tree indexes support:

Exact match: WHERE id = 42
Range queries: WHERE age BETWEEN 18 AND 30
Prefix queries: WHERE name LIKE 'Ali%' (but NOT LIKE '%ice')
Sorting: ORDER BY created_at — the index is already sorted
Min/max: SELECT MIN(price) — first or last key in the index

What Is a Hash Index?

A hash index maps keys to row locations using a hash function. O(1) lookup — faster than B-tree for exact matches.

But hash indexes can't do range queries, sorting, or prefix matching. WHERE id = 42 works. WHERE id > 42 requires a full scan. This limits their use.

PostgreSQL supports hash indexes. MySQL's MEMORY engine uses them. Most databases default to B-tree because it handles all query patterns.

What Is a Composite Index?

A composite index covers multiple columns. CREATE INDEX ON orders(user_id, created_at) creates a B-tree sorted first by user_id, then by created_at within each user.

This index accelerates:

WHERE user_id = 5 — uses the first column
WHERE user_id = 5 AND created_at > '2026-01-01' — uses both columns
WHERE user_id = 5 ORDER BY created_at — index is already sorted

But NOT:

WHERE created_at > '2026-01-01' — the first column isn't constrained, so the index can't narrow the search. This is the leftmost prefix rule: a composite index can only be used if the query constrains the columns from left to right.

Column order in composite indexes matters enormously. Put the most selective (most filtering) column first, or the column you most frequently query on.

What Is a Covering Index?

Normally, an index lookup returns a row pointer, and the database must fetch the actual row from the heap — an additional disk read. A covering index includes all the columns the query needs, so the row fetch is skipped.

CREATE INDEX ON orders(user_id, status, total);

-- This query is "covered" — all needed columns are in the index
SELECT status, total FROM orders WHERE user_id = 5;

The database reads only the index, never touching the table. This can be 2-10x faster for queries that hit many rows.

PostgreSQL calls this an "index-only scan." The INCLUDE clause lets you add columns to the index without affecting sort order: CREATE INDEX ON orders(user_id) INCLUDE (status, total).

When Should You Add an Index?

Add an index when:

A query scans too many rows (check EXPLAIN output)
A WHERE clause filters on a column frequently
A JOIN condition matches on a column
An ORDER BY clause sorts on a column without an index (forces a sort operation)

Don't add an index when:

The table is small (< 1,000 rows) — a full scan is fast enough
The column has low cardinality (e.g., a boolean) — the index doesn't narrow the search much
Write performance matters more than read — each index slows down INSERT, UPDATE, and DELETE because the index must be maintained

Every index costs: disk space (the index structure), write overhead (maintaining the tree on every modification), and memory (index pages in the buffer pool compete with data pages).

How Do You Know If an Index Is Being Used?

EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL) shows the query plan:

EXPLAIN ANALYZE SELECT * FROM users WHERE email = '[email protected]';

-- Without index:
-- Seq Scan on users  (cost=0.00..1234.00 rows=1 width=64)
--   Filter: (email = '[email protected]')
--   Rows Removed by Filter: 99999
--   Execution Time: 45.123 ms

-- With index:
-- Index Scan using users_email_idx on users  (cost=0.29..8.31 rows=1 width=64)
--   Index Cond: (email = '[email protected]')
--   Execution Time: 0.045 ms

Seq Scan = sequential scan = reading every page. O(n). Index Scan = using the index. O(log n).

The difference: 45ms vs 0.045ms — 1,000x. This is the entire point of indexes.

What Is an Index Scan vs Index-Only Scan vs Bitmap Scan?

Scan type	How it works	When used
Seq Scan	Read every page in the table	No useful index, or query reads most rows
Index Scan	Traverse index, fetch each row from heap	Few rows match, index is selective
Index-Only Scan	Read only the index, skip heap	All needed columns are in the index
Bitmap Index Scan	Build a bitmap of matching pages, then fetch	Moderate number of rows (too many for index scan, too few for seq scan)

The query planner chooses automatically based on statistics about the data (row count, value distribution, correlation).

Indexes and Search Engines

Database indexes and search engine indexes solve the same problem: avoid scanning everything. A database B-tree index maps column values to rows. A search engine inverted index maps terms to documents. A vector index (HNSW) maps vectors to nearest neighbors.

The principle is identical: build a structure once, look up many times. The tradeoff is always: slower writes (maintaining the structure) for faster reads (avoiding scans).

Next Steps

How Transactions Work — ACID guarantees and how databases maintain correctness.
How Trees Work — the B-tree data structure behind most indexes.
How Complexity Works — O(log n) vs O(n) and why it matters.

Prerequisites

How Transactions Work