
How Searching Works — Finding Data Fast
Finding a specific item in a collection is one of the most common operations in computing. Every database query, every search engine lookup, every cache check is a search. The algorithm you choose — or the data structure you've prepared — determines whether it takes microseconds or minutes.
Linear Search — The Baseline
Check every element, one by one, until you find the target or reach the end.
```python
def linear_search(array, target):
    for index, element in enumerate(array):
        if element == target:
            return index   # found
    return -1              # not found
```
O(n). Simple, works on any data (sorted or unsorted), requires no preparation. But at 1 billion items, you check up to 1 billion elements.
Linear search is the right choice when: the data is small (< 100 elements), the data is unsorted and you can't sort it, or you're searching once and building an index isn't worth it.
Binary Search — The Power of Sorted Data
If the data is sorted, you can do dramatically better. Binary search eliminates half the data on every step:
- Look at the middle element.
- If it's the target, done.
- If the target is smaller, search the left half.
- If the target is larger, search the right half.
- Repeat on the remaining half.
```python
def binary_search(array, target):
    low, high = 0, len(array) - 1
    while low <= high:
        mid = (low + high) // 2   # integer midpoint
        if array[mid] == target:
            return mid
        if array[mid] < target:
            low = mid + 1         # target is in the right half
        else:
            high = mid - 1        # target is in the left half
    return -1                     # not found
```
O(log n). At 1 billion items: log₂(1,000,000,000) ≈ 30 comparisons. Not a billion — thirty.
| Items | Linear search (worst) | Binary search (worst) |
|---|---|---|
| 100 | 100 | 7 |
| 10,000 | 10,000 | 14 |
| 1,000,000 | 1,000,000 | 20 |
| 1,000,000,000 | 1,000,000,000 | 30 |
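The worst-case column can be checked directly: binary search over n items makes at most floor(log₂ n) + 1 comparisons, one per halving. A quick sanity check:

```python
import math

def worst_case_comparisons(n):
    """Worst-case probes for binary search over n sorted items:
    one comparison per halving, floor(log2 n) + 1 in total."""
    return math.floor(math.log2(n)) + 1

for n in (100, 10_000, 1_000_000, 1_000_000_000):
    print(n, worst_case_comparisons(n))   # matches the table: 7, 14, 20, 30
```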
The requirement: the data must be sorted. This is why sorting (the previous lesson) is so fundamental — it enables binary search, which is the difference between O(n) and O(log n) for every subsequent lookup.
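In Python you rarely hand-roll the loop: the standard library's bisect module does the halving for you, returning the insertion point in a sorted sequence. A minimal exact-match search built on bisect_left:

```python
import bisect

def find(sorted_list, target):
    """Exact-match binary search via the stdlib bisect module."""
    i = bisect.bisect_left(sorted_list, target)  # leftmost insertion point
    if i < len(sorted_list) and sorted_list[i] == target:
        return i
    return -1  # not found

data = [3, 7, 11, 15, 23, 42]
print(find(data, 23))  # → 4
print(find(data, 5))   # → -1
```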
Binary Search Beyond Arrays
The binary search pattern — eliminate half the search space on each step — appears far beyond sorted arrays:
B-trees in databases — a B-tree index is a multi-way generalization of binary search. Each node contains sorted keys; you binary-search within the node, then follow the appropriate child pointer. This is how SELECT * FROM users WHERE id = 42 finds the row in 3-4 disk reads among billions of rows.
Git bisect — finding which commit introduced a bug. git bisect does binary search on the commit history: mark a good commit and a bad commit, test the middle, eliminate half. Finding a bug across 1,000 commits takes ~10 tests.
Debugging — "the bug is somewhere in this 500-line function." Comment out the bottom half. Does the bug persist? If yes, it's in the top half. Binary search on code.
System administration — "the server started failing sometime this week." Check Wednesday. Was it failing? Yes → check Monday-Wednesday. No → check Wednesday-Friday. Binary search on time.
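All four examples above are the same loop. As a sketch of the git-bisect idea in Python, assume a linear history and a hypothetical is_bad test that flips from False to True at the offending commit:

```python
def first_bad(commits, is_bad):
    """Binary search for the first commit where is_bad() is True,
    assuming history is good...good, bad...bad (monotone)."""
    low, high = 0, len(commits) - 1
    while low < high:
        mid = (low + high) // 2
        if is_bad(commits[mid]):
            high = mid         # bug introduced at mid or earlier
        else:
            low = mid + 1      # bug introduced after mid
    return commits[low]

commits = list(range(1000))                      # stand-in commit ids
print(first_bad(commits, lambda c: c >= 612))    # → 612, in ~10 tests
```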
The Two-Pointer Technique
When searching for a pair of elements that satisfy a condition (sum to a target, bracket a range), two pointers on a sorted array give O(n):
```python
def pair_with_sum(array, target):
    left, right = 0, len(array) - 1
    while left < right:
        pair_sum = array[left] + array[right]
        if pair_sum == target:
            return (left, right)
        if pair_sum < target:
            left += 1    # need a larger sum
        else:
            right -= 1   # need a smaller sum
    return None          # no such pair
```
This works because the array is sorted: advancing the left pointer can only increase the sum, and retreating the right pointer can only decrease it, so each step safely discards one candidate. Without sorting, you'd need O(n²) nested loops.
Hash-Based Search — O(1)
If you need repeated exact-match searches, the fastest approach isn't searching at all: it's building a hash table for O(1) average-case lookups:
```python
seen = set()
for element in data:
    seen.add(element)
# Now `element in seen` is O(1) on average
```
The upfront cost is O(n) to build the structure. After a handful of lookups that cost is repaid: each subsequent lookup is O(1) instead of O(n).
This is the insight behind every caching layer, every index, and every deduplication check: spend time organizing the data once so that every lookup is fast.
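A concrete example of the pattern is deduplication: one hash lookup per element instead of a scan per element, O(n) total rather than O(n²).

```python
def dedupe(items):
    """Remove duplicates in O(n) total, preserving first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:   # O(1) average hash lookup
            seen.add(item)
            result.append(item)
    return result

print(dedupe([3, 1, 3, 2, 1]))  # → [3, 1, 2]
```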
Choosing the Right Search
| Approach | Time | Requires | Best for |
|---|---|---|---|
| Linear scan | O(n) | Nothing | Small data, one-time search |
| Binary search | O(log n) | Sorted data | Repeated searches on static data |
| Hash map | O(1) avg | Hash table built | Repeated exact-match lookups |
| BST | O(log n) | Balanced tree built | Dynamic data + range queries |
| Inverted index | O(1) per term | Index built | Text search |
The pattern: if you search once, scan. If you search repeatedly, build a structure. The choice of structure depends on the query pattern — exact match (hash map), range (tree), text (inverted index), meaning (vector index).
Where Does Searching Appear?
Every if x in collection — Python's in operator is linear search on lists, hash lookup on sets and dicts. Knowing which structure you're searching determines performance.
Database WHERE clauses — WHERE id = 5 uses a hash or B-tree index. WHERE age > 18 uses a B-tree for range search. WHERE name LIKE '%alice%' falls back to a scan (no index helps with infix matching — this is why full-text search exists).
Autocomplete — a trie or prefix tree gives O(k) lookup where k is the prefix length. Type "net" → find all entries starting with "net" without scanning all entries.
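A trie is the classical structure for this, but a sorted word list plus binary search gives a serviceable prefix lookup too. A minimal sketch using the stdlib bisect module (word list is illustrative):

```python
import bisect

def autocomplete(sorted_words, prefix):
    """All words starting with prefix: binary-search for the first
    candidate, then scan forward until the prefix stops matching."""
    start = bisect.bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break
        matches.append(word)
    return matches

words = sorted(["net", "netcat", "netstat", "node", "nslookup"])
print(autocomplete(words, "net"))  # → ['net', 'netcat', 'netstat']
```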
Next Steps
- How Recursion Works — binary search is naturally recursive; recursion is a fundamental algorithmic pattern.
- How Trees Work — BSTs implement binary search as a data structure.
- How Indexing Works — how search engines make text searchable.