How Searching Works — Finding Data Fast

2026-03-22

Finding a specific item in a collection is the most common operation in computing. Every database query, every search engine lookup, every cache check is a search. The algorithm you choose — or the data structure you've prepared — determines whether it takes microseconds or minutes.

Linear Search — The Baseline

Check every element, one by one, until you find the target or reach the end.

for each element in array:
    if element == target:
        return found
return not found

O(n). Simple, works on any data (sorted or unsorted), requires no preparation. But at 1 billion items, you check up to 1 billion elements.

Linear search is the right choice when: the data is small (< 100 elements), the data is unsorted and you can't sort it, or you're searching once and building an index isn't worth it.
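The pseudocode above maps directly onto Python. A minimal sketch (the function name linear_search and the sample data are my own, returning an index instead of a flag so the caller knows where the match is):

```python
def linear_search(items, target):
    """Return the index of target in items, or -1 if absent. O(n) worst case."""
    for i, element in enumerate(items):
        if element == target:
            return i
    return -1

print(linear_search([4, 2, 7, 1], 7))   # → 2
print(linear_search([4, 2, 7, 1], 9))   # → -1
```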

Binary Search — The Power of Sorted Data

If the data is sorted, you can do dramatically better. Binary search eliminates half the data on every step:

  1. Look at the middle element.
  2. If it's the target, done.
  3. If the target is smaller, search the left half.
  4. If the target is larger, search the right half.
  5. Repeat on the remaining half.
low = 0, high = n - 1
while low <= high:
    mid = (low + high) // 2   # integer division
    if array[mid] == target: return mid
    if array[mid] < target: low = mid + 1
    else: high = mid - 1
return not found

O(log n). At 1 billion items: log₂(1,000,000,000) ≈ 30 comparisons. Not a billion — thirty.

Items              Linear search (worst)   Binary search (worst)
100                100                     7
10,000             10,000                  14
1,000,000          1,000,000               20
1,000,000,000      1,000,000,000           30

The requirement: the data must be sorted. This is why sorting (the previous lesson) is so fundamental — it enables binary search, which is the difference between O(n) and O(log n) for every subsequent lookup.
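The loop above as runnable Python (one detail worth knowing: in languages with fixed-width integers, write low + (high - low) // 2 for the midpoint to avoid overflow; Python's integers don't overflow, so the simple form is safe here). The standard library's bisect module implements the same idea:

```python
import bisect

def binary_search(arr, target):
    """Return the index of target in sorted arr, or -1 if absent. O(log n)."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            low = mid + 1    # target is in the right half
        else:
            high = mid - 1   # target is in the left half
    return -1

data = [1, 3, 5, 8, 13, 21]
print(binary_search(data, 13))   # → 4

# Same search via the standard library:
i = bisect.bisect_left(data, 13)
print(i if i < len(data) and data[i] == 13 else -1)   # → 4
```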

Binary Search Beyond Arrays

The binary search pattern — eliminate half the search space on each step — appears far beyond sorted arrays:

B-trees in databases — a B-tree index is a multi-way generalization of binary search. Each node contains sorted keys; you binary-search within the node, then follow the appropriate child pointer. This is how SELECT * FROM users WHERE id = 42 finds the row in 3-4 disk reads among billions of rows.

Git bisect — finding which commit introduced a bug. git bisect does binary search on the commit history: mark a good commit and a bad commit, test the middle, eliminate half. Finding a bug across 1,000 commits takes ~10 tests.

Debugging — "the bug is somewhere in this 500-line function." Comment out the bottom half. Does the bug persist? If yes, it's in the top half. Binary search on code.

System administration — "the server started failing sometime this week." Check Wednesday. Was it failing? Yes → check Monday-Wednesday. No → check Wednesday-Friday. Binary search on time.
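Both of these are binary search on a predicate rather than an array: find the first point where "is it broken?" flips from no to yes. A sketch of that pattern (first_bad and is_bad are illustrative names; is_bad stands in for whatever test you run, and must be monotone — once true, it stays true):

```python
def first_bad(low, high, is_bad):
    """Return the smallest value in [low, high] for which is_bad is True.
    Assumes at least one bad value exists and is_bad is monotone."""
    while low < high:
        mid = (low + high) // 2
        if is_bad(mid):
            high = mid        # first bad point is at mid or earlier
        else:
            low = mid + 1     # first bad point is after mid
    return low

# Example: commits 0..999, bug introduced at commit 612
print(first_bad(0, 999, lambda c: c >= 612))   # → 612
```

This is exactly the shape of git bisect: ~10 tests narrow 1,000 candidates to one.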

The Two-Pointer Technique

When searching for a pair of elements that satisfy a condition (sum to a target, bracket a range), two pointers on a sorted array give O(n):

left = 0, right = n - 1
while left < right:
    sum = array[left] + array[right]
    if sum == target: return (left, right)
    if sum < target: left += 1
    else: right -= 1

This works because the array is sorted: advancing the left pointer can only increase the sum, and retreating the right pointer can only decrease it, so each step safely discards one element. Without sorting, you'd need O(n²) nested loops.
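A runnable version of the sketch (the function name two_sum_sorted is my own; it returns the index pair, or None if no pair sums to the target):

```python
def two_sum_sorted(arr, target):
    """Find indices (i, j), i < j, with arr[i] + arr[j] == target. O(n)."""
    left, right = 0, len(arr) - 1
    while left < right:
        s = arr[left] + arr[right]
        if s == target:
            return (left, right)
        if s < target:
            left += 1    # need a bigger sum: advance the small end
        else:
            right -= 1   # need a smaller sum: retreat the big end
    return None

print(two_sum_sorted([1, 3, 4, 6, 9], 10))   # → (0, 4), since 1 + 9 == 10
```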

Hash-Based Search — O(1)

If you need repeated exact-match searches, the fastest approach isn't searching at all — it's building a hash map for O(1) lookup:

seen = {}
for element in data:
    seen[element] = true
# Now any lookup is O(1)

The upfront cost is O(n) to build the map. But if you do many lookups, O(1) per lookup pays for itself immediately.
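In Python the idiomatic structure for this is a set (or a dict when you need to store a value alongside each key). A sketch of the build-once, look-up-many pattern with made-up sample data:

```python
data = [3, 14, 15, 92, 65, 35]

seen = set(data)      # O(n) build cost, paid once

# Every membership check afterwards is O(1) on average
print(14 in seen)     # → True
print(99 in seen)     # → False
```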

This is the insight behind every caching layer, every index, and every deduplication check: spend time organizing the data once so that every lookup is fast.

Choosing the Right Search

Approach         Time            Requires              Best for
Linear scan      O(n)            Nothing               Small data, one-time search
Binary search    O(log n)        Sorted data           Repeated searches on static data
Hash map         O(1)            Hash table built      Repeated exact-match lookups
BST              O(log n)        Balanced tree built   Dynamic data + range queries
Inverted index   O(1) per term   Index built           Text search

The pattern: if you search once, scan. If you search repeatedly, build a structure. The choice of structure depends on the query pattern — exact match (hash map), range (tree), text (inverted index), meaning (vector index).

Where Does Searching Appear?

Every if x in collection — Python's in operator is linear search on lists, hash lookup on sets and dicts. Knowing which structure you're searching determines performance.
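The syntax hides the difference entirely, which is why it's worth making explicit (illustrative data; the point is the algorithm behind the operator, not the timings):

```python
items = list(range(1_000_000))
lookup = set(items)

# Identical syntax, different algorithms:
print(999_999 in items)    # linear scan over the list: O(n)
print(999_999 in lookup)   # hash lookup in the set: O(1) average
```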

Database WHERE clauses — WHERE id = 5 uses a hash or B-tree index. WHERE age > 18 uses a B-tree for range search. WHERE name LIKE '%alice%' falls back to a scan (no index helps with infix matching — this is why full-text search exists).

Autocomplete — a trie or prefix tree gives O(k) lookup where k is the prefix length. Type "net" → find all entries starting with "net" without scanning all entries.
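A trie is one way to get this; with a static, sorted word list you can also get prefix lookup from binary search alone, via the standard bisect module. A sketch (prefix_matches is an illustrative helper, not a standard API): two binary searches find the first and last entries with the given prefix, since all such entries are contiguous in sorted order.

```python
import bisect

def prefix_matches(sorted_words, prefix):
    """All words starting with prefix, via two binary searches: O(log n + m)."""
    lo = bisect.bisect_left(sorted_words, prefix)
    # prefix + the largest code point sorts just after every string
    # that starts with prefix, so it bounds the matching range
    hi = bisect.bisect_right(sorted_words, prefix + chr(0x10FFFF))
    return sorted_words[lo:hi]

words = sorted(["net", "network", "netty", "node", "npm"])
print(prefix_matches(words, "net"))   # → ['net', 'netty', 'network']
```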

Next Steps