How Threads Work — Concurrency Within a Process

2026-03-22

A process is a running program with its own memory space. A thread is a separate flow of execution within that process. Multiple threads share the same memory — the same heap, the same global variables, the same file descriptors — but each has its own stack and CPU registers.

Threads exist because many programs need to do multiple things at once. A web server handles thousands of connections. A video editor renders frames while the UI stays responsive. A build system compiles files in parallel. Processes can do this too, but creating a process is expensive (copying page tables, allocating kernel structures). Creating a thread is cheap (just a new stack and register set). And threads share memory directly, while processes need explicit IPC.

How Do Threads Differ from Processes?

                Process                              Thread
Memory          Own address space                    Shared with other threads
Creation cost   Expensive (fork, page tables)        Cheap (new stack only)
Communication   IPC (pipes, sockets, shared memory)  Direct memory access
Isolation       Full — one can't crash another       None — one can corrupt another's data
Context switch  Expensive (TLB flush)                Cheaper (same address space)

The shared memory is both the advantage and the danger. No copying, no serialization, no IPC overhead. But no isolation either — if thread A writes garbage to a shared variable, thread B reads garbage.

How Does the Kernel See Threads?

On Linux, threads and processes are both represented by the same kernel structure, task_struct — the kernel doesn't fundamentally distinguish them. A thread is simply a task that shares its address space with other tasks. The clone() syscall creates both: with CLONE_VM (share memory), you get a thread; without it, you get a process.

This means the scheduler treats threads and processes equally. Each thread gets its own time slices, its own priority, and can run on any CPU core. Two threads of the same process can run truly simultaneously on different cores.

What Is Concurrency vs Parallelism?

Concurrency — multiple tasks make progress over the same period. They might interleave on a single core (taking turns), or they might run simultaneously on multiple cores. Concurrency is about structure.

Parallelism — multiple tasks execute at the exact same instant on different cores. Parallelism is about execution.

A single-core machine can have concurrency (the scheduler switches between threads) but not parallelism. A multi-core machine can have both.
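The single-core case can be simulated in Go: runtime.GOMAXPROCS(1) caps execution at one goroutine at a time, so the goroutines below are concurrent (they interleave) but never parallel. A minimal sketch; the function names are illustrative:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// runConcurrent launches n goroutines with parallelism capped at 1.
// They all make progress by taking turns on a single core:
// concurrency without parallelism.
func runConcurrent(n int) int {
	runtime.GOMAXPROCS(1) // at most one goroutine executes at any instant
	var wg sync.WaitGroup
	var mu sync.Mutex
	finished := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			finished++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return finished
}

func main() {
	fmt.Println(runConcurrent(8)) // all 8 goroutines complete despite one core
}
```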

The distinction matters because concurrent code must handle interleaving correctly even when it's not parallel. A race condition that only appears under parallelism is still a bug — it just manifests less often on a single core.

What Is a Race Condition?

A race condition occurs when two threads access shared data and at least one of them writes, without synchronization. The result depends on the order of execution, which is non-deterministic.

// Thread A              // Thread B
read counter  (= 5)     read counter  (= 5)
add 1         (= 6)     add 1         (= 6)
write counter (= 6)     write counter (= 6)
// Expected: 7. Got: 6.

Both threads read 5, both compute 6, both write 6. One increment is lost. This is the classic lost update problem. It happens because read-modify-write is not atomic — the scheduler can switch threads between the read and the write.
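The lost update is easy to reproduce. A Go sketch of the same unsynchronized counter (the final value is nondeterministic by design, and Go's race detector will flag this code — that's the point):

```go
package main

import (
	"fmt"
	"sync"
)

// racyCount increments a shared counter from n goroutines with no
// synchronization. Each counter++ is a read-modify-write, so
// increments can be lost; the result is often less than n.
func racyCount(n int) int {
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counter++ // read, add 1, write — not atomic
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(racyCount(100000)) // typically less than 100000; varies per run
}
```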

Race conditions are among the hardest bugs to find because they depend on timing. They may not reproduce on a developer's machine (single core, different scheduler behavior) but appear under load in production.

How Do You Prevent Race Conditions?

Mutex (mutual exclusion) — a lock that only one thread can hold at a time. Before accessing shared data, acquire the mutex. After, release it. Any other thread trying to acquire the mutex waits until it's released.

mutex.lock()
counter += 1   // only one thread can execute this at a time
mutex.unlock()
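The same pattern in runnable Go, using sync.Mutex — a sketch of the fix for the lost-update counter:

```go
package main

import (
	"fmt"
	"sync"
)

// safeCount is the shared counter with a mutex: only one goroutine
// executes the increment at a time, so no update is lost.
func safeCount(n int) int {
	var mu sync.Mutex
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++ // the critical section
			mu.Unlock()
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(safeCount(100000)) // always 100000
}
```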

Atomic operations — the CPU provides instructions that read-modify-write in a single, indivisible step. atomic_add(&counter, 1) cannot be interrupted. Faster than a mutex for simple operations, but limited to what the hardware supports (integers, pointers).
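In Go this is sync/atomic; a sketch of the counter done with a hardware atomic add instead of a lock:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// atomicCount increments the counter with atomic.AddInt64: the
// read-modify-write happens as one indivisible instruction, so no
// lock is needed.
func atomicCount(n int) int64 {
	var counter int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&counter, 1) // indivisible increment
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(atomicCount(100000)) // always 100000
}
```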

Read-write locks — allow multiple readers simultaneously but exclusive write access. Good for data that's read often and written rarely.

Channels — instead of sharing memory, send messages between threads. Go's goroutines use channels. Rust's std::sync::mpsc provides channels. The data moves from one thread to another — no shared state, no race conditions.
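A sketch of the channel style in Go: the counter lives in exactly one goroutine, and workers send increments as messages, so there is no shared mutable state to race on:

```go
package main

import "fmt"

// channelCount owns the counter inside a single goroutine. Workers
// send increments over a channel instead of writing shared memory.
func channelCount(n int) int {
	inc := make(chan int)
	done := make(chan int)

	// The only goroutine that ever touches counter.
	go func() {
		counter := 0
		for i := 0; i < n; i++ {
			counter += <-inc
		}
		done <- counter
	}()

	for i := 0; i < n; i++ {
		go func() { inc <- 1 }() // a message, not a write to shared state
	}
	return <-done
}

func main() {
	fmt.Println(channelCount(1000)) // always 1000
}
```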

What Is a Deadlock?

A deadlock occurs when two or more threads wait for each other to release resources, and none can proceed:

// Thread A              // Thread B
lock(mutex_1)            lock(mutex_2)
lock(mutex_2)  ← waits  lock(mutex_1)  ← waits
// Both wait forever.

Thread A holds mutex_1 and waits for mutex_2. Thread B holds mutex_2 and waits for mutex_1. Neither can release what the other needs.

Deadlocks require four conditions simultaneously:

  1. Mutual exclusion — resources can't be shared.
  2. Hold and wait — a thread holds resources while waiting for more.
  3. No preemption — resources can't be taken away.
  4. Circular wait — threads form a cycle of dependencies.

Breaking any one condition prevents deadlocks. The most common strategy is lock ordering — always acquire locks in the same order. If every thread locks mutex_1 before mutex_2, circular wait can't happen.
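A Go sketch of lock ordering. Both transfer directions acquire the two mutexes in the same fixed order, so the circular wait from the deadlock example can't form. The account type and ordering by id are illustrative, not from the text:

```go
package main

import (
	"fmt"
	"sync"
)

// account is a hypothetical resource guarded by its own mutex.
type account struct {
	id      int
	mu      sync.Mutex
	balance int
}

// transfer always locks the lower-id account first, regardless of
// transfer direction. With one global lock order, no cycle of
// waiting threads can form.
func transfer(from, to *account, amount int) {
	first, second := from, to
	if to.id < from.id {
		first, second = to, from
	}
	first.mu.Lock()
	defer first.mu.Unlock()
	second.mu.Lock()
	defer second.mu.Unlock()
	from.balance -= amount
	to.balance += amount
}

func main() {
	a := &account{id: 1, balance: 100}
	b := &account{id: 2, balance: 100}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); transfer(a, b, 10) }() // A to B
	go func() { defer wg.Done(); transfer(b, a, 5) }()  // B to A, same lock order
	wg.Wait()
	fmt.Println(a.balance, b.balance)
}
```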

What Are Green Threads and Async?

Kernel threads are expensive: each needs a stack (typically 2-8 MB), and context switching requires a kernel transition. A server handling 100,000 concurrent connections can't create 100,000 kernel threads — the memory alone would be 200 GB.

Green threads (Go's goroutines, Erlang's processes) — the language runtime manages thousands of lightweight threads that are multiplexed onto a smaller number of kernel threads. A goroutine starts with a 4 KB stack that grows as needed. Go's runtime scheduler distributes goroutines across kernel threads.

Async/await (Rust's tokio, JavaScript's event loop, Python's asyncio) — instead of blocking a thread while waiting for I/O, the task yields control. The runtime runs other tasks on the same thread during the wait. No thread-per-connection, no stack-per-connection.

Both approaches solve the same problem: high concurrency without high memory overhead. The tradeoff is complexity — green threads and async runtimes add their own scheduling overhead and debugging challenges.

Model           Stack per task        Scheduling          Use case
Kernel threads  2-8 MB                OS scheduler        CPU-bound work
Green threads   4 KB (growable)       Runtime scheduler   I/O-heavy (Go servers)
Async tasks     None (state machine)  Runtime event loop  I/O-heavy (Rust, JS)

Why Does This Matter?

Every non-trivial program uses threads or async. Understanding the model explains:

Why servers have thread pools — creating a thread per request is expensive. A fixed pool of threads handles requests from a queue. The pool size is tuned to the number of CPU cores and the I/O-to-compute ratio.
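A minimal worker-pool sketch in Go (the pool size and job type are illustrative): a fixed number of workers pull jobs from a shared queue instead of spawning one thread per request:

```go
package main

import (
	"fmt"
	"sync"
)

// pool runs a fixed number of workers that consume jobs from a shared
// queue (a channel). The cost of creating workers is paid once, up
// front, instead of once per request.
func pool(workers int, jobs []int) int {
	queue := make(chan int)
	results := make(chan int)
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range queue {
				results <- job * job // stand-in for real request handling
			}
		}()
	}

	// Feed the queue, then close results once all workers finish.
	go func() {
		for _, j := range jobs {
			queue <- j
		}
		close(queue)
		wg.Wait()
		close(results)
	}()

	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(pool(4, []int{1, 2, 3, 4})) // 1+4+9+16 = 30
}
```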

Why Rust has the borrow checker — Rust prevents race conditions at compile time. If two threads could access the same data, the compiler requires either immutable access (multiple readers) or exclusive mutable access (one writer). Data races are impossible in safe Rust.

Why shared mutable state is "the root of all evil" — every concurrency bug (races, deadlocks, data corruption) comes from multiple threads mutating shared data. The less shared mutable state, the fewer bugs. Channels, immutable data, and thread-local storage all reduce sharing.

Next Steps