
How Processes Work — Programs in Motion
A program is a file on disk containing compiled instructions. A process is that program running: the instructions loaded into memory, a stack for function calls, a heap for dynamic data, open file descriptors (files, pipes, network sockets), environment variables, and a set of CPU registers. One program can run as many processes at once. A web browser, for example, typically starts separate processes for different tabs, all from the same executable.
The kernel manages every process on the system. It decides which process runs on which CPU core, when to switch between them, and how to keep them isolated from each other.
What Makes Up a Process?
A process is more than just code. The kernel tracks:
| Component | What it holds |
|---|---|
| Memory | Text (code), data, heap, stack, memory-mapped files |
| PID | Process ID: an integer that uniquely identifies the process while it exists (PIDs are reused after exit) |
| PPID | Parent process ID: the process's parent, normally the one that created it |
| File descriptors | Open files, pipes, sockets (stdin=0, stdout=1, stderr=2) |
| CPU state | Program counter, registers, flags; saved and restored on each context switch |
| Credentials | User ID (UID) and group ID (GID), which determine permissions |
| Environment | Environment variables (PATH, HOME, etc.) |
| Signals | Pending and blocked signals, plus the action (handler) set for each signal |
| Exit status | Return code when the process terminates (0 = success) |
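The kernel keeps all of this in a per-process data structure (on Linux, task_struct, which has hundreds of fields and points to further structures such as mm_struct for memory and files_struct for open files). Tools like ps and top read this information; on Linux they get it from /proc, which the kernel generates from these structures.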
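On Linux you can read many of these fields yourself, because the kernel exposes them as text under /proc. The following minimal sketch uses Python (chosen for this section's examples because its os module wraps the POSIX calls almost one-to-one). It assumes Linux, since macOS has no /proc, and the helper name read_proc_status is hypothetical.

```python
import os


def read_proc_status(pid="self"):
    """Parse /proc/<pid>/status (Linux only) into a {field: value} dict of strings."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields


if __name__ == "__main__":
    status = read_proc_status()
    # Identity, credentials, fd-table size, and blocked-signal mask, as the kernel reports them.
    # "Uid" and "Gid" each list four values: real, effective, saved, filesystem.
    for key in ("Name", "State", "Pid", "PPid", "Uid", "Gid", "FDSize", "SigBlk"):
        print(f"{key:<7} {status.get(key)}")
    # The same identity values are also available through ordinary system calls.
    print("getpid():", os.getpid(), "getppid():", os.getppid(), "getuid():", os.getuid())
```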
How Are Processes Created?
On Unix-like systems (Linux, macOS), the classic way to start a new program uses two system calls: fork() and exec() (in practice a family of calls such as execve() and execvp()).
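fork() creates a near-exact copy of the current process. The child gets a copy of the parent's memory, file descriptors, and state. The difference the program can see: fork() returns 0 in the child and the child's PID in the parent. (The child also has its own PID, and only the thread that called fork() is copied into it.)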
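exec() replaces the current process's program with a new one. The PID stays the same, but the code, stack, and heap are replaced with fresh ones for the new program. Open file descriptors carry over unless they are marked close-on-exec, which is how a shell sets up redirections and pipes before running a command.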
Together:
```python
import os

pid = os.fork()                          # create child (copy of parent)
if pid == 0:                             # in the child
    os.execv("./server", ["./server"])   # replace child's program (./server is a placeholder)
else:                                    # in the parent
    os.waitpid(pid, 0)                   # wait for child to finish
```
This is how your shell works. When you type ls, the shell forks a child, the child execs /bin/ls, and the shell waits for it to finish.
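The shell's loop can be sketched in Python as well. This is a minimal sketch assuming a Unix-like system and Python 3.9+ (for os.waitstatus_to_exitcode); run is a hypothetical helper name, and a real shell also handles job control, redirection, and signals.

```python
import os
import sys


def run(argv):
    """Run a command the way a shell does: fork, exec in the child, wait in the parent.

    Returns the child's exit code, or -N if it was killed by signal N.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)     # searches PATH, as a shell does
        except OSError as exc:
            print(f"{argv[0]}: {exc.strerror}", file=sys.stderr)
        finally:
            os._exit(127)                # reached only if exec failed
    # Parent: block until the child terminates, then decode its wait status.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    print("exit code:", run(["ls", "-l"]))
```

Returning 127 when exec fails follows the convention shells use for "command not found", so a caller can tell a missing program apart from one that ran and failed.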
The fork-then-exec pattern seems wasteful — why copy all the parent's memory just to replace it? Modern kernels use copy-on-write (COW): the child shares the parent's physical memory pages. Only when either process writes to a page does the kernel copy it. If the child immediately calls exec(), almost no copying happens.
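Copy-on-write is invisible from user code by design; what you can observe is the copy semantics it preserves. In this minimal Python sketch (Unix-like system assumed; demo_copy_semantics is a hypothetical name), the child appends to a list after fork() and reports its view through a pipe, while the parent's list stays unchanged.

```python
import os


def demo_copy_semantics():
    """After fork(), the child changes its copy of a list; the parent's list is untouched.

    Returns (child's view, parent's view) as strings.
    """
    data = [1, 2, 3]
    read_fd, write_fd = os.pipe()        # a pipe lets the child report what it sees
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        # Writing (and, in CPython, even updating reference counts) touches shared pages,
        # which the kernel then copies for the child alone.
        data.append(4)
        os.write(write_fd, repr(data).encode())
        os._exit(0)
    os.close(write_fd)
    child_view = os.read(read_fd, 1024).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)
    return child_view, repr(data)


if __name__ == "__main__":
    print(demo_copy_semantics())   # ('[1, 2, 3, 4]', '[1, 2, 3]')
```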
How Does the Kernel Schedule Processes?
A machine with 8 CPU cores can run 8 processes truly simultaneously. But a typical system has hundreds of processes. The kernel's scheduler decides which processes run, on which cores, and for how long.
The scheduler solves a multi-objective optimization problem:
- Fairness — every process should get CPU time proportional to its priority.
- Responsiveness — interactive processes (your terminal, your browser) should respond within milliseconds.
- Throughput — batch processes (compilation, data processing) should use CPU efficiently.
- Energy — idle cores should sleep to save power.
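Linux long used the Completely Fair Scheduler (CFS): each process accumulates "virtual runtime" as it runs, and the process with the least virtual runtime runs next. Higher-priority processes accumulate virtual runtime more slowly, so they get scheduled more often. Since Linux 6.6, CFS has been replaced by EEVDF (Earliest Eligible Virtual Deadline First), which keeps the weighted virtual-runtime accounting but picks the next task by virtual deadline.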
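The pick-next rule can be modeled in a few lines. This is a toy model in Python, not kernel code: a min-heap stands in for the red-black tree CFS uses, the task weights and the 4 ms slice are illustrative assumptions (1024 is the kernel's weight for a nice-0 task), and it ignores sleeping tasks, wakeup preemption, and multiple cores. simulate_cfs is a hypothetical name.

```python
import heapq

NICE_0_WEIGHT = 1024   # reference weight: such a task's vruntime advances at wall-clock rate


def simulate_cfs(weights, slices, slice_ms=4):
    """Toy model of CFS's pick-next rule: always run the task with the smallest vruntime.

    weights maps task name -> scheduling weight (bigger weight = higher priority).
    Returns the real CPU time in ms each task received over `slices` time slices.
    """
    # Min-heap of (vruntime, name); the kernel orders tasks in a red-black tree instead.
    queue = [(0.0, name) for name in weights]
    heapq.heapify(queue)
    cpu_ms = {name: 0 for name in weights}
    for _ in range(slices):
        vruntime, name = heapq.heappop(queue)      # task with the least virtual runtime
        cpu_ms[name] += slice_ms
        # Heavier tasks accrue vruntime more slowly, so they return to the front sooner.
        vruntime += slice_ms * NICE_0_WEIGHT / weights[name]
        heapq.heappush(queue, (vruntime, name))
    return cpu_ms


if __name__ == "__main__":
    print(simulate_cfs({"editor": 2048, "batch": 1024}, slices=300))
```

Here "editor" has twice the weight of "batch", so its virtual runtime grows half as fast and it receives twice the CPU time: 800 ms versus 400 ms over 300 slices.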
A typical time slice is 1-10 milliseconds. After each slice, the scheduler checks if another process should run. This switching — saving one process's CPU state and loading another's — is called a context switch.
What Is a Context Switch?
When the scheduler switches from Process A to Process B:
- Save A's state: CPU registers, program counter, and stack pointer are stored in A's task_struct.
- Switch page tables: the MMU now translates addresses using B's virtual memory mappings.
- Restore B's state: B's registers, program counter, and stack pointer are loaded from B's task_struct.
- Resume B: the CPU continues executing B's code as if it had never been interrupted.
A context switch costs roughly 1-10 microseconds. That sounds fast, but at thousands of switches per second it adds up. More importantly, switching hurts cache performance: the new process's data usually isn't in the L1/L2/L3 caches (and address translations cached in the TLB may be flushed along with the page tables), so its first memory accesses after a switch are slow cache misses. This indirect cost is often larger than the switch itself.
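You can count how many context switches a piece of code causes. This minimal sketch assumes a Unix-like system; it uses the standard resource module, whose getrusage() result carries ru_nvcsw (voluntary) and ru_nivcsw (involuntary) counters. On Linux the same counters also appear in /proc/<pid>/status. count_context_switches and sleepy are hypothetical names.

```python
import resource
import time


def count_context_switches(work):
    """Run work() and return the (voluntary, involuntary) context switches it cost this process.

    Voluntary: the process blocked (sleep, I/O, waiting on a lock) and gave up the CPU.
    Involuntary: the scheduler preempted it, e.g. because its time slice ran out.
    """
    before = resource.getrusage(resource.RUSAGE_SELF)
    work()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (after.ru_nvcsw - before.ru_nvcsw,
            after.ru_nivcsw - before.ru_nivcsw)


def sleepy():
    for _ in range(20):
        time.sleep(0.001)    # each sleep blocks, so the kernel switches to another task


if __name__ == "__main__":
    voluntary, involuntary = count_context_switches(sleepy)
    print(f"voluntary={voluntary} involuntary={involuntary}")
```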
How Do Processes Communicate?
Processes are isolated by default — one process can't read another's memory. But they need to communicate. Common mechanisms:
Pipes — a byte stream from one process to another. ls | grep "txt" connects ls's stdout to grep's stdin through a pipe. Unidirectional, in-memory, fast.
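The shell builds `ls | grep "txt"` from the same fork/exec pieces plus pipe() and dup2(). The sketch below assumes a Unix-like system and Python 3.9+; pipeline and _exec_child are hypothetical names, and error handling is kept to a minimum.

```python
import os


def _exec_child(argv):
    """In a forked child: exec argv; if that fails, exit with 127 as shells do."""
    try:
        os.execvp(argv[0], argv)
    finally:
        os._exit(127)


def pipeline(left, right):
    """Run `left | right`: left's stdout feeds right's stdin through one pipe.

    Returns (left_exit_code, right_exit_code).
    """
    read_fd, write_fd = os.pipe()

    left_pid = os.fork()
    if left_pid == 0:
        os.dup2(write_fd, 1)      # stdout (fd 1) now refers to the pipe's write end
        os.close(read_fd)
        os.close(write_fd)
        _exec_child(left)

    right_pid = os.fork()
    if right_pid == 0:
        os.dup2(read_fd, 0)       # stdin (fd 0) now refers to the pipe's read end
        os.close(read_fd)
        os.close(write_fd)
        _exec_child(right)

    # The parent closes its copies of both ends; a stray open write end
    # would keep `right` from ever seeing end-of-file.
    os.close(read_fd)
    os.close(write_fd)
    codes = []
    for pid in (left_pid, right_pid):
        _, status = os.waitpid(pid, 0)
        codes.append(os.waitstatus_to_exitcode(status))
    return tuple(codes)


if __name__ == "__main__":
    pipeline(["ls"], ["grep", "txt"])   # same as the shell's: ls | grep "txt"
```

The data never touches a file: the kernel buffers it in memory between the two processes, and each side sees only an ordinary file descriptor.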
Signals: asynchronous notifications. SIGTERM asks a process to terminate gracefully. SIGKILL forces termination (it can't be caught, blocked, or ignored). SIGINT is what Ctrl+C sends. SIGCHLD tells a parent that one of its children exited or stopped.
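Catching SIGTERM is how a server turns "please stop" into a clean shutdown. This minimal Python sketch (Unix-like system, standard signal module) installs a handler, sends SIGTERM to itself the way `kill <pid>` would, and then shows that the kernel refuses a handler for SIGKILL. The names shutdown_requested and on_sigterm are illustrative.

```python
import os
import signal
import time

shutdown_requested = False   # illustrative flag checked by the main loop


def on_sigterm(signum, frame):
    # Runs when SIGTERM is delivered; just record it so the main loop can finish cleanly.
    global shutdown_requested
    shutdown_requested = True


# Catch SIGTERM instead of taking its default action (terminate the process).
signal.signal(signal.SIGTERM, on_sigterm)

# Send ourselves SIGTERM, which is what `kill <pid>` sends by default.
os.kill(os.getpid(), signal.SIGTERM)

while not shutdown_requested:    # a real server would keep serving requests here
    time.sleep(0.01)
print("SIGTERM received, shutting down cleanly")

# SIGKILL is different: the kernel will not let any process install a handler for it.
try:
    signal.signal(signal.SIGKILL, on_sigterm)
except OSError as exc:
    print("cannot catch SIGKILL:", exc)
```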
Shared memory — two processes map the same physical memory into their address spaces. The fastest IPC mechanism — no copying, just direct memory access. Requires synchronization (mutexes, semaphores) to avoid race conditions.
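A shared anonymous mapping shows the idea with no files involved. This is a minimal sketch assuming a Unix-like system, where Python's mmap module creates MAP_SHARED mappings by default, so after fork() both processes see the same physical page. Waiting for the child to exit stands in for the mutexes or semaphores real code would need; shared_memory_demo is a hypothetical name.

```python
import mmap
import os


def shared_memory_demo():
    """Parent and child share one anonymous page; the child's write is visible to the parent."""
    # fileno=-1 requests anonymous memory; the default flags are MAP_SHARED,
    # so the page is shared across fork() rather than copied on write.
    shared = mmap.mmap(-1, mmap.PAGESIZE)
    pid = os.fork()
    if pid == 0:
        shared[:5] = b"hello"        # write straight into the shared page
        os._exit(0)
    os.waitpid(pid, 0)               # crude synchronization: read only after the child has exited
    value = shared[:5]
    shared.close()
    return value


if __name__ == "__main__":
    print(shared_memory_demo())      # b'hello'
```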
Sockets: these work between processes on the same machine too, either as TCP/UDP over the loopback address 127.0.0.1 or as Unix domain sockets, which skip the TCP/IP stack entirely. More overhead than shared memory, but well understood and language-agnostic.
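For two related processes, a connected pair of Unix domain sockets is the simplest starting point. This minimal sketch assumes a Unix-like system, where socket.socketpair() defaults to AF_UNIX; socket_echo_demo is a hypothetical name. Unlike a pipe, each end can both send and receive.

```python
import os
import socket


def socket_echo_demo():
    """Parent sends a request over a Unix domain socket pair; the child replies in upper case."""
    parent_sock, child_sock = socket.socketpair()   # two connected AF_UNIX stream sockets
    pid = os.fork()
    if pid == 0:
        parent_sock.close()
        request = child_sock.recv(1024)
        child_sock.sendall(request.upper())          # reply on the same socket
        child_sock.close()
        os._exit(0)
    child_sock.close()
    parent_sock.sendall(b"ping")
    reply = parent_sock.recv(1024)
    parent_sock.close()
    os.waitpid(pid, 0)
    return reply


if __name__ == "__main__":
    print(socket_echo_demo())   # b'PING'
```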
Files — the simplest mechanism. One process writes, another reads. Works across reboots. Slow for high-frequency communication.
What Happens When a Process Exits?
When a process terminates (by returning from main, calling exit(), or receiving a fatal signal):
- Open file descriptors are closed: files, sockets, pipes.
- Memory is released: the kernel reclaims all pages.
- Children are reparented: orphaned children are adopted by PID 1 (init/systemd).
- Exit status is stored: the process becomes a zombie. It stays in the process table (so the parent can read the exit status with wait()) but holds no memory or open files.
- Parent is notified: the kernel sends SIGCHLD to the parent.
- Parent calls wait(): this retrieves the exit status and removes the zombie.
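You can watch a zombie appear on Linux. This sketch assumes Linux (it reads the state letter from /proc/<pid>/stat) and Python 3.9+; process_state and observe_zombie are hypothetical helper names.

```python
import os
import time


def process_state(pid):
    """Return the one-letter state of `pid` from /proc/<pid>/stat (Linux only), e.g. 'R', 'S', 'Z'."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 is "(comm)", which may itself contain spaces or ')', so split after the last ')'.
    return stat.rsplit(")", 1)[1].split()[0]


def observe_zombie():
    """Fork a child that exits at once; return (state before wait, exit code, /proc entry gone)."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)                          # child terminates immediately with status 7

    # The parent has not called wait() yet, so the dead child lingers as a zombie ('Z').
    deadline = time.monotonic() + 2.0
    while process_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    state = process_state(pid)

    _, status = os.waitpid(pid, 0)           # reap: collect the exit status, free the table entry
    gone = not os.path.exists(f"/proc/{pid}")
    return state, os.waitstatus_to_exitcode(status), gone


if __name__ == "__main__":
    print(observe_zombie())                  # ('Z', 7, True)
```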
If the parent never calls wait(), the zombie stays in the process table indefinitely. A zombie uses no memory or CPU — just a process table entry. But too many zombies can exhaust the PID space. This is why daemon processes (servers, background services) must handle SIGCHLD and reap their children.
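A common way to reap is a SIGCHLD handler that calls waitpid() with WNOHANG in a loop. The loop matters because several child exits can be merged into a single pending SIGCHLD. This is a minimal sketch assuming a Unix-like system and Python 3.9+; reaped, reap_children, and spawn_workers are hypothetical names.

```python
import os
import signal
import time

reaped = {}   # pid -> exit code, filled in by the SIGCHLD handler


def reap_children(signum, frame):
    """SIGCHLD handler: reap every child that has already exited, without blocking."""
    while True:                     # several exits may arrive as one SIGCHLD
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:   # no children remain at all
            return
        if pid == 0:                # children exist, but none has exited yet
            return
        reaped[pid] = os.waitstatus_to_exitcode(status)


def spawn_workers(exit_codes):
    """Fork one short-lived child per exit code, as a server might spawn workers; return their PIDs."""
    pids = []
    for code in exit_codes:
        pid = os.fork()
        if pid == 0:
            os._exit(code)
        pids.append(pid)
    return pids


signal.signal(signal.SIGCHLD, reap_children)
workers = spawn_workers([0, 1, 2])

# The parent keeps doing its own work; the handler reaps children as they exit.
deadline = time.monotonic() + 2.0
while len(reaped) < len(workers) and time.monotonic() < deadline:
    time.sleep(0.01)
print({pid: reaped.get(pid) for pid in workers})
```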
What Is PID 1?
PID 1 is the first user-space process the kernel starts. On most modern Linux distributions it's systemd; on macOS it's launchd. In a container, which gets its own PID namespace, it's whatever the image's ENTRYPOINT (or CMD) starts.
PID 1 has special responsibilities:
- Adopts orphans: when a process's parent exits, PID 1 becomes the new parent.
- Reaps zombies: PID 1 must call wait() on adopted children, or zombies accumulate.
- Receives unhandled signals differently: the kernel does not apply a signal's default action (such as "terminate") to PID 1, so SIGTERM and SIGINT do nothing unless PID 1 installs handlers for them. This is why Ctrl+C sometimes doesn't stop a Docker container: the application inside runs as PID 1, has no SIGINT handler, and the signal is simply ignored.
Understanding PID 1 behavior matters for containers, where the distinction between a proper init system and a raw application as PID 1 affects signal handling, zombie reaping, and graceful shutdown.
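The responsibilities above fit in a short supervisor loop. This is a minimal sketch in the spirit of small container init programs such as tini, not a replacement for one; it assumes a Unix-like system and Python 3.9+, and mini_init is a hypothetical name. It only receives orphans to reap when it really runs as PID 1 (for example as a container's entrypoint); anywhere else it reaps just its own child.

```python
import os
import signal


def mini_init(argv):
    """Sketch of a minimal init: start argv, forward SIGTERM/SIGINT to it, reap every child.

    Returns the main child's exit code (or -N if signal N killed it).
    """
    main_pid = os.fork()
    if main_pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)            # only reached if exec failed

    def forward(signum, frame):
        # As PID 1 these signals would otherwise be ignored, so pass them on
        # to the application and let it shut down on its own terms.
        try:
            os.kill(main_pid, signum)
        except ProcessLookupError:   # the child is already gone
            pass

    signal.signal(signal.SIGTERM, forward)
    signal.signal(signal.SIGINT, forward)

    while True:
        # Reaps the main child and, when running as PID 1, any orphans reparented to us.
        pid, status = os.wait()
        if pid == main_pid:
            return os.waitstatus_to_exitcode(status)
```

A container image could, for instance, use a script calling `mini_init(["./server"])` as its entrypoint, where ./server is a placeholder for the real application.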
Next Steps
Processes run in user space. The kernel is the layer beneath them that makes it all work:
- How the Kernel Works — the boundary between your code and the hardware.
- How Threads Work — lightweight execution within a process.
- How Memory Works — revisit memory with a deeper understanding of process isolation.