How Memory Works — Stack, Heap, and Virtual Memory

2026-03-22

Every variable you declare, every object you create, every function you call — all of it lives in memory. Understanding how memory is organized explains why some allocations are fast and others are slow, why programs crash with "stack overflow" or "out of memory," and why every process appears to have the entire machine to itself.

What Does a Program's Memory Look Like?

When a process starts, the operating system gives it a memory layout divided into regions:

High addresses (0xFFFF...)
┌─────────────────┐
│ Stack           │ ← local variables, function calls (grows down)
│       ↓         │
│  (free space)   │
│       ↑         │
│ Heap            │ ← dynamic allocations (malloc, new, Box) (grows up)
│ BSS + Data      │ ← global/static variables
│ Text (Code)     │ ← compiled instructions (read-only)
└─────────────────┘
Low addresses (0x0000...) ← kernel reserves the very bottom
  • Text — the compiled program instructions. Read-only. Shared between processes running the same binary.
  • Data / BSS — global and static variables. Data holds initialized values, BSS holds zero-initialized values.
  • Heap — dynamic memory. Grows upward. You allocate here with malloc (C), new (C++/Java), Box::new (Rust), or implicitly in garbage-collected languages.
  • Stack — function call frames. Grows downward. Every function call pushes a frame, every return pops one.

The heap and stack grow toward each other. In modern systems with virtual memory, running out of space between them is rare — but it's the origin of the layout.

How Does the Stack Work?

The stack is a LIFO (last-in, first-out) data structure managed automatically by the compiler. Every function call pushes a stack frame containing:

  • Return address — where to continue after this function returns
  • Local variables — the variables declared inside the function
  • Arguments — the values passed to the function
  • Saved registers — CPU register values that need to be restored after the call

When the function returns, its frame is popped. The memory is instantly available for the next call. There's no deallocation step — just moving a pointer.

This is why stack allocation is fast: allocating a local variable costs nothing beyond moving the stack pointer by a few bytes. No searching for free space, no metadata, no fragmentation.

But the stack has limits:

  • Fixed size — typically 1-8 MB per thread. The main thread's stack can grow on demand up to a fixed limit; thread stacks are fixed at creation. Exceeding the limit causes a stack overflow.
  • LIFO only — you can't free a stack-allocated variable out of order. Memory is released only when the function returns.
  • No sharing — once a function returns, its stack frame is gone. You can't return a pointer to a local variable (in languages that allow this, it's a bug — the pointer becomes dangling).

How Does the Heap Work?

The heap is for memory that outlives a single function call. You ask the allocator for N bytes, use them, and free them when done.

// C
char *buf = malloc(1024);  // allocate 1024 bytes
// ... use buf ...
free(buf);                  // release

// Rust
let buf = Box::new([0u8; 1024]);  // allocate on heap
// buf is freed automatically when it goes out of scope (Drop trait)

The allocator (typically jemalloc, mimalloc, or the system allocator) manages free space on the heap. It maintains data structures tracking which regions are allocated and which are free. When you call malloc(1024), the allocator finds a free region of at least 1024 bytes, marks it as used, and returns a pointer.

Heap allocation is slower than stack allocation because the allocator must:

  1. Search for a free region of sufficient size
  2. Update its bookkeeping structures
  3. Potentially request more memory from the OS (via mmap or sbrk)

Heap memory also introduces fragmentation — after many allocations and frees, the heap can become a patchwork of used and free regions. A request for 1 MB might fail even if there's 10 MB free, because no single contiguous region is large enough.

What Is Virtual Memory?

Every process sees its own private address space. Process A's address 0x7FFF1000 and Process B's address 0x7FFF1000 point to completely different physical memory. This isolation is made possible by virtual memory.

The CPU contains a Memory Management Unit (MMU) that translates virtual addresses to physical addresses on every memory access. The OS maintains a page table for each process — a mapping from virtual pages (typically 4 KB each) to physical frames.

Virtual memory provides three critical properties:

Isolation — one process cannot read or write another process's memory. The page tables simply don't contain mappings to other processes' physical frames. A wild pointer in Process A can crash Process A but can never corrupt Process B.

Overcommit — the OS can promise more memory than physically exists. If you allocate 8 GB on a machine with 4 GB of RAM, the OS says "sure" but doesn't actually assign physical frames until you write to each page. Pages that haven't been touched don't use physical memory.

Swapping — when physical memory is full, the OS can move inactive pages to disk (the swap file or swap partition). When the process accesses a swapped page, the CPU raises a page fault, and the OS loads the page back from disk. This is invisible to the process but dramatically slower — a disk access can be tens of thousands of times slower than a RAM access (SSDs narrow the gap, but it remains orders of magnitude).

What Is a Page Fault?

When a process accesses a virtual address that isn't currently mapped to physical memory, the CPU raises a page fault — an exception that transfers control to the kernel.

Page faults are not always errors. There are three kinds:

Minor page fault — the page exists (it was allocated) but hasn't been mapped to a physical frame yet. The kernel allocates a frame, updates the page table, and resumes the process. Fast.

Major page fault — the page was swapped out to disk. The kernel reads it from disk into a physical frame, updates the page table, and resumes. Slow (milliseconds, not microseconds).

Invalid page fault — the process accessed memory it doesn't own. The kernel sends a signal (SIGSEGV on Unix, "access violation" on Windows). The process crashes. This is the "segmentation fault" that haunts C and C++ developers.

How Much Memory Does a Process Actually Use?

This is surprisingly hard to answer. There are several measures:

  • Virtual size (VSZ) — total virtual address space mapped. Often much larger than physical usage because of overcommit.
  • Resident Set Size (RSS) — physical memory currently in RAM. The most useful single number.
  • Shared memory — pages shared with other processes (shared libraries, memory-mapped files). Counted in each process's RSS but only uses physical memory once.
  • Proportional Set Size (PSS) — RSS with shared pages divided proportionally among sharing processes. The fairest measure.

A process with 2 GB VSZ might have 200 MB RSS. The difference is allocated-but-never-touched pages (demand paging) and memory-mapped files that haven't been read.

Why Does This Matter?

Understanding memory explains behaviors you encounter constantly:

Stack overflow — recursive function with no base case, or allocating a large array on the stack. The fix is to allocate on the heap (use Vec instead of [u8; 10_000_000] in Rust) or increase the stack size.

Memory leaks — allocating heap memory and never freeing it. In garbage-collected languages, this means holding references you don't need. In manual-memory languages, it means forgetting to call free. In Rust, the ownership system makes leaks rare (but still possible with Rc cycles or Box::leak).

Out of memory (OOM) — the OS kills your process because physical memory + swap are exhausted. On Linux, the OOM killer picks a victim heuristically, typically favoring processes with large memory footprints. Understanding RSS vs VSZ helps you predict which process gets killed.

Performance cliffs — a program that fits in RAM runs fast. The moment it exceeds physical memory and starts swapping, performance drops by orders of magnitude. There's no gradual degradation — it's a cliff.

Next Steps

Memory gives processes their data. The next question is: who creates and manages those processes?