
How Containers Work — Process Isolation, Not Virtual Machines
A container is a process running on the host operating system with restricted visibility and limited resources. It is not a virtual machine. There is no hypervisor, no guest kernel, no hardware emulation. The container shares the host kernel — it just cannot see or touch most of the host's resources.
When you run `docker run nginx`, Docker does not boot a machine. It creates a regular Linux process, wraps it in a set of namespaces (so it sees only its own filesystem, PIDs, and network), assigns it to a cgroup (so it cannot consume unlimited CPU or memory), and mounts a union filesystem (so it gets a layered, copy-on-write root filesystem). The result looks like an isolated machine but is actually a constrained process.
The Three Pillars
Containers rest on three kernel features. Each solves a different problem:
Namespaces control what a process can see. A container in its own PID namespace sees itself as PID 1 — it has no idea other processes exist on the host. A container in its own network namespace has its own IP address, routing table, and firewall rules. A container in its own mount namespace has its own root filesystem. Linux provides eight namespace types (mount, PID, network, IPC, UTS, user, cgroup, and — since Linux 5.6 — time), and a typical container uses most of them.
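You can observe namespace membership directly: the kernel exposes each process's namespaces as symlinks under `/proc/<pid>/ns/`, and two processes are in the same namespace exactly when the linked IDs match. A minimal sketch (assumes a Linux `/proc`; returns an empty dict elsewhere):

```python
import os

def namespace_ids(pid="self"):
    """Read a process's namespace IDs from /proc/<pid>/ns/.

    Each entry is a symlink target like 'pid:[4026531836]'; two
    processes share a namespace exactly when these IDs are equal.
    """
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):      # non-Linux fallback
        return {}
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in sorted(os.listdir(ns_dir))}

if __name__ == "__main__":
    for name, ident in namespace_ids().items():
        print(f"{name:16} {ident}")
```

Comparing this output for a shell on the host and a shell inside a container shows different IDs for `pid`, `net`, and `mnt` — the containerized process lives in its own copies of those namespaces.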
Cgroups (control groups) control what a process can use. A cgroup limits CPU time, memory, disk I/O, and the number of processes a container can create. When a container hits its memory limit, the kernel's OOM killer terminates it — the same mechanism that kills any process that exhausts its memory allocation.
Union filesystems control what a process sees on disk. A union filesystem (typically overlayfs) stacks read-only image layers with a writable layer on top. Reads fall through the layers until the file is found. Writes go to the top layer. This is how multiple containers can share the same base image without duplicating gigabytes of data.
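The fall-through-and-copy-on-write behavior can be modeled in a few lines. This is a toy dictionary model, not overlayfs itself, but it shows why two containers can share image layers yet never see each other's writes:

```python
class UnionFS:
    """Toy model of a union filesystem: read-only lower layers with
    one writable upper layer on top, like overlayfs.

    Reads fall through the stack top-down until a file is found;
    writes always land in the upper layer (copy-on-write).
    """

    def __init__(self, *lower_layers):
        self.lowers = list(lower_layers)   # read-only, bottom .. top
        self.upper = {}                    # private writable layer

    def read(self, path):
        for layer in [self.upper] + self.lowers[::-1]:
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        self.upper[path] = data            # never touches a lower layer

base = {"/etc/os-release": "debian", "/usr/bin/nginx": "v1.0"}
app  = {"/usr/bin/nginx": "v1.2"}          # upper image layer shadows base

# Two containers share the same image layers but get private upper layers.
c1, c2 = UnionFS(base, app), UnionFS(base, app)
c1.write("/etc/os-release", "patched")     # copy-on-write, visible to c1 only

print(c1.read("/usr/bin/nginx"))   # v1.2    (topmost layer wins)
print(c1.read("/etc/os-release"))  # patched
print(c2.read("/etc/os-release"))  # debian  (c2 unaffected)
```

The shared `base` and `app` dictionaries play the role of image layers stored once on disk; only each container's upper layer is per-container state.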
How a Container Starts
When a container runtime like runc launches a container, the sequence is:
1. Parse the OCI bundle — read the configuration (rootfs path, environment variables, capabilities, namespace settings).
2. Create namespaces — call `clone()` with flags for each namespace type (`CLONE_NEWPID`, `CLONE_NEWNET`, `CLONE_NEWNS`, etc.).
3. Set up the cgroup — create a cgroup directory, write resource limits, add the container's PID.
4. Mount the root filesystem — set up the overlayfs mount with image layers as lower directories and a writable upper directory.
5. Pivot root — switch the container's root filesystem from the host's `/` to the container's overlayfs mount.
6. Drop capabilities — remove Linux capabilities the container does not need (no raw sockets, no kernel module loading, no clock changes).
7. Execute the entrypoint — `exec()` the container's command (e.g., `nginx -g 'daemon off;'`).
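Steps 2 and 7 can be sketched with the raw `unshare(2)` syscall via ctypes. This is a minimal illustration, not a real runtime: it skips cgroups, overlayfs, `pivot_root`, and capability dropping, and actually creating these namespaces requires root (or an unprivileged user namespace). The flag values are the standard ones from `<linux/sched.h>`:

```python
import ctypes
import os
import sys

# Namespace flags shared by clone() and unshare(), from <linux/sched.h>.
CLONE_NEWNS  = 0x00020000   # mount namespace
CLONE_NEWUTS = 0x04000000   # hostname / domain name
CLONE_NEWIPC = 0x08000000   # System V IPC, POSIX message queues
CLONE_NEWPID = 0x20000000   # PID numbering
CLONE_NEWNET = 0x40000000   # network stack

libc = ctypes.CDLL(None, use_errno=True)

def run_isolated(argv):
    """Steps 2 and 7 only: new namespaces, then exec the entrypoint.

    A real runtime sets up the cgroup, mounts overlayfs, pivots root,
    and drops capabilities between these two calls. Requires root.
    """
    flags = (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC
             | CLONE_NEWPID | CLONE_NEWNET)
    if libc.unshare(flags) != 0:
        raise OSError(ctypes.get_errno(), "unshare failed (not root?)")
    pid = os.fork()               # first child in a new PID namespace
    if pid == 0:
        os.execvp(argv[0], argv)  # step 7: become the container process
    os.waitpid(pid, 0)

if __name__ == "__main__" and os.geteuid() == 0:
    # The child sees itself as PID 1 in its own PID namespace.
    run_isolated(sys.argv[1:] or ["sh", "-c", "echo pid=$$"])
```

Note that `unshare()` affects the *next* `fork()` for the PID namespace, which is why the sketch forks before exec'ing — the forked child is the process that becomes PID 1.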
After step 7, the container is a regular process. The kernel enforces all restrictions through existing mechanisms — no special "container mode" exists.
Containers vs Virtual Machines
The fundamental difference: a VM runs its own kernel. A container shares the host kernel.
A VM boots a full guest operating system with its own kernel. This provides strong isolation — the guest kernel handles syscalls independently — but costs hundreds of megabytes of RAM and seconds to start. Each VM duplicates kernel code, device drivers, and system services.
A container shares the host kernel. It calls the same syscalls as every other process on the host. Isolation comes from namespaces and cgroups, not from hardware separation. This means containers start in milliseconds, use megabytes of overhead instead of gigabytes, and can run hundreds per host instead of dozens.
The tradeoff: containers provide weaker isolation than VMs. A kernel vulnerability affects all containers on the host. A VM with its own kernel is unaffected by host kernel bugs (though hypervisor vulnerabilities are also possible).
| | Virtual Machine | Container |
|---|---|---|
| Isolation | Hardware-level (hypervisor) | Kernel-level (namespaces + cgroups) |
| Startup time | Seconds to minutes | Milliseconds |
| Memory overhead | Hundreds of MB (guest OS) | Megabytes (process + runtime state) |
| Kernel | Own guest kernel | Shared host kernel |
| Density | Tens per host | Hundreds per host |
| Security boundary | Strong (hypervisor) | Moderate (kernel features) |
| Examples | QEMU/KVM, VMware, Hyper-V | Docker, Podman, containerd |
Why Containers Are Not Secure by Default
Sharing the host kernel means sharing the kernel's attack surface. A container process makes syscalls to the same kernel as every other process. If a syscall has a vulnerability, a container can exploit it to escape its namespace isolation.
Default container configurations run as root inside the container, have access to most syscalls, and share the host's kernel. Hardening requires: running as a non-root user, dropping Linux capabilities, restricting syscalls with seccomp profiles, using read-only root filesystems, and limiting network access.
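Those hardening steps map directly onto `docker run` flags. A configuration sketch (the image name and limit values are illustrative, and the image must be built to run as a non-root user with a read-only root):

```shell
# Hardened docker run — each flag applies one mitigation:
#   --user                   non-root UID/GID inside the container
#   --cap-drop ALL           start with zero Linux capabilities
#   no-new-privileges        block setuid/setgid privilege escalation
#   --read-only + --tmpfs    immutable rootfs, writable scratch only
#   --memory, --pids-limit   cgroup caps on RAM and process count
docker run \
  --user 65532:65532 \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  --read-only --tmpfs /tmp \
  --memory 256m --pids-limit 100 \
  my-app
```

Capabilities the application genuinely needs can then be added back individually with `--cap-add`, which is far safer than starting from the default set.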
Rootless containers (supported by Podman and recent Docker) run the entire container runtime as an unprivileged user, adding a layer of defense through user namespaces: root inside the container maps to an ordinary unprivileged UID on the host.
The Runtime Stack
The modern container stack has three layers:
- CLI/daemon (Docker, Podman, nerdctl) — user-facing interface that calls the high-level runtime.
- High-level runtime (containerd, CRI-O) — manages container lifecycle, image pulling, storage, and networking. Speaks the Kubernetes CRI protocol.
- Low-level runtime (runc, crun, youki) — creates namespaces, sets up cgroups, pivots root, executes the process. Speaks the OCI Runtime Specification.
When you run `docker run nginx`, Docker tells containerd, which calls runc, which creates the namespaces, cgroups, and overlayfs mount, then `exec()`s the nginx process.
Next Steps
- How Namespaces Work — deep dive into each Linux namespace type.
- How Cgroups Work — resource limits, the OOM killer, and cgroup v2.
- How Processes Work — the foundation: what a process is before you isolate it.
- How the Kernel Works — the kernel that all containers share.