Containers FAQ

Common questions about containers, namespaces, cgroups, images, networking, and container runtimes. Each answer is short. Links go to the full explanation.

What is the difference between a container and a VM?

A container is an isolated Linux process that shares the host kernel. It uses namespaces for isolation and cgroups for resource limits. A virtual machine runs its own guest kernel on emulated hardware via a hypervisor.

Containers start in milliseconds and add little memory overhead beyond the process itself. VMs take seconds to minutes to boot and consume hundreds of MB for the guest OS. Hundreds of containers run on a single host; dozens of VMs fit. The tradeoff: VMs provide stronger isolation (separate kernel, hypervisor boundary). Containers share the kernel's attack surface — a kernel vulnerability affects all containers.

See How Containers Work for the full architectural comparison.

How does Docker use Linux namespaces?

Docker can place each container in up to seven Linux namespaces: PID (own process tree), network (own IP and routing table), mount (own filesystem), UTS (own hostname), IPC (own shared memory), user (UID mapping), and cgroup (own cgroup view). Most are created by default; the user namespace is opt-in (userns-remap or rootless mode).

The container runtime calls clone() with namespace flags (CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWNS | ...) to create a process in fresh namespaces. The process sees only its isolated resources. The host kernel manages everything underneath — namespaces are a view restriction, not a separate system.
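You can see this "view restriction" directly in /proc: each namespace a process belongs to appears as a symlink under /proc/<pid>/ns, and two processes share a namespace exactly when the inode numbers in those links match. A minimal, Linux-only sketch:

```python
import os

# Each entry under /proc/self/ns is a symlink whose target names the
# namespace type and its inode, e.g. "uts:[4026531838]". Processes in
# the same namespace see the same inode number here.
for ns in ("pid", "net", "mnt", "uts", "ipc", "user", "cgroup"):
    print(ns, "->", os.readlink(f"/proc/self/ns/{ns}"))
```

Run this inside and outside a container and compare: the inode numbers differ for every namespace the runtime unshared.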

See How Namespaces Work for each namespace type in detail.

What happens when a container runs out of memory?

When a container exceeds its cgroup memory limit (memory.max), the kernel's OOM killer terminates a process in the cgroup. In Docker, this kills the container (exit code 137). In Kubernetes, the pod enters OOMKilled status and is restarted according to its restart policy.

The soft limit (memory.high) triggers aggressive memory reclamation before the hard limit — the kernel swaps and evicts page cache, which slows the container but does not kill it. Monitor memory.current and memory.events to detect pressure before hitting the limit.
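memory.events is a flat file of "key value" lines, so a few lines of parsing are enough to alert on throttling or OOM kills before or after they happen. A sketch, assuming cgroup v2 (the sample contents below are illustrative):

```python
def parse_memory_events(text):
    """Parse a cgroup v2 memory.events file into a dict of counters."""
    events = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        events[key] = int(value)
    return events

# Illustrative contents of a container cgroup's memory.events file:
sample = "low 0\nhigh 12\nmax 3\noom 1\noom_kill 1\n"
events = parse_memory_events(sample)
if events.get("high", 0) > 0:
    print("memory.high breached: kernel is reclaiming aggressively")
if events.get("oom_kill", 0) > 0:
    print("a process in the cgroup was OOM-killed")
```

A nonzero "high" counter means the container hit the soft limit and is being slowed; a nonzero "oom_kill" means the hard limit was enforced.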

See How Cgroups Work for the full memory controller behavior.

Why are container images built in layers?

Layers make builds and distribution efficient through three mechanisms:

  • Build caching — when you change one line of code, only the layer containing that change is rebuilt. Preceding layers are reused from cache.
  • Transfer efficiency — when pulling an image, only layers not already cached locally are downloaded. Shared base layers transfer once.
  • Disk deduplication — multiple images sharing the same base layers store those layers only once on disk.

Layers are immutable and content-addressed by SHA256 digest. At runtime, overlayfs merges all layers into a single filesystem view with a writable layer on top.
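Content addressing is what makes the deduplication mechanical: a layer's identity is just the SHA256 of its bytes, so identical layer tarballs always produce the same digest and are stored once. A minimal sketch:

```python
import hashlib

def layer_digest(blob: bytes) -> str:
    # An OCI digest is the algorithm name plus the hex hash of the blob.
    return "sha256:" + hashlib.sha256(blob).hexdigest()

base = b"...tarball bytes of a shared base layer..."
# Two images built on the same base layer reference the same digest,
# so the registry and the local store keep only one copy.
print(layer_digest(base) == layer_digest(base))  # prints True
```

This is also why layers are immutable: changing a single byte changes the digest, which is a different layer as far as every cache and registry is concerned.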

See How Container Images Work for layer caching strategies and optimization.

What is the OCI specification?

The OCI (Open Container Initiative) defines three industry standards:

  • Image Specification — defines the image format: a manifest (JSON listing layers and config), a config (JSON with entrypoint, env, metadata), and layer tarballs (compressed filesystem diffs).
  • Runtime Specification — defines how a container runtime creates a container from an OCI bundle (rootfs + config.json). runc is the reference implementation.
  • Distribution Specification — defines the HTTP API for pushing and pulling images to registries.
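The manifest is the piece that ties the other two together. A stripped-down sketch of its JSON shape — the mediaType values come from the image spec, but the digests and sizes here are made up for illustration:

```python
import json

# Skeleton of an OCI image manifest: one config blob plus an ordered
# list of layer blobs, each referenced by content-addressed digest.
manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "config": {
        "mediaType": "application/vnd.oci.image.config.v1+json",
        "digest": "sha256:" + "a" * 64,   # illustrative digest
        "size": 1470,
    },
    "layers": [
        {
            "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
            "digest": "sha256:" + "b" * 64,  # illustrative digest
            "size": 2811478,
        }
    ],
}
print(json.dumps(manifest, indent=2))
```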

OCI ensures interoperability — images built with Docker work with Podman, containerd, and Kubernetes without modification.

See How Container Images Work for the OCI image format in practice.

How does container networking work?

Docker creates a Linux bridge (docker0) and connects each container via a veth pair — a virtual cable with one end in the container's network namespace (appearing as eth0) and the other end on the bridge.

Containers get private IPs (172.17.0.x). Container-to-container traffic flows through the bridge. Port mapping uses iptables DNAT rules to forward host ports to container ports. Outbound traffic is source-NATed through the host IP via MASQUERADE rules. User-defined bridge networks add DNS-based service discovery — containers resolve each other by name.
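In iptables-save format, the two NAT rules described above look roughly like this — the addresses and ports are illustrative, and the exact chain names vary by Docker version:

```
*nat
# docker run -p 8080:80 → DNAT host port 8080 to the container
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 8080 -j DNAT --to-destination 172.17.0.2:80
# outbound traffic from the bridge subnet is source-NATed to the host IP
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
COMMIT
```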

See How Container Networking Works for the full network architecture.

What is the difference between ENTRYPOINT and CMD?

ENTRYPOINT sets the executable that always runs. CMD provides default arguments that are overridden when you pass arguments to docker run. Together they form the full command: ENTRYPOINT + CMD.

ENTRYPOINT ["nginx"]
CMD ["-g", "daemon off;"]
# docker run myimage → nginx -g "daemon off;"
# docker run myimage -c /custom.conf → nginx -c /custom.conf

Use exec form (JSON array) for ENTRYPOINT so the process runs directly as PID 1 and receives signals correctly. Shell form wraps the command in /bin/sh -c, which can prevent proper SIGTERM handling.
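The override rule is simple enough to state as one line: the effective argv is ENTRYPOINT followed by either the docker run arguments (if any were given) or CMD. A sketch of that merge logic — not Docker's actual code:

```python
def effective_argv(entrypoint, cmd, run_args=None):
    """Combine ENTRYPOINT and CMD the way `docker run` does:
    arguments passed on the command line replace CMD entirely."""
    return list(entrypoint) + list(run_args if run_args else cmd)

# docker run myimage
print(effective_argv(["nginx"], ["-g", "daemon off;"]))
# docker run myimage -c /custom.conf  (CMD is dropped, not appended)
print(effective_argv(["nginx"], ["-g", "daemon off;"], ["-c", "/custom.conf"]))
```

Note that the run arguments replace CMD wholesale; they are never merged with it, which is a common source of surprise.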

See How Container Images Work for Dockerfile best practices.

Can you run containers without Docker?

Yes. Docker is a convenience layer, not a requirement. Several alternatives exist:

  • Podman — Docker-compatible CLI, daemonless, supports rootless containers.
  • containerd + nerdctl — the runtime Kubernetes uses, with a Docker-compatible CLI.
  • CRI-O — a Kubernetes-specific container runtime.
  • runc — the low-level OCI runtime. You can create a container directly from an OCI bundle.
  • LXC/LXD — system containers (closer to lightweight VMs).

You can even build a container manually using unshare (namespaces), cgroup filesystem writes (resource limits), and pivot_root (filesystem isolation). Every "container tool" is a wrapper around these kernel primitives.

See How Containers Work for the underlying kernel features.

How do volumes persist data across container restarts?

A volume is a directory on the host filesystem mounted into the container, bypassing overlayfs. Data written to a volume goes directly to the host disk and persists when the container stops, restarts, or is removed.

Docker manages volumes at /var/lib/docker/volumes/. Bind mounts map a specific host path instead. Both bypass the union filesystem and have native I/O performance.

The container's writable layer (the overlayfs upper directory) is deleted when the container is removed. Anything not in a volume is lost. This is by design — containers are ephemeral, volumes are persistent.

See How Containers Work for the container filesystem architecture.

What is a container runtime?

A container runtime creates and manages containers. The stack has two levels:

  • Low-level runtime (runc, crun, youki) — creates namespaces, configures cgroups, mounts the root filesystem, and executes the container process. Implements the OCI Runtime Specification.
  • High-level runtime (containerd, CRI-O) — manages the full lifecycle: pulling images, unpacking layers, creating OCI bundles, calling the low-level runtime, and managing running state.

Docker is a CLI and daemon that calls containerd, which calls runc. Kubernetes uses containerd or CRI-O directly via the Container Runtime Interface (CRI). The runtime is swappable — changing from runc to gVisor changes the isolation model without changing images or orchestration.

See How Containers Work for the full runtime stack and startup sequence.