How Container Images Work — Layers, OCI Spec, and Registries

2026-03-24

A container image is a packaged filesystem. It contains everything an application needs to run — the binary, its libraries, configuration files, and a minimal operating system layer. But an image is not a single file. It is a stack of layers, each one a diff on top of the previous, merged at runtime by a union filesystem.

This layered design is why images build fast (only changed layers are rebuilt), distribute fast (only missing layers are pulled), and use disk efficiently (shared layers are stored once).

What Is a Layer?

A layer is a tar archive containing filesystem changes: files added, modified, or deleted relative to the layer below. Every instruction in a Dockerfile that modifies the filesystem creates a new layer.

# Layer 1: base OS (files from the Ubuntu image)
FROM ubuntu:22.04
# Layer 2: nginx binary and its dependencies
RUN apt-get update && \
    apt-get install -y nginx
# Layer 3: your config file
COPY nginx.conf /etc/nginx/
# Layer 4: your content
COPY index.html /var/www/

Layer 1 is the Ubuntu base image. Layer 2 adds nginx and its dependencies on top. Layer 3 adds your nginx configuration. Layer 4 adds your HTML content. Each layer is stored as a separate tar archive with a content-addressable hash (SHA256 digest).
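The idea of a layer as a tar archive named by its SHA256 digest fits in a few lines of Python. This is a minimal illustration, not how Docker builds layers. The helpers `make_layer` and `digest` are hypothetical and the file contents are made up. Real builders also record ownership, permissions, and modification times, and all of these feed into the digest.

```python
import hashlib
import io
import tarfile


def make_layer(files: dict) -> bytes:
    """Pack a {path: bytes} mapping into an uncompressed tar archive (one layer)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path in sorted(files):  # fixed order so the same input gives the same bytes
            data = files[path]
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mtime = 0  # fixed timestamp; a real mtime would change the digest
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def digest(blob: bytes) -> str:
    """Content address of a blob: 'sha256:' followed by the hex SHA256 hash."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


# Hypothetical contents standing in for Layer 3 of the Dockerfile above.
layer3 = make_layer({"etc/nginx/nginx.conf": b"worker_processes 1;\n"})
print(digest(layer3))
```

Running `make_layer` twice on the same input gives the same digest. Changing a single byte of the config file gives a different one. This is what lets caches and registries identify a layer by its hash alone.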

[Figure: overlayfs merges the read-only image layers into one unified root filesystem, /]

How Overlayfs Works

Overlayfs is the union filesystem used by Docker and most container runtimes. It merges multiple directory trees into a single unified view:

Lower directories — the read-only image layers, stacked in order.
Upper directory — the writable layer for the running container.
Merged directory — the unified view presented to the container as its root filesystem.

When the container reads a file, overlayfs searches from the upper layer down through the lower layers until the file is found. When the container writes a file, the write goes to the upper layer only. If the file exists in a lower layer, it is copied up to the upper layer before being modified — the lower layers are never changed.

Deleting a file creates a whiteout entry in the upper layer — a marker that hides the file in the lower layers without actually removing it. This is how layers stay immutable.
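The lookup, copy-up, and whiteout rules can be modelled with plain dictionaries. The following Python sketch is a toy model, not overlayfs itself. The `Overlay` class and the `WHITEOUT` sentinel are hypothetical, and each layer is a dict from path to file contents. Real overlayfs represents a whiteout as a special file (a character device with device number 0/0) rather than a Python object.

```python
WHITEOUT = object()  # hypothetical sentinel standing in for an overlayfs whiteout


class Overlay:
    """Toy union filesystem: read-only lower layers plus one writable upper layer."""

    def __init__(self, lowers):
        self.lowers = lowers  # list of {path: contents}, ordered base first
        self.upper = {}       # the container's writable layer

    def _lookup(self, path):
        # Search the upper layer first, then the lower layers from top to bottom.
        for layer in [self.upper, *reversed(self.lowers)]:
            if path in layer:
                return layer[path]
        return WHITEOUT  # a file found nowhere behaves like a deleted file

    def exists(self, path):
        return self._lookup(path) is not WHITEOUT

    def read(self, path):
        value = self._lookup(path)
        if value is WHITEOUT:
            raise FileNotFoundError(path)
        return value

    def write(self, path, contents):
        # Writes always go to the upper layer; lower layers are never modified.
        self.upper[path] = contents

    def append(self, path, extra):
        # Copy-up: take the visible version (possibly from a lower layer),
        # then store the modified copy in the upper layer.
        self.upper[path] = self.read(path) + extra

    def delete(self, path):
        self.read(path)              # raises if the file is not visible
        self.upper[path] = WHITEOUT  # hide lower copies without touching them


base = {"/etc/os-release": "ubuntu", "/etc/motd": "hello"}
fs = Overlay([base, {"/etc/nginx/nginx.conf": "default"}])
fs.append("/etc/motd", "!")
fs.delete("/etc/os-release")
print(fs.read("/etc/motd"), fs.exists("/etc/os-release"), base)
```

After these calls the `base` dict is unchanged. The modified motd lives in `fs.upper`, and os-release is hidden by a whiteout rather than removed.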

The OCI Image Specification

The OCI (Open Container Initiative) Image Specification defines the format for container images. It standardizes three things:

Image manifest: a JSON document listing the image's layers (as content-addressable digests) and a reference to the config. The manifest tells the runtime which layers to download and in what order to stack them.

Image config: a JSON document holding runtime metadata such as the entrypoint command, environment variables, exposed ports, working directory, and the diff ID of each layer (the SHA256 hash of the uncompressed layer tar). Much of what docker inspect shows for an image comes from this config.

Layer tarballs: tar archives, usually gzip-compressed (the spec also allows uncompressed and zstd-compressed layers), each containing the filesystem diff for one layer. The digest (SHA256 hash) of each compressed tarball is its content address.
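It is easy to confuse the manifest's layer digest with the config's diff ID. The digest is a hash of the compressed blob, while the diff ID is a hash of the uncompressed tar. The minimal Python sketch below builds both documents for a one-layer image. Field names follow the OCI Image Spec. The file contents, `Cmd`, and platform values are made up for illustration, and a real image would carry more config fields and layers.

```python
import gzip
import hashlib
import io
import json
import tarfile


def sha256_digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_layer(path: str, content: bytes) -> bytes:
    """A one-file uncompressed layer tar (hypothetical helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=path)
        info.size = len(content)
        tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


layer_tar = build_layer("var/www/index.html", b"<h1>hi</h1>\n")
layer_blob = gzip.compress(layer_tar, mtime=0)  # the compressed blob a registry stores

diff_id = sha256_digest(layer_tar)        # hash of uncompressed tar -> config
layer_digest = sha256_digest(layer_blob)  # hash of compressed blob -> manifest

config = {
    "architecture": "amd64",
    "os": "linux",
    "config": {"Cmd": ["nginx", "-g", "daemon off;"]},
    "rootfs": {"type": "layers", "diff_ids": [diff_id]},
}
config_blob = json.dumps(config, sort_keys=True).encode()

manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "config": {
        "mediaType": "application/vnd.oci.image.config.v1+json",
        "digest": sha256_digest(config_blob),
        "size": len(config_blob),
    },
    "layers": [{
        "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
        "digest": layer_digest,
        "size": len(layer_blob),
    }],
}
print(json.dumps(manifest, indent=2))
```

The config is itself a blob addressed by its digest. That is how the manifest can reference it the same way it references layers.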

The content-addressable design means identical layers are stored and transferred only once. If ten images use ubuntu:22.04 as their base, a host keeps one copy of those layers on disk, and a registry keeps one copy in its storage.
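Deduplication falls out of content addressing. If blobs are stored under their digest, storing the same bytes twice is a no-op. Here is a toy Python sketch, where `BlobStore` and the placeholder layer bytes are hypothetical:

```python
import hashlib


class BlobStore:
    """Toy content-addressable store: each blob is keyed by its SHA256 digest."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = "sha256:" + hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)  # same bytes -> same key -> stored once
        return key


store = BlobStore()
base_layer = b"placeholder bytes for the ubuntu:22.04 base layer"
for i in range(10):
    store.put(base_layer)                 # all ten images share this layer
    store.put(f"app layer {i}".encode())  # each image has its own top layer
print(len(store.blobs))  # 11: one shared base layer plus ten application layers
```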

Image Registries

A registry is a server that stores and serves container images over an HTTP API, standardized as the OCI Distribution Specification. Docker Hub, GitHub Container Registry (ghcr.io), Amazon ECR, and Google Artifact Registry are all registries.

Pulling an image:

Client requests the manifest for nginx:latest (resolves the tag to a digest).
Client reads the manifest to get the list of layer digests.
Client checks which layers are already cached locally.
Client downloads only the missing layers (in parallel).
Layers are unpacked and overlayfs is configured.
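In the OCI Distribution Specification, the pull steps map onto two kinds of request: fetching a manifest and fetching blobs. The Python sketch below only computes the request paths and makes no network calls. The helper names and the shortened digests are hypothetical. It ignores authentication and multi-platform image indexes, where a tag first resolves to an index pointing at per-architecture manifests. Docker Hub stores official images such as nginx under the `library/` namespace, and the image config is fetched as a blob just like the layers.

```python
def manifest_path(repo: str, reference: str) -> str:
    """Step 1: GET this path. The reference may be a tag or a digest."""
    return f"/v2/{repo}/manifests/{reference}"


def blobs_to_fetch(manifest: dict, local: set) -> list:
    """Steps 2-3: collect config and layer digests, skipping any already cached."""
    wanted = [manifest["config"]["digest"]]
    wanted += [layer["digest"] for layer in manifest["layers"]]
    return [d for d in wanted if d not in local]


def blob_path(repo: str, digest: str) -> str:
    """Step 4: GET this path for each missing blob (clients often do this in parallel)."""
    return f"/v2/{repo}/blobs/{digest}"


# Hypothetical manifest with shortened, made-up digests.
manifest = {
    "config": {"digest": "sha256:c0"},
    "layers": [{"digest": "sha256:a1"}, {"digest": "sha256:b2"}],
}
print(manifest_path("library/nginx", "latest"))
for d in blobs_to_fetch(manifest, local={"sha256:a1"}):
    print(blob_path("library/nginx", d))
```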

Pushing an image:

Client computes layer digests.
Client checks which layers the registry already has (by digest).
Client uploads only the new layers.
Client uploads the manifest.
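A matching Python sketch for pushing also only plans the requests. The OCI Distribution Specification defines four steps: HEAD to check for a blob, POST to start an upload, PUT with `?digest=` to finish it, and PUT for the manifest. The function `plan_push`, the repository name, and the digests are hypothetical. The upload URL is a placeholder because the registry returns it in a `Location` header. Chunked uploads and cross-repository mounts are left out, and `blob_digests` is assumed to include the config blob as well as the layers.

```python
UPLOAD_LOCATION = "<Location header from the POST>"  # placeholder; the registry picks this URL


def plan_push(repo: str, tag: str, blob_digests: list, registry_has: set) -> list:
    """Return (method, path) pairs for pushing an image, following the steps above."""
    steps = []
    for d in blob_digests:
        steps.append(("HEAD", f"/v2/{repo}/blobs/{d}"))  # does the registry have it?
        if d not in registry_has:                        # 404 -> upload it
            steps.append(("POST", f"/v2/{repo}/blobs/uploads/"))
            steps.append(("PUT", f"{UPLOAD_LOCATION}?digest={d}"))
    # The manifest goes last: a registry may reject a manifest whose blobs it lacks.
    steps.append(("PUT", f"/v2/{repo}/manifests/{tag}"))
    return steps


for method, path in plan_push("myapp", "v2", ["sha256:base", "sha256:code"], {"sha256:base"}):
    print(method, path)
```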

This is why subsequent pulls are fast: most layers are already cached. It is also why pushing after a one-line code change uploads only the layers that changed, often just one small layer.

Tags vs Digests

A tag is a human-readable label: nginx:1.25, python:3.12-slim, myapp:latest. Tags are mutable — latest can point to different images over time. This is convenient but dangerous for reproducibility.

A digest is a content-addressable hash: nginx@sha256:abc123.... Digests are immutable: the same digest always refers to the same image content. Pin images by digest in production deployments so that every rollout runs exactly the image you tested.
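Tooling has to tell tags and digests apart in a reference string. The simplified Python parser below assumes the common `name[:tag][@digest]` shape. `parse_reference` is hypothetical and performs no validation. It also does not apply defaults such as Docker filling in `latest` when no tag is given. The digests in the examples are shortened and made up.

```python
from typing import Optional, Tuple


def parse_reference(ref: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split an image reference into (name, tag, digest); missing parts are None."""
    name, digest = ref, None
    if "@" in ref:
        name, digest = ref.split("@", 1)
    tag = None
    # A tag follows the last ':' only if that ':' comes after the last '/'.
    # This keeps a registry port (localhost:5000/myapp) from being read as a tag.
    colon = name.rfind(":")
    if colon > name.rfind("/"):
        name, tag = name[:colon], name[colon + 1:]
    return name, tag, digest


for ref in ["nginx:1.25", "localhost:5000/myapp",
            "nginx@sha256:abc123", "ghcr.io/org/app:v1@sha256:def456"]:
    print(ref, "->", parse_reference(ref))
```

A deployment tool could reject any reference whose digest comes back as None. This is one way to enforce digest pinning.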

Multi-Stage Builds

A multi-stage Dockerfile uses multiple FROM instructions. Each FROM starts a new stage from a fresh base image, and by default only the final stage becomes the output image. This separates build dependencies from runtime dependencies:

FROM rust:1.77 AS builder
WORKDIR /app
COPY . .
RUN cargo build --release

FROM debian:bookworm-slim
COPY --from=builder /app/target/release/myapp /usr/local/bin/
CMD ["myapp"]

The build stage contains the entire Rust toolchain (hundreds of MB). The final image contains only the compiled binary and a minimal Debian base. The build tools never appear in the runtime image.

Why Layer Order Matters

Each Dockerfile instruction that changes the filesystem creates a layer. Docker caches layers and reuses one only if three things are unchanged: its instruction, all preceding layers, and (for COPY and ADD) the contents of the copied files. When a layer changes, it and all subsequent layers are invalidated and rebuilt.

This means: put instructions that change rarely (installing OS packages) early in the Dockerfile, and instructions that change often (copying application code) late. Reversing this order means every code change invalidates the package installation layer, forcing a full reinstall.

# Good: dependencies change rarely, code changes often
WORKDIR /app
COPY package.json /app/
RUN npm install
COPY . /app/

# Bad: every code change invalidates npm install
WORKDIR /app
COPY . /app/
RUN npm install
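The invalidation rule can be modelled as a chain of hashes. Each step's key covers its own instruction, any files it copies, and the key of the step before it. This is a simplified Python model of the behavior described above, not BuildKit's actual cache algorithm. `cache_keys`, `rebuilt`, and the checksum strings are hypothetical.

```python
import hashlib


def cache_keys(steps):
    """One key per build step, each chained to the key of the step before it.

    steps: list of (instruction, input_checksum) pairs, where input_checksum
    stands for the contents of files a COPY brings in ("" for other steps).
    """
    keys, parent = [], ""
    for instruction, inputs in steps:
        h = hashlib.sha256("\0".join([parent, instruction, inputs]).encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt(before, after):
    """Indices of steps whose key changed, i.e. steps that miss the cache."""
    pairs = zip(cache_keys(before), cache_keys(after))
    return [i for i, (a, b) in enumerate(pairs) if a != b]


good_v1 = [("WORKDIR /app", ""), ("COPY package.json /app/", "deps-1"),
           ("RUN npm install", ""), ("COPY . /app/", "src-1")]
good_v2 = good_v1[:3] + [("COPY . /app/", "src-2")]  # only application code changed

bad_v1 = [("WORKDIR /app", ""), ("COPY . /app/", "src-1"), ("RUN npm install", "")]
bad_v2 = [("WORKDIR /app", ""), ("COPY . /app/", "src-2"), ("RUN npm install", "")]

print(rebuilt(good_v1, good_v2))  # [3]: only the final COPY reruns
print(rebuilt(bad_v1, bad_v2))    # [1, 2]: the COPY and npm install both rerun
```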

Image Size and Efficiency

Each layer adds to the image size, even if a later layer deletes files created in an earlier layer. The deleted files still exist in the earlier layer's tarball; the later layer only adds a whiteout. This is why you see patterns like RUN apt-get update && apt-get install -y nginx && apt-get clean && rm -rf /var/lib/apt/lists/*. Installing and cleaning up in the same RUN instruction keeps the package lists and download cache out of every committed layer.
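A small Python model makes the size arithmetic explicit. Each layer is a dict from path to size in bytes, with `None` marking a whiteout. The paths and sizes are made up, and the functions are hypothetical helpers, not part of any Docker tool.

```python
def image_size(layers):
    """Bytes shipped: every file stored in every layer (whiteouts counted as 0)."""
    return sum(size for layer in layers for size in layer.values() if size is not None)


def visible_size(layers):
    """Bytes the container actually sees after merging; None marks a whiteout."""
    view = {}
    for layer in layers:  # apply layers bottom to top
        for path, size in layer.items():
            if size is None:
                view.pop(path, None)  # the whiteout hides the lower copy
            else:
                view[path] = size
    return sum(view.values())


MB = 1_000_000
# Made-up sizes for illustration only.
separate_runs = [
    {"/usr/sbin/nginx": 2 * MB, "/var/lib/apt/lists/index": 40 * MB},  # install in one RUN
    {"/var/lib/apt/lists/index": None},                                # clean up in a later RUN
]
single_run = [{"/usr/sbin/nginx": 2 * MB}]  # install and clean up in the same RUN

print(image_size(separate_runs) // MB, visible_size(separate_runs) // MB)  # 42 2
print(image_size(single_run) // MB, visible_size(single_run) // MB)        # 2 2
```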

Minimal base images (Alpine, distroless, scratch) reduce the base layer size. Alpine Linux's base image is a few megabytes, compared with roughly 75 MB for Ubuntu's. Distroless images contain only the application and its runtime dependencies: no shell, no package manager, no unnecessary binaries. The scratch image is completely empty, which suits statically linked binaries.

Next Steps

How Container Networking Works — bridge networks, veth pairs, and port mapping.
How Namespaces Work — the mount namespace that makes overlayfs visible to the container.
How File Systems Work — the filesystem layer beneath overlayfs.
