How Cgroups Work — Resource Limits and Accounting

2026-03-24

A cgroup (control group) is a Linux kernel feature that limits, accounts for, and isolates the resource usage of a group of processes. If namespaces control what a process can see, cgroups control what it can consume.

When you run docker run --memory=512m --cpus=2, Docker creates a cgroup with those limits. The container's processes are placed in that cgroup. The kernel enforces the limits transparently — no cooperation from the containerized application is required.

The Cgroup Hierarchy

Cgroups are organized as a tree. Each node in the tree is a cgroup that can have resource limits and contain processes. Child cgroups inherit their parent's constraints and can have additional limits applied.

/sys/fs/cgroup                 (root cgroup)
├── system.slice/              system services
└── docker/                    memory.max = 4G, cpu.max = 400000 100000
    ├── nginx/                 memory.max = 512M, cpu.max = 100000, PIDs: 42, 43, 44
    └── postgres/              memory.max = 2G, cpu.max = 200000, PIDs: 50, 51..58

Controllers: cpu, memory, io, pids. Each cgroup directory has control files for limits and stats.

Child cgroups cannot exceed parent limits

The hierarchy is exposed as a filesystem, typically mounted at /sys/fs/cgroup. Creating a directory creates a cgroup. Writing a PID to cgroup.procs adds a process. Writing a value to memory.max sets a memory limit. Everything is done through file I/O — no special syscalls needed.
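Because the interface is just directories and files, the whole workflow fits in a few lines. A minimal Python sketch, assuming cgroup v2 mounted at /sys/fs/cgroup (writing there requires root; the helper names are illustrative):

```python
import os

def create_cgroup(base, name, limits):
    """Create a cgroup by making a directory and writing its control files.

    base: the cgroup mount point (normally /sys/fs/cgroup; needs root).
    limits: mapping of control file -> value,
            e.g. {"memory.max": "536870912", "pids.max": "100"}.
    """
    path = os.path.join(base, name)
    os.makedirs(path, exist_ok=True)          # mkdir == create the cgroup
    for ctrl_file, value in limits.items():
        with open(os.path.join(path, ctrl_file), "w") as f:
            f.write(str(value))               # a plain write sets the limit
    return path

def add_process(cgroup_path, pid):
    """Move a process into the cgroup by writing its PID to cgroup.procs."""
    with open(os.path.join(cgroup_path, "cgroup.procs"), "w") as f:
        f.write(str(pid))
```

On a real system the control files already exist once the directory is created; the kernel interprets each write immediately.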

Resource Controllers

Each resource type has a controller that enforces limits and tracks usage.

Memory Controller

The memory controller limits how much memory a cgroup can use. Key control files:

  • memory.max — hard limit. When the cgroup's usage hits this limit, the kernel first tries to reclaim memory from the cgroup; if reclaim cannot free enough, the OOM killer terminates a process in the cgroup.
  • memory.high — soft limit. When exceeded, the kernel aggressively reclaims memory from the cgroup (swapping, page cache eviction) but does not kill processes. Applications slow down but survive.
  • memory.current — current memory usage (read-only).
  • memory.swap.max — maximum swap usage.

When a cgroup exceeds memory.max, the OOM killer selects a process to terminate based on the oom_score_adj value. In a container context, this means the container is killed when it exceeds its memory limit — the same behavior you see when a Kubernetes pod is "OOMKilled."
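Because these control files are plain text, checking how close a cgroup is to its limit is simple string parsing. A sketch (the function name is illustrative; memory.max holds the literal string "max" when no limit is set):

```python
def memory_headroom(current: str, maximum: str):
    """Bytes left before hitting memory.max, or None if no limit is set.

    current and maximum are the raw contents of memory.current and
    memory.max; memory.max contains the literal string "max" when unlimited.
    """
    maximum = maximum.strip()
    if maximum == "max":
        return None
    return int(maximum) - int(current.strip())
```

For example, with memory.current = 234881024 and memory.max = 536870912, the cgroup has 301989888 bytes (288 MiB) of headroom.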

CPU Controller

The CPU controller limits CPU time. Two mechanisms:

CPU shares (cpu.weight in v2, cpu.shares in v1) set relative priority. A cgroup with weight 200 gets twice the CPU time of a cgroup with weight 100 — but only when the CPU is contended. If no other cgroup needs the CPU, any cgroup can use 100%.

CPU quota (cpu.max in v2) sets a hard limit. The format is quota period — for example, 200000 100000 means "200ms of CPU time per 100ms period," which is 2 CPU cores. A container with --cpus=1.5 gets cpu.max = 150000 100000.

The difference matters: shares are relative and allow bursting (your container can use idle CPU). Quotas are absolute limits (your container is throttled even if CPUs are idle).
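The --cpus flag is just this quota arithmetic. A sketch of the conversion, assuming the standard 100 ms period used in the examples above (the function name is illustrative):

```python
def cpus_to_cpu_max(cpus: float, period_us: int = 100_000) -> str:
    """Convert a Docker-style --cpus value to a cgroup v2 cpu.max string.

    cpu.max takes "quota period" in microseconds: the cgroup may use at
    most quota microseconds of CPU time per period.
    """
    quota_us = int(cpus * period_us)
    return f"{quota_us} {period_us}"
```

So cpus_to_cpu_max(1.5) yields "150000 100000", matching the cpu.max value described above.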

I/O Controller

The I/O controller (io.max, io.weight) throttles disk read/write bandwidth and IOPS per device. You can limit a container to, say, 50 MB/s of disk writes, preventing a noisy neighbor from saturating the disk for every other container.

PID Controller

The PID controller (pids.max) limits the number of processes a cgroup can create. This is the fork bomb defense — without it, a container could call fork() in an infinite loop and exhaust the host's PID space, denying service to every other container and the host itself.

Cgroup v1 vs v2

Cgroup v1 was the original implementation. Each controller had its own independent hierarchy. A process could be in one cgroup for memory and a different cgroup for CPU. This made configuration complex and sometimes inconsistent — setting a memory limit on one hierarchy had no relationship to the CPU limit on another.

Cgroup v2 (unified hierarchy) puts all controllers on a single tree. A process belongs to exactly one cgroup, and all controllers are managed together. This is simpler, more predictable, and the only version actively developed.

Key v2 improvements:

  • Single hierarchy — one cgroup tree, all controllers attached to it.
  • Pressure Stall Information (PSI) — real-time metrics showing how much time processes in a cgroup spend waiting for CPU, memory, or I/O. Used by Kubernetes for resource decisions.
  • Better delegation — unprivileged users can manage sub-cgroups (enabling rootless containers).
  • Threaded mode — threads within a process can be in different sub-cgroups for CPU scheduling.

Most modern distributions (Ubuntu 22.04+, Fedora 31+, Debian 11+) default to cgroup v2. Docker and Kubernetes fully support v2.

How Container Runtimes Use Cgroups

When you run docker run --memory=512m --cpus=1.5 --pids-limit=100 nginx, the runtime:

  1. Creates a cgroup directory: /sys/fs/cgroup/docker/<container-id>/
  2. Writes 536870912 (512 MiB) to memory.max
  3. Writes 150000 100000 to cpu.max
  4. Writes 100 to pids.max
  5. Writes the container's PID to cgroup.procs

The container process and all its children are now constrained. The kernel enforces limits on every memory allocation, CPU scheduling decision, and fork() call.
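The five steps above can be sketched directly, since each one is a directory creation or a file write. A sketch assuming cgroup v2 and the Docker-style path from step 1 (requires root on a real host; 512m is 512 MiB = 536870912 bytes):

```python
from pathlib import Path

def setup_container_cgroup(container_id: str, pid: int,
                           base: str = "/sys/fs/cgroup/docker"):
    """Sketch of the cgroup setup a runtime performs for
    `docker run --memory=512m --cpus=1.5 --pids-limit=100`.
    The base path and function name are assumptions for illustration.
    """
    cg = Path(base) / container_id
    cg.mkdir(parents=True, exist_ok=True)        # step 1: create the cgroup
    (cg / "memory.max").write_text("536870912")  # step 2: 512 MiB hard limit
    (cg / "cpu.max").write_text("150000 100000") # step 3: 1.5 CPUs
    (cg / "pids.max").write_text("100")          # step 4: at most 100 processes
    (cg / "cgroup.procs").write_text(str(pid))   # step 5: move the process in
    return cg
```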

Kubernetes adds another layer: the kubelet creates cgroups for each pod (with the pod's resource requests and limits) and nested cgroups for each container within the pod. The pod-level cgroup ensures the total resource usage of all containers in the pod stays within bounds.

Monitoring Cgroup Usage

Every cgroup exposes usage statistics through files:

$ cat /sys/fs/cgroup/docker/<id>/memory.current
234881024

$ cat /sys/fs/cgroup/docker/<id>/cpu.stat
usage_usec 8420316
user_usec 6320000
system_usec 2100316
nr_periods 1542
nr_throttled 12
throttled_usec 48000

docker stats, Prometheus cAdvisor, and Kubernetes metrics all read from these files. The nr_throttled and throttled_usec values in cpu.stat tell you whether your CPU limit is too tight — if a container is frequently throttled, it needs more CPU or the limit should be raised.
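A monitoring agent can turn cpu.stat into a throttling rate with a few lines of parsing. A sketch that consumes the raw file contents and reports the fraction of periods in which the cgroup was throttled (function names are illustrative):

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse the space-separated key/value lines of a cgroup v2 cpu.stat file."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats

def throttle_ratio(stats: dict) -> float:
    """Fraction of enforcement periods in which the cgroup hit its CPU quota."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0
```

Applied to the sample output above (12 throttled out of 1542 periods), the ratio is under 1%; a persistently high ratio is the signal that the CPU limit is too tight.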

Next Steps