Container Escape Vulnerabilities: The CVEs That Shaped Docker and Kubernetes Security

February 16, 2026 6 min read

Why Container Escapes Matter

Containers are not virtual machines. A virtual machine runs its own kernel on emulated hardware, creating a strong isolation boundary. A container shares the host kernel with every other container on the system — isolation comes from Linux kernel features (namespaces, cgroups, capabilities, seccomp filters), not from a hardware-enforced boundary.

When an attacker escapes a container, they break through those kernel-level abstractions and gain access to the host. From there, they can reach every other container on that node, access mounted secrets and credentials, and pivot deeper into the cluster. In a Kubernetes or Docker production environment, a single container escape can compromise an entire node and, in the worst case, the entire cluster.

This article covers the most significant container escape CVEs from 2017 through 2024: how each exploit worked, what made it possible, and how the ecosystem responded. The same classes of bugs keep resurfacing, and the defensive patterns developed in response form the foundation of modern container security.

CVE-2017-5123: The waitid Kernel Exploit

What Happened

In October 2017, a vulnerability was discovered in the Linux kernel’s waitid() system call. During a refactor of the waitid code in kernel version 4.13, a critical check was accidentally removed: the access_ok() call that validates whether a user-supplied pointer actually points to user-space memory. Without this check, an unprivileged process could pass a pointer to kernel memory, and the kernel would happily write data to that location.

How the Exploit Worked

The bug allowed an attacker to write partially controlled data to an arbitrary kernel memory address. While the attacker could not fully control the content being written — the kernel wrote a siginfo_t structure with fields determined by process state — careful manipulation of which process was being waited on gave enough control to be dangerous.

The container escape leveraged this kernel write primitive to modify the calling process’s capability structure in kernel memory. Docker containers run with a restricted set of Linux capabilities, which is one of the primary mechanisms preventing containerized processes from performing privileged operations on the host. By overwriting the capability bitmask, the attacker could grant themselves CAP_SYS_ADMIN and CAP_NET_ADMIN — effectively breaking out of the container’s capability restrictions and gaining host-level privileges.

Impact and Fix

This vulnerability affected Linux kernel 4.13 through 4.14.0-rc4. The fix was straightforward: re-adding the access_ok() check to validate that the user-provided pointer targets user-space memory. The bug was introduced on May 21, 2017 and patched on October 9, 2017.
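The missing check is easy to model outside the kernel. The Python sketch below is a toy, not kernel code: it treats memory as a dictionary and assumes the x86-64 layout with 4-level paging, where user space ends just below 2^47. With check=True it behaves like the patched waitid() copy-out; with check=False it accepts a kernel address, which is the 4.13 behavior the exploit relied on. The names access_ok and Efault echo kernel concepts, but the signatures here are hypothetical.

```python
# Toy model of the check CVE-2017-5123 dropped from waitid(). Illustrative
# Python, not kernel code; constants assume x86-64 with 4-level paging.
PAGE_SIZE = 4096
TASK_SIZE_MAX = (1 << 47) - PAGE_SIZE  # first address past user space (assumed layout)
SIGINFO_SIZE = 128                     # sizeof(siginfo_t) on Linux


class Efault(Exception):
    """Stands in for the -EFAULT error the kernel returns for a bad pointer."""


def access_ok(addr: int, size: int) -> bool:
    """Accept [addr, addr + size) only if it lies entirely in user space."""
    return size <= TASK_SIZE_MAX and addr <= TASK_SIZE_MAX - size


def copy_siginfo_out(memory: dict[int, int], infop: int, siginfo: bytes,
                     check: bool = True) -> None:
    """Write `siginfo` byte by byte into the simulated address space `memory`.

    check=False reproduces the vulnerable behavior: the destination pointer is
    trusted, so a caller can aim the write at a kernel address.
    """
    if check and not access_ok(infop, len(siginfo)):
        raise Efault(hex(infop))
    for offset, byte in enumerate(siginfo):
        memory[infop + offset] = byte
```

For example, copy_siginfo_out(memory, 0xFFFF_8880_0000_0000, bytes(SIGINFO_SIZE)) raises Efault, while the same call with check=False silently writes into the simulated kernel range.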

CVE-2017-5123 demonstrated something fundamental: containers share the host kernel, and a kernel vulnerability is a container escape vulnerability. No amount of namespace isolation matters if the kernel itself can be tricked into overwriting its own security data structures.

CVE-2019-5736: The runc Overwrite

What Happened

Disclosed on February 11, 2019, CVE-2019-5736 was arguably the most impactful container escape vulnerability ever published. It affected runc, the low-level container runtime used by Docker, containerd, CRI-O, and essentially every OCI-compliant container platform. The vulnerability allowed a malicious process inside a container to overwrite the host’s runc binary, gaining root-level code execution on the host.

How the Exploit Worked

The exploit took advantage of how Linux handles /proc/self/exe. This special file is a symbolic link that points to the binary of the currently running process. When runc executes a command inside a container (via docker exec or similar), there is a brief window where the container’s process can access the runc binary through /proc/self/exe.

The attack worked in two stages:

  1. Set the trap. The attacker replaces the container’s /bin/sh (or another entrypoint binary) with a script containing #!/proc/self/exe. This tells the kernel to execute the binary that /proc/self/exe points to — which, during a docker exec call, is the host’s runc binary.
  2. Overwrite runc. When runc enters the container and the tampered entrypoint executes, the process obtains a file handle to the host’s runc binary via /proc/self/exe. Because Linux refuses writes to a binary that is still executing (ETXTBSY), the public exploit opened this handle with O_PATH, waited for runc to exit, and then reopened it for writing through /proc/self/fd, overwriting the host’s runc binary with attacker-controlled code.

The next time any container operation invokes runc on that host — starting a container, running exec, or even performing a health check — the attacker’s payload executes with root privileges on the host.

Impact and Fix

The severity was enormous. The exploit required only UID 0 inside the container (the default for most container images) and worked with default Docker configurations: no special privileges, no host mounts, no unusual capabilities. It affected Docker (before 18.09.2), Kubernetes, and any other platform using runc through version 1.0-rc6.

The fix changed runc’s behavior so that it creates a copy of itself as a sealed, read-only file descriptor (using memfd_create with F_SEAL flags) before entering the container. When the malicious process attempts to write to /proc/self/exe, the kernel blocks the write because the file descriptor is sealed. Additionally, running containers as a non-root user, enabling SELinux in enforcing mode, or setting the runc binary to read-only on the host filesystem all served as mitigations.

CVE-2019-1002101: kubectl cp Directory Traversal

What Happened

While most container escape CVEs involve breaking out of a running container, CVE-2019-1002101 took a different approach: it targeted the operator’s workstation. Discovered by researchers at Twistlock (now part of Palo Alto Networks) and disclosed in March 2019, this vulnerability allowed a malicious container to write arbitrary files to the machine of any Kubernetes user who ran kubectl cp to copy files from that container.

How the Exploit Worked

The kubectl cp command works by running tar inside the target container, streaming the resulting archive over the network to the user’s machine, and extracting it locally. The vulnerability was a directory traversal in that extraction step. An earlier bug, CVE-2018-1002100, had let archive entries containing ../ sequences land outside the destination directory; its fix could be bypassed because kubectl’s untar code would still create symbolic links from the archive and follow them when writing later entries.

If an attacker controlled the tar binary inside a container (or could otherwise influence the tar output), they could craft an archive that first creates a symlink pointing outside the destination and then writes files through it, reaching paths such as /etc/cron.d/backdoor. When the unsuspecting operator ran kubectl cp mypod:/data ./local-dir, the malicious entries would be written outside the intended destination directory, anywhere the user had write permissions.

This was particularly dangerous because:

  • It required no elevated privileges in the container. The attacker only needed to control or replace the tar binary.
  • It targeted the operator’s machine, which often has credentials for the entire Kubernetes cluster — kubeconfig files, cloud provider credentials, SSH keys.
  • It was a client-side vulnerability, meaning the cluster itself could be fully patched and still be exploitable through an unpatched kubectl binary on a developer’s laptop.

Impact and Fix

The fix in Kubernetes 1.11.9, 1.12.7, 1.13.5, and 1.14.0 tightened path and symlink validation during tar extraction. It too was incomplete: follow-up CVEs (CVE-2019-11246 and CVE-2019-11249) addressed further bypass techniques, highlighting how tricky path sanitization can be.
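A defensive extractor has to check two things: that each entry’s resolved path stays inside the destination, and that the archive cannot plant links that redirect later writes. The Python sketch below is not kubectl’s code (kubectl is written in Go); it is a minimal illustration with a hypothetical safe_extract() function that validates every entry before writing anything and rejects link entries outright, which is stricter than kubectl needs to be.

```python
import io
import os
import tarfile


def _inside(base: str, path: str) -> bool:
    """True if `path`, after resolving '..' and symlinks, stays under `base`."""
    base = os.path.realpath(base)
    path = os.path.realpath(path)
    return os.path.commonpath([base, path]) == base


def safe_extract(data: bytes, dest: str) -> list[str]:
    """Extract a tar stream into `dest`, refusing traversal and link entries.

    Every entry is validated before anything is written, so a malicious
    archive cannot plant a symlink early and write through it later.
    Returns the names of the regular files written.
    """
    with tarfile.open(fileobj=io.BytesIO(data)) as archive:
        members = archive.getmembers()
        for m in members:
            if not _inside(dest, os.path.join(dest, m.name)):
                raise ValueError(f"entry escapes destination: {m.name!r}")
            if not (m.isfile() or m.isdir()):
                # Symlinks and hard links were the bypass route; reject them.
                raise ValueError(f"unsupported entry type: {m.name!r}")
        written = []
        for m in members:
            target = os.path.normpath(os.path.join(dest, m.name))
            if m.isdir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with archive.extractfile(m) as src, open(target, "wb") as out:
                out.write(src.read())
            written.append(m.name)
    return written
```

Checking and writing are still separate steps, so a production extractor would also need to guard against the destination changing underneath it. The point here is the order of operations: validate the whole archive first, and never let the archive itself create links.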

This vulnerability is a reminder that the attack surface of a Kubernetes environment extends beyond the cluster. Operator tools, CI/CD pipelines, and client-side utilities are all part of the security perimeter.

CVE-2020-15257: containerd Host Network Escape

What Happened

In November 2020, NCC Group disclosed CVE-2020-15257, a vulnerability in containerd that allowed containers running with host network access to escape to the host. It affected all containerd releases before 1.3.9 and the 1.4.x line before 1.4.3.

How the Exploit Worked

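containerd uses a component called containerd-shim, which runs as a parent process for each container and manages its lifecycle. The shim exposed its API over an abstract namespace Unix domain socket. Abstract sockets have no filesystem path, so filesystem permissions and mount namespaces do not restrict them; they are scoped to the network namespace instead. The critical flaw was that any process in the host’s network namespace could therefore reach the shim’s socket, and the shim’s only access check was that the connecting process ran as UID 0.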

When a container was configured with --net=host (sharing the host’s network namespace), a root process inside that container could connect to the containerd-shim’s abstract Unix socket. From there, the attacker could use the shim API to:

  • Read and write files on the host filesystem.
  • Execute commands on the host as root.
  • Spin up new, fully privileged containers — effectively creating a root shell on the host.

The attack required two conditions: the container had to be running with host networking (hostNetwork: true in Kubernetes, or --net=host in Docker), and the process inside had to be running as UID 0.

Impact and Fix

The fix in containerd 1.3.9 and 1.4.3 moved the shim API from abstract Unix sockets to file-based Unix sockets under /run/containerd, which are subject to filesystem permissions and mount namespace boundaries. One operational detail mattered: shims started before the upgrade kept listening on their old abstract sockets, so containers running at upgrade time had to be restarted for the fix to take effect.
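One way to see this exposure is to list the abstract sockets visible from a network namespace. The Python sketch below parses the text of /proc/net/unix, which is per network namespace, so inside a hostNetwork container it would show the host’s abstract sockets. The function name is hypothetical, and the socket names in the test sample are made up for illustration; on a real system you would pass in the contents of /proc/net/unix.

```python
def abstract_unix_sockets(proc_net_unix: str) -> list[str]:
    """Return abstract-namespace socket names from /proc/net/unix text.

    The kernel prints abstract names with a leading '@' in place of the
    leading NUL byte; file-based sockets show a normal filesystem path and
    unbound sockets have no path column at all.
    """
    names = []
    for line in proc_net_unix.splitlines()[1:]:  # first line is the header
        fields = line.split(None, 7)  # path is the 8th column and may contain spaces
        if len(fields) == 8 and fields[7].startswith("@"):
            names.append(fields[7])
    return names
```

On a host where containers were not restarted after the upgrade, old shims would still appear in this list. Other daemons also use abstract sockets, so treat the output as a starting point for review rather than a verdict.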

CVE-2020-15257 reinforced a well-known principle: do not use host networking unless absolutely necessary. This CVE showed that the consequences extend beyond network-level access to full container escape.

CVE-2024-21626: Leaky Vessels

What Happened

In January 2024, Snyk researchers disclosed a set of vulnerabilities collectively named “Leaky Vessels,” with CVE-2024-21626 being the most severe. This was another runc vulnerability: five years after CVE-2019-5736, the same component had a critical container escape bug. It affected runc versions 1.0.0-rc93 through 1.1.11 and carried a CVSS score of 8.6.

How the Exploit Worked

The vulnerability stemmed from an internal file descriptor leak in runc. When runc set up a new container, it inadvertently left open a file descriptor referring to the host’s /sys/fs/cgroup directory, and that descriptor was still open when the container process started. An attacker could exploit this by setting the working directory (process.cwd) of a container process to a path that resolved through the leaked descriptor, such as one under /proc/self/fd/.

There were two primary attack vectors:

  1. Malicious container image. A Dockerfile with a WORKDIR directive set to a path like /proc/self/fd/[leaked_fd] could cause the container process to start with its working directory pointing to a host filesystem location. This meant anyone who built or ran the malicious image could be compromised.
  2. Crafted exec command. An attacker with the ability to run runc exec (or docker exec) could specify a working directory that referenced the leaked file descriptor, gaining read and write access to the host filesystem.

What made this vulnerability especially concerning was the image-based attack vector. Unlike CVE-2019-5736, which required an attacker to already have code execution inside a container, CVE-2024-21626 could be triggered simply by building or running a malicious image pulled from a registry.
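Because the image-based vector depends on a WORKDIR that points into /proc, a simple lint rule can catch the obvious form of it in CI. The Python sketch below is a hypothetical heuristic, not a complete defense (upgrading runc to 1.1.12 or later is the actual fix): it flags WORKDIR values that normalize to a path under /proc.

```python
import posixpath


def suspicious_workdirs(dockerfile: str) -> list[str]:
    """Return WORKDIR paths that resolve under /proc (hypothetical lint rule).

    Deliberately simple: it does not expand $VARIABLES, follow line
    continuations, or resolve relative WORKDIRs against earlier ones.
    """
    hits = []
    for line in dockerfile.splitlines():
        parts = line.strip().split(None, 1)
        # Dockerfile instructions are case-insensitive.
        if len(parts) != 2 or parts[0].upper() != "WORKDIR":
            continue
        path = parts[1].strip().strip("\"'")
        normalized = posixpath.normpath(path)
        if normalized == "/proc" or normalized.startswith("/proc/"):
            hits.append(path)
    return hits
```

The same check applies to the working directory passed to docker exec, which covers the second attack vector.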

Impact and Fix

The fix in runc 1.1.12 ensured that all internal file descriptors are properly closed before the container process starts, eliminating the leak. Major Linux distributions (Debian, Ubuntu, Red Hat, Amazon Linux, Alpine) published advisories and patched packages within days of disclosure.

The Leaky Vessels disclosure also included three other CVEs affecting Docker’s BuildKit component (CVE-2024-23651, CVE-2024-23652, and CVE-2024-23653), targeting race conditions and privilege escalation during image builds. Together, they demonstrated that the container build pipeline — not just runtime — is a significant attack surface.

Other Notable Container Escape Vulnerabilities

The five CVEs above represent the most impactful container escapes, but they are not the only ones. Several other vulnerabilities have shaped how the industry thinks about container security.

Dirty COW (CVE-2016-5195)

Dirty COW was a race condition in the Linux kernel’s memory subsystem, present for nine years before its discovery in October 2016. The vulnerability allowed an unprivileged process to write to read-only memory mappings. While it was a general Linux privilege escalation bug rather than a container-specific vulnerability, researchers quickly demonstrated container escape techniques using it.

The exploit leveraged the vDSO (virtual Dynamic Shared Object), a small shared library that the kernel maps into every process so that frequently called functions like clock_gettime() can run without a full system call. Because the vDSO pages are shared across all processes on the system, an attacker inside a container could use Dirty COW to overwrite code in the vDSO and inject shellcode that would execute in the context of any process that called the modified function, including host processes outside the container.

Dirty COW remains significant because it demonstrated that kernel memory-management bugs can bypass all container isolation mechanisms simultaneously. No amount of namespace, cgroup, or capability configuration can protect against a bug that lets an unprivileged process modify memory the kernel shares with every process on the host.

systemd-journald Exploits (CVE-2018-16865 and CVE-2018-16866)

In January 2019, Qualys disclosed a set of vulnerabilities in systemd-journald, the logging daemon present on virtually all systemd-based Linux distributions. CVE-2018-16865 was a stack-based memory corruption caused by an attacker-controlled alloca() call, and CVE-2018-16866 was an out-of-bounds read that leaked heap memory contents.

Chained together, these vulnerabilities allowed a local attacker to obtain a root shell in as little as 10 minutes on i386 systems (approximately 70 minutes on amd64). Because journald runs as root and can process log data that originates in containers (for example, when the journal socket is exposed to them), this created a potential path from a containerized process to host root access through the logging infrastructure.

These bugs highlighted the risk of host services that accept input from containers. Any host daemon that processes container-generated data — logging agents, monitoring sidecars, metrics collectors — is a potential escape vector if it has vulnerabilities.

CoreOS rkt Vulnerabilities

CoreOS rkt, once a significant Docker alternative, had multiple CVEs in its isolation mechanisms — flaws in filesystem isolation and namespace setup that allowed containers to access host resources. Though rkt has since been archived, these bugs reinforced that container escape vulnerabilities are not specific to any single runtime. Every container runtime needs rigorous security auditing.

Patterns Across Container Escape CVEs

Several recurring patterns emerge across these vulnerabilities:

  • Shared kernel, shared fate. CVE-2017-5123 and Dirty COW exploited kernel bugs that no amount of namespace isolation can defend against. This is the fundamental architectural limitation of containers versus virtual machines.
  • File descriptor and /proc leaks. CVE-2019-5736 and CVE-2024-21626 both exploited how runc handles file descriptors and /proc entries during container setup. Any leak between host and container namespaces is a potential escape vector.
  • Host services extend the attack surface. CVE-2020-15257 and the systemd-journald exploits show that any host service that accepts container input — runtime APIs, logging daemons, monitoring agents — is a potential escape path.
  • Client tools matter too. CVE-2019-1002101 weaponized kubectl to compromise operator workstations. The security perimeter includes every tool that interacts with the cluster.

Modern Defenses Against Container Escapes

The container ecosystem has developed multiple layers of defense in response to these vulnerabilities. No single mechanism is sufficient — effective container security requires defense in depth.

Seccomp Profiles

Seccomp (Secure Computing Mode) restricts which system calls a containerized process can make. Docker’s default profile blocks approximately 44 of the 300+ available system calls, including dangerous operations like mount, reboot, and keyctl. Custom profiles tailored to your application’s actual system call usage offer stronger protection — tools like strace or eBPF-based profilers can record normal behavior and generate a tight allowlist. In recent Kubernetes versions, seccomp profiles are applied via the seccompProfile field in the pod security context.
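The allowlist approach can be sketched as a small generator that turns a list of observed system calls into a profile in the JSON shape Docker and containerd accept (a defaultAction plus a syscalls list). The function name and the syscall list in the usage are placeholders: a real list would come from recording your workload, and a list this short would not be enough to start most programs.

```python
import json


def allowlist_profile(syscalls: list[str]) -> str:
    """Build a seccomp profile that allows only `syscalls` and denies the rest.

    SCMP_ACT_ERRNO makes denied calls fail with an error instead of killing
    the process, which makes a too-tight profile easier to debug.
    """
    profile = {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {"names": sorted(set(syscalls)), "action": "SCMP_ACT_ALLOW"},
        ],
    }
    return json.dumps(profile, indent=2)
```

In Kubernetes, a profile saved under the kubelet’s seccomp directory on each node is referenced with seccompProfile type Localhost and localhostProfile set to its relative path, while type RuntimeDefault selects the runtime’s built-in profile.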

AppArmor and SELinux

Mandatory Access Control (MAC) systems add restrictions beyond standard Linux permissions. SELinux (Red Hat, CentOS, Fedora) enforces type-based access controls that can prevent container processes from accessing host files or binaries even with root capabilities. Notably, SELinux in enforcing mode mitigated CVE-2019-5736 by blocking writes to the host’s runc binary. AppArmor (Ubuntu, Debian) provides path-based controls, and Docker applies its default AppArmor profile (docker-default) to containers on hosts where AppArmor is enabled.

Rootless Containers and User Namespaces

Many container escape exploits require UID 0 inside the container (CVE-2019-5736, CVE-2020-15257) or gain impact by escalating to host root. Rootless containers address this by running the entire container runtime as an unprivileged user, using user namespaces to remap UID 0 inside the container to an unprivileged UID on the host.

With rootless mode, even a successful escape lands the attacker on the host as an unprivileged user. Docker supports rootless mode natively (since 20.10), Podman runs rootless by default, and Kubernetes user namespaces for pods reached beta in version 1.30. Separately from rootless mode, a rootful Docker daemon can remap container UIDs through user namespaces with the userns-remap option in /etc/docker/daemon.json:

{
  "userns-remap": "default"
}
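The remapping follows the uid_map format the kernel exposes in /proc/&lt;pid&gt;/uid_map: each line maps a range of container UIDs onto a range of host UIDs. The Python sketch below translates a container UID into its host UID; the function name is hypothetical, and the example range starting at host UID 100000 is an assumption for illustration, since the real range comes from the host’s subordinate ID configuration.

```python
from typing import Optional


def to_host_uid(uid_map: str, container_uid: int) -> Optional[int]:
    """Translate `container_uid` using uid_map lines 'inside outside count'.

    Returns None when the UID is not mapped; the kernel presents such IDs
    as the overflow UID (usually 65534, "nobody").
    """
    for line in uid_map.splitlines():
        if not line.strip():
            continue
        inside, outside, count = (int(field) for field in line.split())
        if inside <= container_uid < inside + count:
            return outside + (container_uid - inside)
    return None
```

With a map of 0 100000 65536, root inside the container is UID 100000 on the host. Without remapping, the map is the identity line 0 0 4294967295, which is why container root is then real host root.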

Read-Only Root Filesystems

Running containers with read-only root filesystems (--read-only in Docker, or readOnlyRootFilesystem: true in Kubernetes security context) prevents a compromised container from modifying its own filesystem. This directly mitigates exploits like CVE-2019-5736, where the attacker needed to replace binaries inside the container to set up the exploit chain. Applications that need writable directories can use tmpfs mounts for specific paths.

Runtime Security: Falco and Tetragon

Static configurations define what a container should do. Runtime security tools detect when a container does something unexpected — often the first sign of an escape attempt.

Falco, a CNCF graduated project, monitors system calls and container events against a rule engine. It detects suspicious activities like a process opening /proc/self/exe for writing (CVE-2019-5736), unexpected Unix socket connections (CVE-2020-15257), or processes spawning that do not match the expected entrypoint.

Tetragon, from the Cilium project, uses eBPF to enforce security policies directly in the kernel. Unlike Falco, which primarily detects and alerts, Tetragon can terminate malicious processes at the kernel level before an exploit completes, and because the filtering happens in the kernel, its overhead is generally low. Many security teams deploy both: Falco for broad detection, Tetragon for real-time enforcement.

Pod Security Standards

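Kubernetes Pod Security Standards (enforced by the built-in Pod Security Admission controller, which took over from PodSecurityPolicy after its removal in Kubernetes 1.25) define three profiles: Privileged, Baseline, and Restricted. Baseline blocks privileged containers and host namespaces; Restricted additionally requires non-root execution, dropping all capabilities, disabling privilege escalation, and a RuntimeDefault or Localhost seccomp profile. Restricted does not require a read-only root filesystem, so set readOnlyRootFilesystem separately. Applying Restricted broadly eliminates the conditions required by most container escape exploits.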

Image Scanning and Supply Chain Security

CVE-2024-21626 showed that a malicious image could trigger an escape simply by being built or run. Image scanning tools (Trivy, Grype, Snyk Container) detect known vulnerable packages, image signing with Sigstore/cosign provides provenance verification, and admission controllers (Kyverno, OPA Gatekeeper) can enforce that only signed, scanned images from trusted registries are deployed.

Container Escape Prevention Checklist

The following checklist consolidates the lessons from every major container escape CVE into actionable steps. Each item addresses a specific class of vulnerability covered in this article.

Runtime Configuration

  • Run containers as non-root. Set runAsNonRoot: true and specify a runAsUser in your pod security context. This mitigates CVE-2019-5736 and CVE-2020-15257 and reduces the impact of most other escapes.
  • Drop all capabilities, add only what is needed. Use drop: ["ALL"] in the container’s securityContext.capabilities and explicitly add back only the capabilities your application requires.
  • Disable privilege escalation. Set allowPrivilegeEscalation: false to prevent processes from gaining additional privileges via setuid binaries or other mechanisms.
  • Use read-only root filesystems. Set readOnlyRootFilesystem: true and mount writable tmpfs volumes only where needed.
  • Avoid host namespaces. Do not use hostNetwork, hostPID, or hostIPC unless there is a documented, unavoidable requirement. CVE-2020-15257 was only exploitable with host networking.
  • Never run privileged containers in production. The --privileged flag disables virtually all container isolation mechanisms.
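The runtime items above are mechanical enough to check in code. This Python sketch audits a Pod spec represented as a plain dict (for example, YAML loaded with a parser of your choice). The function name and finding strings are hypothetical, and it is not a substitute for Pod Security Admission or a policy engine such as Kyverno or OPA Gatekeeper.

```python
HOST_NAMESPACES = ("hostNetwork", "hostPID", "hostIPC")


def audit_pod_spec(spec: dict) -> list[str]:
    """Return checklist violations for a Pod `spec` (the object under .spec).

    Container-level securityContext values override pod-level ones for
    runAsNonRoot, matching Kubernetes precedence.
    """
    findings = [f"{ns} is enabled" for ns in HOST_NAMESPACES if spec.get(ns)]
    pod_ctx = spec.get("securityContext", {})
    for c in spec.get("containers", []) + spec.get("initContainers", []):
        name = c.get("name", "<unnamed>")
        ctx = c.get("securityContext", {})
        if not ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False)):
            findings.append(f"{name}: runAsNonRoot is not true")
        if ctx.get("privileged", False):
            findings.append(f"{name}: privileged container")
        # Unset allowPrivilegeEscalation behaves as true, so require explicit False.
        if ctx.get("allowPrivilegeEscalation", True) is not False:
            findings.append(f"{name}: allowPrivilegeEscalation is not false")
        if not ctx.get("readOnlyRootFilesystem", False):
            findings.append(f"{name}: root filesystem is writable")
        if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
            findings.append(f"{name}: capabilities do not drop ALL")
    return findings
```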

Infrastructure and Patching

  • Keep the host kernel updated. Kernel vulnerabilities (CVE-2017-5123, Dirty COW) bypass all container isolation. Automated kernel updates with live patching (kpatch, livepatch) minimize downtime.
  • Patch container runtimes promptly. runc, containerd, and CRI-O vulnerabilities (CVE-2019-5736, CVE-2020-15257, CVE-2024-21626) are direct escape vectors. Subscribe to security mailing lists and automate updates.
  • Update client tools. kubectl and other client-side tools (CVE-2019-1002101) are part of the attack surface. Include them in your patch management process.
  • Enable user namespaces. Configure rootless container runtimes or enable user namespace remapping to ensure UID 0 inside containers maps to an unprivileged host UID.
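A version gate for the runtime CVEs in this article can be sketched as follows. The fixed releases come from the advisories described above; everything else (the table layout, the rule that newer minor lines include the fix) is an assumption. Distribution packages often backport fixes without changing the upstream version, so a distro-packaged runc or containerd should be checked against the vendor’s advisory instead.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """'v1.4.3', '1.1.12', '1.0.0-rc6' -> (major, minor, patch); suffixes ignored."""
    core = version.strip().lstrip("v").split("-")[0].split("+")[0]
    parts = [int(p) for p in core.split(".")] + [0, 0]
    return parts[0], parts[1], parts[2]


# First fixed release on each affected minor line, per the advisories above.
FIXED_IN = {
    "runc/CVE-2024-21626": [(1, 1, 12)],
    "containerd/CVE-2020-15257": [(1, 3, 9), (1, 4, 3)],
}


def is_patched(version: str, fixed_releases: list[tuple[int, int, int]]) -> bool:
    """True if `version` includes the fix.

    A version on a listed minor line must be at or above that line's fix;
    a line newer than every listed line is assumed to include the fix, and
    an older or unlisted line is treated as vulnerable.
    """
    v = parse_version(version)
    for fixed in fixed_releases:
        if v[:2] == fixed[:2]:
            return v >= fixed
    return v[:2] > max(fixed_releases)[:2]
```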

Detection and Monitoring

  • Deploy runtime security tooling. Use Falco, Tetragon, or similar tools to detect anomalous container behavior — unexpected process execution, suspicious file access, unusual system calls.
  • Apply seccomp profiles. Start with the default Docker/containerd seccomp profile and customize based on your application’s actual system call requirements.
  • Enable audit logging. Kubernetes audit logs, container runtime logs, and host-level audit (auditd) provide the forensic trail needed to investigate potential escapes.
  • Monitor for leaked file descriptors. Unexpected entries in /proc/[pid]/fd pointing outside the container’s filesystem may indicate a vulnerability like CVE-2024-21626.
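A starting point for the file descriptor check is to list the directory descriptors a process holds. The Python sketch below is Linux-only and hypothetical: it reports every open fd that refers to a directory, together with the path the kernel shows for it, and leaves the judgment about what is unexpected to you. The CVE-2024-21626 leak was a directory descriptor, which is why directories are the filter here; run it with enough privileges to inspect the target process.

```python
import os
import stat


def directory_fds(pid: str = "self") -> dict[int, str]:
    """Map each open fd of `pid` that refers to a directory to its link target."""
    fd_dir = f"/proc/{pid}/fd"
    found = {}
    for entry in os.listdir(fd_dir):
        path = os.path.join(fd_dir, entry)
        try:
            if stat.S_ISDIR(os.stat(path).st_mode):  # os.stat follows the fd link
                found[int(entry)] = os.readlink(path)
        except FileNotFoundError:  # fd was closed while we were scanning
            continue
    return found
```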

Supply Chain

  • Scan images for known CVEs. Run vulnerability scanners (Trivy, Grype, Snyk) in your CI/CD pipeline and block deployment of images with critical vulnerabilities.
  • Use minimal base images. Smaller images (distroless, Alpine, scratch) have fewer packages and fewer potential vulnerabilities. They also lack tools (like a full tar binary) that some exploits depend on.
  • Sign and verify images. Use cosign/Sigstore for image signing and enforce signature verification in your admission controller to prevent running untrusted images.
  • Pin image digests. Reference images by digest (image@sha256:...) rather than mutable tags to prevent tag-based supply chain attacks.
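A CI gate for digest pinning can be as small as a regular expression over the image references in your manifests. This sketch assumes sha256 digests (the common case) and only checks the reference format; the function name is hypothetical, and it does not verify that the digest exists or is signed, which is what cosign and admission controllers are for.

```python
import re

# name[:tag]@sha256:<64 lowercase hex characters>
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_images(images: list[str]) -> list[str]:
    """Return the image references that are not pinned by sha256 digest."""
    return [ref for ref in images if not DIGEST_RE.search(ref)]
```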

The Future of Container Isolation

These CVEs range from Dirty COW in 2016 to Leaky Vessels in 2024, and the fundamental challenge has not changed: containers share a kernel with the host, and that shared kernel is a shared attack surface. The industry is pursuing several approaches to strengthen isolation.

Sandbox runtimes like gVisor (user-space kernel that intercepts system calls) and Kata Containers (lightweight VM per container) add stronger isolation boundaries with performance trade-offs that are increasingly acceptable for security-sensitive workloads.

eBPF-based security enforcement is maturing rapidly. As eBPF gains capabilities in newer kernel versions, tools like Tetragon will offer increasingly granular runtime enforcement with minimal overhead.

Confidential computing (AMD SEV, Intel TDX) is bringing hardware-level isolation to container workloads, for example by running each pod in a lightweight confidential virtual machine whose memory is encrypted so that even a compromised host kernel or hypervisor cannot read it. This could protect containers not only from each other but from the host itself.

For most teams today, defense in depth — rootless containers, seccomp profiles, MAC policies, runtime security tools, and diligent patching — provides strong protection. No single mechanism is a silver bullet, but the combination makes exploitation significantly harder and detection significantly faster.

Container escapes are not theoretical. They have been discovered repeatedly in the most critical infrastructure components, from the Linux kernel to runc to containerd to kubectl. The organizations that avoid becoming case studies are the ones that treat these vulnerabilities as inevitable, and build their defenses accordingly.

πŸ› οΈ Interactive Tool

Browse CVE data for your stack

Open in new tab β†—

πŸ› οΈ Try These Free Tools

⚠️ K8s Manifest Deprecation Checker

Paste your Kubernetes YAML to detect deprecated APIs before upgrading.

🐳 Dockerfile Security Linter

Paste a Dockerfile for instant security and best-practice analysis.

πŸ—ΊοΈ Upgrade Path Planner

Plan your upgrade path with breaking change warnings and step-by-step guidance.

See all free tools β†’

πŸ””

Stay ahead of breaking changes

Free email alerts for EOL dates, CVEs, and major releases across your stack.

Get Alerts →