Summary of "Containers Don't Exist - Your Kernel Is Lying to You"
Thesis
Containers are ordinary Linux processes that the kernel configures to have restricted views and resources — not “small VMs.” VMs virtualize hardware and run separate kernels; containers share the host kernel and use kernel primitives to isolate processes.
Key technical concepts (what makes a container)
Namespaces (visibility isolation)
- PID namespace
- Processes see a separate process tree (PID 1 inside the container vs a different PID on the host).
- Mount namespace
- A container sees a different root filesystem (image layers become mounts).
- Network namespace
- Containers get independent IPs/ports (multiple containers can bind the same port without conflict).
- User namespace
- Remaps UIDs so “root” in a container can be an unprivileged host UID.
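The UID remapping can be made concrete with a small sketch. On Linux the mapping for a process is exposed in /proc/[pid]/uid_map, where each line holds three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. The sample mapping and the helper names (parse_uid_map, to_host_uid) are illustrative assumptions, not something shown in the video.

```python
def parse_uid_map(text):
    """Parse uid_map content into (inside_start, outside_start, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        entries.append(tuple(int(f) for f in fields))
    return entries


def to_host_uid(container_uid, entries):
    """Translate a UID seen inside the namespace to the UID the host sees."""
    for inside, outside, length in entries:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None  # UID is not mapped in this namespace


# Hypothetical mapping: container UIDs 0..65535 map to host UIDs 100000..165535.
sample = "         0     100000      65536\n"
entries = parse_uid_map(sample)

print(to_host_uid(0, entries))      # 100000: container "root" is unprivileged on the host
print(to_host_uid(1000, entries))   # 101000
print(to_host_uid(70000, entries))  # None: outside the mapped range
```

With a mapping like this, a process that believes it is UID 0 is treated by the host as UID 100000.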
cgroups (resource control)
- Control limits for CPU, memory, IO, etc., so one container can’t starve others.
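As a rough illustration of what a limit looks like, the sketch below assumes a cgroup v2 host, where the memory limit is a file named memory.max (either the word "max" or a byte count) and the CPU limit is a file named cpu.max (a quota and a period in microseconds). The sample values and function names are hypothetical; the video does not show these files.

```python
def parse_memory_max(text):
    """Return the memory limit in bytes, or None when the cgroup is unlimited."""
    value = text.strip()
    return None if value == "max" else int(value)


def parse_cpu_max(text):
    """Return the CPU limit as a number of CPUs, or None when unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


# Hypothetical file contents for a container limited to 512 MiB and half a CPU.
print(parse_memory_max("536870912\n") // (1024 * 1024))  # 512
print(parse_cpu_max("50000 100000\n"))                   # 0.5
print(parse_memory_max("max\n"))                         # None
```

A container engine writes values like these into the cgroup's files, and the kernel enforces them for every process in that cgroup.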
The shared kernel
- Containers share the host kernel: every container makes its syscalls directly to that one kernel, so the kernel itself is the attack surface that differentiates containers from VMs.
Commands / demonstrations referenced
- docker run, docker exec
- docker exec uses the setns syscall to join namespaces — it starts another process with the same namespace assignments, not “entering a box.”
- ps aux
- See container processes on the host (they appear as normal processes).
- /proc/[pid]/ns
- Shows symlinks to the namespaces attached to a process.
- unshare
- Create a process with its own namespaces (demonstrates container primitives).
- setns
- Attach a process to existing namespaces (how docker exec works).
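The claim that docker exec starts a process with the same namespace assignments can be checked by comparing the symlinks under /proc/[pid]/ns. Each link target has the form type:[inode] (for example pid:[4026531836]), and two processes are in the same namespace when the targets match. This is a minimal sketch: the function names and the proc_root parameter are assumptions added so the logic can run against any directory tree, and reading another user's process may require root.

```python
import os


def namespace_ids(pid, proc_root="/proc"):
    """Map namespace name (pid, net, mnt, ...) to its link target for one process."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    return {
        name: os.readlink(os.path.join(ns_dir, name))
        for name in os.listdir(ns_dir)
    }


def shared_namespaces(pid_a, pid_b, proc_root="/proc"):
    """Return the sorted names of namespaces that both processes are attached to."""
    a = namespace_ids(pid_a, proc_root)
    b = namespace_ids(pid_b, proc_root)
    return sorted(name for name in a if a[name] == b.get(name))


# Demo: show this process's own namespaces when running on Linux.
if os.path.isdir("/proc/self/ns"):
    try:
        for name, target in sorted(namespace_ids("self").items()):
            print(name, "->", target)
    except OSError as exc:
        print("could not read namespace links:", exc)
```

Calling shared_namespaces with the host PID of a container's main process and the host PID of a process started by docker exec should list the container's namespaces, while comparing either one against an ordinary host shell should list few or none, depending on how the container was started.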
Security and isolation tradeoffs
VMs
- Stronger isolation because each VM has its own kernel; escaping a VM requires hypervisor/kernel bugs.
- Modern CPU virtualization (VT-x / AMD-V) reduces CPU overhead.
- Main costs: memory, boot time, and IO overhead.
Containers
- Faster startup and lower resource overhead but weaker isolation due to the shared kernel.
- Real exploits have existed (for example CVE-2019-5736 in runc) demonstrating that container escapes are practical.
Mitigations / stronger isolation options
- MicroVMs (e.g., Firecracker) — smaller VMs for per-customer/function isolation.
- User-space kernel implementations (e.g., gVisor) — intercept syscalls in user space to reduce exposure.
- Run containers inside VMs: use VMs for infrastructure boundaries and containers for application packaging.
Products, runtimes and platforms mentioned
- Docker (engine: docker run, docker exec)
- runc (low-level runtime; notable CVE reference)
- Kubernetes (orchestrator; uses runtimes like runc)
- containerd, Podman (other runtimes built on the same kernel primitives)
- Docker Desktop (macOS/Windows: creates a lightweight Linux VM because those hosts lack Linux namespaces — leads to slower host-volume mounts and networking quirks)
- Firecracker (microVM used by AWS Lambda and similar platforms)
- gVisor (Google Cloud approach: an application kernel that reimplements Linux kernel behavior in user space)
Practical guidance / takeaways
- Don’t call containers “tiny VMs.” Understand they’re processes using namespaces + cgroups.
- Expect differences between local Docker Desktop environments (which run inside a Linux VM on macOS/Windows) and Linux production environments, especially for filesystem and networking.
- For multi-tenant or untrusted workloads, prefer stronger isolation (microVMs, gVisor) or run containers inside VMs.
- You can inspect namespace attachments directly on Linux via /proc/[pid]/ns and reproduce basic container behavior using unshare and setns.
Type of video
- Explanatory/technical tutorial with a short demo showing how namespaces and cgroups work and what docker exec actually does.
Main speaker / sources (as referenced)
- Single unnamed presenter / video narrator
- Projects / companies cited: Docker, runc, Kubernetes, containerd, Podman, Docker Desktop, AWS Lambda, Firecracker, Fly.io, Google Cloud, gVisor
- Underlying technologies: Linux kernel (namespaces, cgroups), CPU virtualization (VT-x / AMD-V)
Category
Technology