Summary of "Контейнерная виртуализация в Linux" (Container virtualization in Linux)
This document summarizes core concepts, practical details, commands and references for Linux containerization built on namespaces and cgroups.
Core concepts
- Containers are implemented using two kernel facilities:
  - Namespaces: isolate kernel-visible resources (PID, mount/fs view, UTS/hostname, IPC, network, user, etc.). Each namespace type controls what a process can see and interact with.
  - Control groups (cgroups): group processes to account for, limit and control usage of resources (CPU, memory, blkio, devices, freezer, cpusets, net_cls, etc.). Controllers (subsystems) implement limits/accounting for particular resources; cgroupfs exposes a file-like interface to manage them.
Namespaces — details and behavior
Main namespace types discussed: mount, PID, UTS (hostname), IPC, network, user.
Creation APIs
clone(2) and unshare(2) create namespaces when called with CLONE_NEW* flags. The semantics of fork and clone follow the kernel's process/task structures.
Mount namespace
- Gives a process its own view of the filesystem tree.
- Mount propagation modes control how mount/unmount events propagate: shared, slave, private, unbindable.
- Bind mounts can remap parts of a filesystem inside a namespace.
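One way to see which propagation mode each mount currently has is to read the optional fields of /proc/self/mountinfo. The following is a minimal sketch, not a definitive tool: the function name propagation_modes is made up for this example, and it assumes the documented mountinfo layout, in which optional fields sit between field 6 and the "-" separator and a mount with no tag is private.

```shell
#!/bin/sh
# Hypothetical helper: read mountinfo lines on stdin and print
# "<mount point><TAB><propagation mode>" for each of them.
propagation_modes() {
    awk '{
        shared = 0; slave = 0; unbindable = 0
        # Optional fields start at field 7 and end at the "-" separator.
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/) shared = 1
            else if ($i ~ /^master:/) slave = 1
            else if ($i == "unbindable") unbindable = 1
        }
        if (unbindable) mode = "unbindable"
        else if (shared && slave) mode = "shared+slave"
        else if (shared) mode = "shared"
        else if (slave) mode = "slave"
        else mode = "private"
        print $5 "\t" mode
    }'
}

# Report every mount visible in the current mount namespace.
if [ -r /proc/self/mountinfo ]; then
    propagation_modes < /proc/self/mountinfo
fi
```

With util-linux mount, the mode of an existing mount point can be changed with options such as --make-private, --make-shared, --make-slave and --make-unbindable.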
PID namespaces
- PID namespaces can be nested. Processes see PIDs relative to their own PID namespace.
- Parent-child relationships still exist; inside a namespace a process sees the local PID view.
- A proper /proc view is required (mounting procfs inside the namespace) so that tools like ps and top behave sensibly.
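The points above can be tried with unshare(1) from util-linux. This is a sketch under assumptions: it needs root, it relies on the --fork and --mount-proc options of util-linux unshare, and the NSpid field it parses is only present in /proc/<pid>/status on reasonably recent kernels. The function names are invented for illustration.

```shell
#!/bin/sh
# Start a shell as the first process of a new PID namespace with a matching
# /proc. --fork is needed so that the child, not unshare itself, lands in the
# new namespace; --mount-proc mounts a fresh procfs in a new mount namespace.
pidns_demo() {
    unshare --pid --fork --mount-proc sh -c 'echo "my pid: $$"; ps ax'
}

# Hypothetical helper: given /proc/<pid>/status text on stdin, print the PID
# of the task as seen in its innermost PID namespace. NSpid lists one PID per
# nested namespace, outermost first.
innermost_pid() {
    awk '/^NSpid:/ { print $NF }'
}

# Creating the namespace needs privileges, so run the demo only on request.
if [ "${RUN_DEMO:-0}" = 1 ]; then pidns_demo; fi
```

From the host, `innermost_pid < /proc/<pid>/status` shows the PID that a containerized process has inside its own namespace.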
UTS namespace
- Lets a namespace have its own hostname (useful for containers).
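As an illustration, unshare(1) from util-linux can run a command in a new UTS namespace (it calls unshare(2) with CLONE_NEWUTS). The sketch below assumes root privileges and a hostname command on the system; ns_id, uts_demo and the name demo-container are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the identifier of one namespace of a process,
# as shown by the /proc/<pid>/ns symlinks (for example "uts:[4026531838]").
ns_id() {
    readlink "/proc/$1/ns/$2"
}

# Change the hostname inside a new UTS namespace only (needs root).
uts_demo() {
    echo "outside: $(hostname) $(ns_id $$ uts)"
    unshare --uts sh -c 'hostname demo-container
        echo "inside:  $(hostname) $(readlink /proc/$$/ns/uts)"'
    echo "outside again: $(hostname)"
}

# Creating the namespace needs privileges, so run the demo only on request.
if [ "${RUN_DEMO:-0}" = 1 ]; then uts_demo; fi
```

The two ns_id values differ because the child runs in its own UTS namespace, and the host's hostname stays as it was.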
User namespaces
- Allow mapping user/group IDs inside the namespace; a process can appear to be root (UID 0) inside a user namespace while being unprivileged outside.
- Capabilities and mappings must be configured (uid_map/gid_map); otherwise some capabilities may not be effective.
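Each line of uid_map or gid_map has three numbers: the first ID inside the namespace, the first ID outside, and the length of the range. The sketch below shows how such a map translates an ID. The helper map_id is hypothetical, and the demo assumes a system where unprivileged user namespaces are enabled and util-linux unshare supports --map-root-user.

```shell
#!/bin/sh
# Hypothetical helper: translate an ID as seen inside a user namespace to the
# ID outside, given uid_map/gid_map text on stdin.
# Each map line is: <first-id-inside> <first-id-outside> <count>
map_id() {
    awk -v id="$1" '
        $1 <= id && id < $1 + $3 { print $2 + (id - $1); found = 1; exit }
        END { if (!found) exit 1 }   # the ID is not mapped
    '
}

# Appear as UID 0 inside a new user namespace while staying unprivileged outside.
userns_demo() {
    unshare --user --map-root-user sh -c 'id -u; cat /proc/self/uid_map'
}

# Run the demo only on request.
if [ "${RUN_DEMO:-0}" = 1 ]; then userns_demo; fi
```

For example, with the map lines "0 1000 1" and "1 100000 65536", `map_id 5` prints 100004: ID 5 falls in the second range, four places after its start.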
Network namespace
- Isolates network interfaces, routes, iptables rules, etc.
- Utilities like ip and ip netns support operating inside a given network namespace.
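A short sketch of the ip netns workflow follows. It assumes iproute2 is installed and that the demo is run as root; the function name and the namespace name demo-ns are invented for the example. The IP variable is only a convenience so the commands can be printed instead of executed.

```shell
#!/bin/sh
# IP can be overridden (for example IP=echo) to print the commands instead of
# running them.
IP=${IP:-ip}

# Create a named network namespace, list its interfaces, then remove it.
# $1 is the namespace name.
netns_demo() {
    $IP netns add "$1"
    # Runs "ip link show" inside the namespace; a fresh one is expected to
    # contain little more than a loopback interface.
    $IP netns exec "$1" ip link show
    $IP netns delete "$1"
}

# Needs root, so run the demo only on request.
if [ "${RUN_DEMO:-0}" = 1 ]; then netns_demo demo-ns; fi
```

Running it as `IP=echo RUN_DEMO=1 sh script.sh` (the script name is a placeholder) prints the three command lines without touching the system.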
Saving and reusing namespaces
- Running namespaces are referenced via files under /proc/<pid>/ns/*.
- Bind-mounting these namespace files (for example, mount --bind /proc/<pid>/ns/uts /some/point) keeps the namespace alive after the original process exits.
- Other processes can join a namespace using setns(2) with an open FD to such a file.
- CRIU (Checkpoint/Restore In Userspace) is a project to save and restore process state (checkpoint/restore).
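The bind-mount and setns(2) steps can be sketched with the util-linux tools unshare(1) and nsenter(1). This is an illustration under assumptions: it needs root, it relies on unshare accepting a file argument for --uts, and the path /run/demo-uts, the hostname persistent-demo and the function names are made up.

```shell
#!/bin/sh
# Hypothetical helper: succeed if processes $1 and $2 are in the same
# namespace of type $3 (uts, pid, net, mnt, ipc, user).
same_ns() {
    a=$(readlink "/proc/$1/ns/$3") || return 1
    b=$(readlink "/proc/$2/ns/$3") || return 1
    [ "$a" = "$b" ]
}

# Keep a UTS namespace alive through a bind mount, then re-enter it.
persist_demo() {
    touch /run/demo-uts
    # unshare bind-mounts the new namespace onto the file, then runs the command.
    unshare --uts=/run/demo-uts hostname persistent-demo
    # The creating process has exited; nsenter joins the namespace via setns(2).
    nsenter --uts=/run/demo-uts hostname
    umount /run/demo-uts && rm /run/demo-uts
}

# Needs root, so run the demo only on request.
if [ "${RUN_DEMO:-0}" = 1 ]; then persist_demo; fi
```

If the namespace survived, the nsenter call prints the hostname that was set before the first process exited.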
Cgroups — structure and controllers
- cgroupfs (or the newer unified interface) provides a filesystem-like interface to:
  - create hierarchies,
  - assign controllers,
  - set parameters,
  - move tasks between cgroups.
Terminology:
- Controller / subsystem: implements resource control (e.g., cpu, memory, blkio, devices, freezer, cpuset).
- cgroup: a node in a controller hierarchy with its own parameter files and a tasks list.
Common controllers:
- cpu / cpuacct: CPU resource control and accounting. Shares are relative weights, so they only take effect when groups compete for the CPU.
- cpuset: bind groups to specific CPUs and memory nodes.
- memory: set memory limits. You cannot forcibly take memory away from a running process; the ideal approach is to start the process in the intended cgroup or have the application handle the reduction. Swap behavior and OOM handling are important to configure.
- blkio: I/O throttling/weights per block device.
- devices: whitelist/blacklist access to device nodes (major/minor numbers).
- freezer: freeze/thaw groups (useful to pause/resume sets of tasks).
- net_cls / net_prio: tag network packets with a class ID (net_cls) or set their priority (net_prio) per cgroup.
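A small worked example may help with the idea of relative CPU shares. The sketch below only does the arithmetic: given one cpu.shares value per group, it prints the percentage each group would receive if all groups were busy at once. The function name is invented, and the value 1024 is used as the customary default weight.

```shell
#!/bin/sh
# Print the CPU percentage each group gets when every group is fully busy,
# given one cpu.shares value per group (integer division, rounds down).
share_percent() {
    total=0
    for s in "$@"; do total=$((total + s)); done
    for s in "$@"; do echo $((100 * s / total)); done
}

# Three groups: one at weight 1024 and two at half of that.
share_percent 1024 512 512   # prints 50, 25 and 25 on separate lines
```

These percentages apply only under contention. If the two smaller groups are idle, the first one can use the whole CPU, which is why shares are not a cap.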
Managing and lifecycle notes:
- You can move processes into a cgroup by writing the PID into the tasks file.
- Deleting a cgroup with cgdelete moves its tasks to the parent cgroup; a plain rmdir fails while the cgroup still contains tasks.
- The unified v2 cgroup interface changes layout/semantics; tools vary between v1 and v2.
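Because the file layout differs between v1 and v2, a script that moves processes has to pick the right file. The following sketch assumes that a cgroup.controllers file at the root indicates the unified (v2) hierarchy and that v2 uses cgroup.procs for membership; CGROUP_ROOT and both function names are made up for the example.

```shell
#!/bin/sh
# Root of the cgroup filesystem; can be overridden for testing.
CGROUP_ROOT=${CGROUP_ROOT:-/sys/fs/cgroup}

# Print the membership file for a group. $1 = controller (v1 only), $2 = group.
cgroup_member_file() {
    if [ -f "$CGROUP_ROOT/cgroup.controllers" ]; then
        echo "$CGROUP_ROOT/$2/cgroup.procs"    # unified (v2) hierarchy
    else
        echo "$CGROUP_ROOT/$1/$2/tasks"        # per-controller (v1) hierarchy
    fi
}

# Move a process into a group. $1 = pid, $2 = controller, $3 = group.
move_to_cgroup() {
    echo "$1" > "$(cgroup_member_file "$2" "$3")"
}
```

A call such as `move_to_cgroup 1234 memory demo` (placeholder values) needs write access to the cgroup filesystem, which usually means root.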
Managing cgroups — tools and practices
Ways to manage cgroups:
- Manual file writes:
  - Example: echo <pid> > /sys/fs/cgroup/<controller>/<cgroup>/tasks
- Userspace utilities: cgcreate, cgexec, cgclassify
- Higher-level libraries and orchestration systems (Docker, systemd, Kubernetes) manage cgroups for you.
Practical tips:
- Prefer launching processes directly in the intended cgroup (cgexec) rather than moving a heavy process after the fact: you cannot reclaim already-allocated memory transparently.
- CPU shares are relative; they only matter under contention. For strict CPU caps use quota mechanisms.
- Use freezer to pause/resume groups simply and reliably.
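Pausing a group with the v1 freezer controller amounts to writing a state name into the group's freezer.state file. The sketch below assumes the v1 layout under /sys/fs/cgroup/freezer; FREEZER_ROOT and the function names are invented for the example. On the unified v2 hierarchy the equivalent file is cgroup.freeze, which takes 1 or 0.

```shell
#!/bin/sh
# Root of the v1 freezer hierarchy; can be overridden for testing.
FREEZER_ROOT=${FREEZER_ROOT:-/sys/fs/cgroup/freezer}

# Write a state name into a group's freezer.state. $1 = group, $2 = state.
set_freezer_state() {
    printf '%s\n' "$2" > "$FREEZER_ROOT/$1/freezer.state"
}

# Suspend every task in the group.
freeze() { set_freezer_state "$1" FROZEN; }

# Resume every task in the group.
thaw() { set_freezer_state "$1" THAWED; }
```

For example, `freeze batch-jobs` followed later by `thaw batch-jobs` pauses and resumes a group named batch-jobs (a placeholder) without signalling each process individually.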
Containers vs Virtual Machines
Virtual Machines (VMs)
- Virtualize hardware and run a full guest OS kernel.
- Heavier, slower to start/stop, complete kernel isolation.
Containers
- Share the host kernel; isolate via namespaces + cgroups.
- Much lighter and faster (process-level startup), easier to snapshot/transport userland.
- Limitation: containers cannot run a different kernel (e.g., Windows kernel on Linux) because they share the host kernel — containers virtualize userspace only.
Tools, runtimes and Docker
Low-level primitives:
unshare, setns, clone, mount, procfs, bind mounts, ip/network tools.
Implementations / runtimes:
- LXC (Linux Containers)
- libcontainer / runc
- libvirt (for VMs)
- Other container runtimes and orchestration systems
Docker concepts and common commands:
- Image vs container
  - Image: on-disk filesystem layers (immutable templates)
  - Container: running instance (writable layer over an image)
- Overlay / union filesystems — images are layered; writable layer holds diffs.
Common Docker commands:
- docker pull: fetch an image from a registry
- docker run: create and start a new container (runs a command)
- docker start: start an existing container
- docker commit: create an image from a container's current state
- docker build + Dockerfile: reproducibly build images
- docker push: upload an image to a registry
- docker inspect: view metadata (JSON)
- docker diff: show filesystem changes in a container vs its image
Docker Hub / other registries provide image storage and sharing. Docker benefits include portability, rapid start/stop, reproducible environments for development and CI. Caveats include persistent data handling (volumes), port/network mapping, and security implications from sharing the host kernel.
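The image/container relationship described above can be illustrated by chaining a few of these commands. This is a sketch, not a recommended workflow: the container, image and file names are placeholders, alpine is assumed to be an available base image, and docker rm is used for cleanup even though it is not in the list above. The DOCKER variable only exists so the commands can be printed instead of run.

```shell
#!/bin/sh
# DOCKER can be overridden (for example DOCKER=echo) to print the commands
# instead of running them.
DOCKER=${DOCKER:-docker}

# Run a container that changes its filesystem, show the change, and save the
# result as a new image.
# $1 = container name, $2 = base image, $3 = name for the new image.
snapshot_demo() {
    $DOCKER pull "$2"
    $DOCKER run --name "$1" "$2" sh -c 'echo hello > /greeting'
    $DOCKER diff "$1"          # lists files changed relative to the image
    $DOCKER commit "$1" "$3"   # the writable layer becomes a new image
    $DOCKER rm "$1"
}

# Needs a working Docker installation, so run the demo only on request.
if [ "${RUN_DEMO:-0}" = 1 ]; then snapshot_demo demo-container alpine demo-image; fi
```

For anything that has to be rebuilt later, docker build with a Dockerfile is the more reproducible route than docker commit.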
Practical examples and useful commands
- Bind mount a path:
  - mount --bind source target
- Mount propagation flags:
  - shared / slave / private / unbindable
- Keep a namespace present by bind-mounting:
  - mount --bind /proc/<pid>/ns/<type> /path
- Create namespaces with system calls:
  - unshare(2) / clone(2) with CLONE_NEWUTS, CLONE_NEWPID, CLONE_NEWNS, CLONE_NEWUSER, CLONE_NEWNET, etc.
- Join a namespace:
  - setns(2) on an open fd to /proc/.../ns/...
- Move a process to a cgroup:
  - echo <pid> > /sys/fs/cgroup/<controller>/<cgroup>/tasks
- Userspace cgroup utilities:
  - cgcreate, cgexec, cgclassify
- Common Docker commands:
  - docker run, docker start, docker commit, docker build, docker pull, docker push, docker inspect, docker diff
Limitations and practical notes
- Namespaces persist only while they are referenced; bind-mounting namespace files keeps them alive.
- You cannot grant more privileges to a running application than permitted by the host without proper UID/GID mappings and capabilities. User namespaces are powerful but require uid_map/gid_map setup.
- You cannot reliably force a running process to give back memory it has already allocated; start the process in the desired cgroup or have the application handle reductions.
- CPU shares are relative; they do not provide fixed CPU caps unless quota mechanisms are used.
- Containers are lightweight but limited to the host kernel; when a different kernel is required, virtualization (VMs) is necessary.
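The quota mechanism mentioned in the CPU shares note can be made concrete with a little arithmetic. The sketch below assumes the v1 CFS bandwidth files cpu.cfs_period_us and cpu.cfs_quota_us and a period of 100000 microseconds; the function name is invented. On the unified v2 hierarchy the same two numbers are written together into cpu.max.

```shell
#!/bin/sh
# Compute the CFS quota (microseconds per period) that caps a group at a
# given percentage of one CPU; 200 would mean two full CPUs.
# $1 = percent, $2 = period in microseconds.
cfs_quota_us() {
    echo $(( $1 * $2 / 100 ))
}

period=100000                        # 100 ms, assumed here as the period
quota=$(cfs_quota_us 50 "$period")   # half of one CPU
echo "cpu.cfs_period_us=$period cpu.cfs_quota_us=$quota"
```

Unlike shares, a quota applies even when the machine is otherwise idle: with these values the group is throttled after 50 ms of CPU time in every 100 ms window.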
Guides, tutorials and references
- Red Hat system resource management guide — practical admin guide on cgroups and namespaces.
- kernel.org documentation — authoritative kernel docs for namespaces, cgroups and controllers.
- Docker “Get Started” tutorial (docker.com) — hands-on tutorial.
- Michael Kerrisk — articles and examples (author of The Linux Programming Interface).
- James Turnbull, The Docker Book (recommended book).
- CRIU project — Checkpoint/Restore In Userspace.
- Various blog posts and translations (including Russian-language writeups) for practical examples and explanations.
Sources and speakers cited
- Presenter (unnamed lecturer)
- Michael Kerrisk
- Red Hat developers / documentation
- kernel.org documentation
- Docker official docs and tutorials
- James Turnbull
- CRIU project