Virtualization and containers are complementary efforts which seek to allow different tasks to be isolated from each other on the same host system. In virtualization, the guest systems appear to be running on their own hardware; each guest system runs its own kernel. The container approach runs all guests under the same kernel in a way that isolates them from each other. Virtualization gives more complete isolation and allows guests to run a different operating system than the host; containers, however, tend to be far more efficient.
Xen
Xen is a well-funded paravirtualization effort; it has been shipped in various forms by some distributors. It allows any operating system to run as a guest, but the guest kernel must be built specially to run under Xen.
Forecast: Xen has had a very slow path into the mainline kernel. The Xen developers have not always worked well with the kernel community, and have not always been interested in keeping their code current with mainline kernels. That notwithstanding, the core DomU (unprivileged guest) Xen support patches were merged for 2.6.23. Work continues on Xen; the "balloon driver" (which allows the host to add memory to and remove memory from running guests) was merged for 2.6.26.
The remaining sticking point is the Dom0 (supervisor domain) code, which remains outside the 2.6.30 kernel with little prospect of being merged in the near future. There are some fundamental disagreements over how the Dom0 code fits into the core x86 architecture support which look likely to take a while yet to work out.
For more information:
- Xen is coming (November, 2004)
- Toward the merging of Xen (March, 2005)
- The Xen patches (May, 2006)
- Connecting Linux to hypervisors (paravirt_ops; August, 2006)
- Xen: finishing the job (March, 2009)
- Xen again (June, 2009)
Kernel Virtual Machine (KVM)
KVM emerged out of the blue in October, 2006; it is an effort funded by Qumranet, a virtualization company which was, at the time, still operating in "stealth mode." KVM is a full virtualization solution which depends on hardware support; as a result, it is relatively efficient but only works with certain recent processors. This code was merged for the 2.6.20 kernel.
Forecast: KVM is already in the mainline and has been generally well received. Certain aspects of the code - including the user-space API - are still stabilizing, though the worst should be past. SMP guest support for KVM was added for 2.6.23.
For more information:
- An early KVM patch summary (October, 2006)
- Some KVM developments (January, 2007)
- KVM 15 (February, 2007)
Lguest
Lguest is a simple virtualization project developed by Rusty Russell of IBM. It provides paravirtualization on stock x86 hardware; no hardware virtualization support is required. The plan is to keep this code relatively simple, so it is unlikely to develop some of the features (live migration, for example) found in other virtualization solutions.
Forecast: Lguest was merged for 2.6.23. The "virtio" code, used to provide virtualized network and disk devices for lguest guests, was merged for 2.6.24, with enhancements merged for 2.6.25. In the longer term, expect to see lguest ported to architectures beyond i386.
For more information: An introduction to lguest (January, 2007)
Containers
There are several container-oriented projects all hoping to have their support code included in the kernel in the near future. The blocking point in most cases is the same: the kernel developers have no interest in merging separate support structures for each container implementation. So the container developers have been told to come up with a common infrastructure which works for all of them.
The needs of the container projects are about the same in each case: every globally-visible resource in the kernel must be moved behind a layer of indirection so that each container can have its own view. These resources include filesystems, process IDs, devices, network interfaces, interprocess communication mechanisms, even the current system time. In addition, most of these projects need some sort of resource control mechanism which can be used to keep containers from interfering (too much) with each other.
There is a core piece of infrastructure called "process containers" (since renamed "control groups") which is intended to tie together all container-related technologies in the kernel. Process containers are also used by the CFS group scheduling feature.
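On kernels with this infrastructure (known as control groups in later releases), each process's group membership is recorded in /proc/<pid>/cgroup. A minimal sketch (the function name is ours) that shows which process containers the caller belongs to:

```c
#include <stdio.h>

/* Print the calling process's control-group membership, one hierarchy
 * per line, as recorded in /proc/self/cgroup. Each line has the form
 * "hierarchy-id:subsystems:path" (e.g. "7:cpu:/" on cgroup v1, or
 * "0::/some/path" on cgroup v2). Returns 0 on success, -1 if the
 * kernel lacks container support. */
int print_own_containers(void)
{
	FILE *f = fopen("/proc/self/cgroup", "r");
	if (!f)
		return -1;
	char line[256];
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);
	fclose(f);
	return 0;
}
```

The resource controllers mentioned above (CPU, memory, and so on) each appear as a subsystem attached to one of these hierarchies.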
Forecast: The process of reworking various bits of container infrastructure and merging them into the mainline continues. As of 2.6.27, the network namespace work is close to complete. Pieces which remain outside the mainline include complete user namespaces, better sysfs support, and a mechanism for checkpointing and restoring running containers.
For more information:
- Containers and PID virtualization (January, 2006)
- Virtual time (April, 2006)
- Containers and lightweight virtualization (April, 2006)
- Paravirtualization and containers (Kernel summit report, July, 2006)
- Resource beancounters (August, 2006)
- Another container implementation (Resource control, September, 2006)
- Network namespaces (January, 2007)
- Process containers (May, 2007)
- Kernel Summit 2007 session on containers (September, 2007)
- Notes from a container (October, 2007)
- Process IDs in a multi-namespace world (November, 2007)
- Kernel-based checkpoint and restart (August, 2008)
- Sysfs and namespaces (August, 2008)
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.