When we talk about virtualization solutions, we tend to talk about specific products offered by specific companies. But when we talk about virtualization solutions with Linux, we instead talk about a rich and diverse open source ecosystem.
Linux supports a variety of virtualization platforms (covering the spectrum of techniques) as well as the elements that create a complete solution. In this article, we'll start with a short discussion of virtualization and then dig down into the rich set of virtualization techniques that you'll find with Linux.
There are many types of virtualization, but in this article we'll focus primarily on what's known as hardware virtualization. This type of virtualization abstracts the software environment from the underlying hardware platform. We'll also explore operating system virtualization, which is another area where Linux shines.
In the context of hardware virtualization, the software abstraction that separates the operating system from the hardware is called the hypervisor (or virtual machine monitor). The hypervisor creates a virtual platform onto which operating system instances may be executed. This allows the hardware platform to be shared by multiple operating systems and application sets, making it more cost effective (among other things we'll explore shortly).
The entity that sits on the hypervisor is called a virtual machine (or VM). It collectively encapsulates the OS, the applications, and the metadata that defines the VM's constraints. The VM represents the OS and applications as a virtual disk that is used by the hypervisor to enable the virtual platform (boot, root, and other disks). The VM is typically packaged as a file, making it simpler to manage than a distributed set of files.
While virtualization is all the rage today, it's interesting to note that it was implemented half a century ago in IBM mainframes. The concept was first explored in the experimental IBM M44 computer system and then popularized in the IBM System/360 line of mainframes. The first true hypervisor that provided full virtualization of the entire hardware platform was the IBM CP-40 system, which was in use by the late 1960s.
Early hypervisors implemented what is now called Type 1, or native, virtualization. In this model, the hypervisor executes directly on the bare-metal hardware and virtual machines execute on top of it. Subsequently, the hosted (Type 2) model of virtualization was developed. With this model, the hypervisor runs in the context of another operating system (which itself runs on the bare metal) but still allows two or more operating systems to coexist on the same platform.
Figure 1: Native vs. Hosted Virtualization Environments.
With some of the basics of virtualization under our belts, let's now look at how Linux implements virtualization across the spectrum of useful styles (including emulation, full and para-virtualization, operating system virtualization, and others).
Emulation is the process of transforming the services of one system (the host) and making them appear as another system (the guest). What's novel about this particular style of virtualization is that the host and guest systems need not be the same. For example, the host could be an x86 platform but provide emulation for a PowerPC-based platform (different platforms and instruction set architectures).
Additionally, an emulator could provide emulation services for multiple platforms. We'll look at one example of this in the discussion of solutions. While an emulator need not operate in a type-2 fashion (in the context of a host operating system, as shown in Figure 2), this is a common use model.
Figure 2: Emulation as a Virtualization Scheme.
Two of the most interesting emulators used with Linux are QEMU and Bochs (both processor and platform emulators). The upside of these solutions is that they are portable and can run a variety of guest operating systems on a variety of host operating systems and platforms.
The downside to this approach is that it tends to be less efficient, because guest instructions must be emulated rather than executed directly on the processor. QEMU takes a novel approach to emulation through dynamic binary translation and a mode where kernel and native user code can be accelerated. Further, QEMU is a great development tool for embedded platforms, allowing you to develop and test code for a processor different from that of the host. QEMU is also used by other virtualization solutions for device emulation (for example, KVM takes advantage of its services as a user-space device emulator).
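To make the idea of dynamic binary translation a little more concrete, here is a toy sketch in Python. It is not how QEMU is implemented: the three-instruction guest instruction set, the `translate_block` helper, and the `Translator` cache are all assumptions invented for illustration. The only point it makes is that a block of guest instructions can be translated once into host-executable code (here, Python closures), cached by guest address, and reused on later executions instead of being decoded again.

```python
# Toy dynamic binary translation. The guest ISA below is hypothetical
# and exists only for this illustration; it is not QEMU's design.
from typing import Callable, Dict, List, Tuple

Instr = Tuple[str, str, int]   # (opcode, register, immediate)
Regs = Dict[str, int]


def translate_block(block: List[Instr]) -> Callable[[Regs], None]:
    """Translate a block of guest instructions into one host callable."""
    ops = []
    for opcode, reg, imm in block:
        if opcode == "mov":
            ops.append(lambda r, reg=reg, imm=imm: r.__setitem__(reg, imm))
        elif opcode == "add":
            ops.append(lambda r, reg=reg, imm=imm: r.__setitem__(reg, r[reg] + imm))
        elif opcode == "mul":
            ops.append(lambda r, reg=reg, imm=imm: r.__setitem__(reg, r[reg] * imm))
        else:
            raise ValueError(f"unknown opcode: {opcode}")

    def run(regs: Regs) -> None:
        for op in ops:
            op(regs)

    return run


class Translator:
    """Caches translated blocks, keyed by guest address."""

    def __init__(self, program: Dict[int, List[Instr]]) -> None:
        self.program = program
        self.cache: Dict[int, Callable[[Regs], None]] = {}
        self.translations = 0   # how many times we had to translate

    def execute(self, address: int, regs: Regs) -> None:
        if address not in self.cache:
            self.cache[address] = translate_block(self.program[address])
            self.translations += 1
        self.cache[address](regs)


program = {0x1000: [("mov", "r0", 2), ("add", "r0", 3), ("mul", "r0", 4)]}
translator = Translator(program)
regs: Regs = {}
for _ in range(3):
    translator.execute(0x1000, regs)

print(regs["r0"], translator.translations)   # prints: 20 1
```

The block runs three times but is translated only once, which is where a translating emulator can recover some of the cost of pure instruction-by-instruction interpretation.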
The more traditional virtualization solutions are of the platform virtualization variety. These tend to present a guest architecture that is similar or the same as the host architecture. Platform (or hardware) virtualization comes in two primary forms: full and para-virtualization.
Full virtualization (as shown in Figure 3) presents a virtual platform through the hypervisor to guest VMs. The key behind full virtualization is that those VMs (namely, the operating systems within the VM) can run unmodified on the hypervisor. This is ideal when a true virtual platform is required, but has a downside.
The virtual platform being presented to the VM mimics that of the physical platform. Guest OS drivers work with the platform as they commonly do on actual hardware (through the platform's physical interfaces, such as the PCI bus).
But consider what this means. The guest OS communicates with the virtual platform assuming that it's the real platform. Within the hypervisor, another layer of emulation exists to translate those register-level hardware accesses into pseudo calls for the real hardware. So the hypervisor emulates the hardware platform and then translates those requests to the physical hardware, which requires a considerable amount of processing and limits the performance of guest I/O.
One answer to this problem is to make the guest operating system aware that it is being virtualized (called para-virtualization; see the right side of Figure 3). In this case, the guest includes drivers that short-circuit hardware requests and pass them down to the hypervisor for actual processing. For example, instead of the guest OS making PCI-level requests, it makes a request for a considerably higher-level operation (such as sending an Ethernet frame on a particular NIC). This relieves the guest of the additional (unnecessary) work and allows it to focus on its higher-level tasks.
Figure 3: Full and Para-Virtualization.
So while full virtualization is an ideal scheme from a purist perspective, considerable performance gains can be attained by modifying the guest operating system and minimizing the processing overhead.
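The difference in overhead between the two schemes can be illustrated with a small model. The following Python sketch is purely illustrative: the `Hypervisor` class, the `DATA` and `CMD` register names, and the `hypercall_send_frame` call are hypothetical and do not correspond to any real hypervisor interface. It simply counts how many times the guest has to enter the hypervisor to send one Ethernet frame when every register write is trapped and emulated, versus when a para-virtualized driver issues a single high-level request.

```python
# Toy comparison of full virtualization (trap on every device register
# access) and para-virtualization (one high-level request). All names
# here are hypothetical and chosen only for illustration.

class Hypervisor:
    def __init__(self) -> None:
        self.exits = 0              # guest-to-hypervisor transitions
        self.sent = []              # frames handed to the "physical" NIC
        self._buffer = bytearray()  # state of the emulated NIC

    def mmio_write(self, register: str, value: int) -> None:
        """Full virtualization path: each register write is trapped."""
        self.exits += 1
        if register == "DATA":
            self._buffer.append(value)
        elif register == "CMD" and value == 1:   # 1 = hypothetical "transmit"
            self.sent.append(bytes(self._buffer))
            self._buffer.clear()

    def hypercall_send_frame(self, frame: bytes) -> None:
        """Para-virtualized path: the guest asks for the whole operation."""
        self.exits += 1
        self.sent.append(frame)


def emulated_driver_send(hv: Hypervisor, frame: bytes) -> None:
    # An unmodified guest driver programs the device register by register.
    for byte in frame:
        hv.mmio_write("DATA", byte)
    hv.mmio_write("CMD", 1)


def paravirt_driver_send(hv: Hypervisor, frame: bytes) -> None:
    # A virtualization-aware driver hands over the frame in one call.
    hv.hypercall_send_frame(frame)


frame = b"hello, guest"
full = Hypervisor()
emulated_driver_send(full, frame)
para = Hypervisor()
paravirt_driver_send(para, frame)

print(full.exits, para.exits)   # prints: 13 1
```

Both paths deliver the same frame, but the emulated path needs one hypervisor entry per register write, while the para-virtualized path needs one in total. Real devices and hypervisors are far more sophisticated than this model, but the ratio conveys why para-virtualized I/O tends to perform better.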
Linux includes two important solutions that implement both full virtualization and para-virtualization. Xen (backed commercially by Citrix) is a type-1 hypervisor: it runs directly on the hardware and relies on a privileged guest domain for management and device access. It is a popular solution. For example, Amazon's Elastic Compute Cloud (EC2) has relied on Xen for virtualization of individual servers.
Another important hypervisor here is the Kernel-based Virtual Machine (or KVM). KVM is unique in that it's a small modification of the Linux kernel that transforms it (through the loading of a kernel module) into a fully featured hypervisor. Because the Linux kernel runs on the bare metal while continuing to act as a general-purpose host operating system, KVM has characteristics of both the native and hosted models. KVM supports para-virtualization through the use of virtio, a Linux standard for para-virtualized drivers in the guest.
KVM was also the first hypervisor to be fully integrated into the mainline kernel. KVM is developed by Red Hat, and is being adopted in a few key installations such as IBM's development and test cloud.
Note that in either full or para-virtualization, each of the solutions can take advantage of hardware-assisted virtualization. Newer processors from AMD and Intel include instructions that optimize switching between the hypervisor and guest VMs, as well as hardware assists for enhancing virtualization of I/O.
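On an x86 Linux host, one common way to see whether the processor advertises these extensions is to look at the `flags` line in `/proc/cpuinfo`, where `vmx` indicates Intel VT-x and `svm` indicates AMD-V. The helper below is a minimal sketch of that check; the function name `hardware_virt_support` and the sample text are made up for this example, and the check says nothing about whether the feature has been disabled in firmware.

```python
# Minimal sketch: detect hardware virtualization flags in the text of
# /proc/cpuinfo on an x86 Linux host. The function name is hypothetical.

def hardware_virt_support(cpuinfo_text: str) -> set:
    """Return the set of virtualization extensions named in the CPU flags."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = value.split()
            if "vmx" in flags:
                found.add("Intel VT-x")
            if "svm" in flags:
                found.add("AMD-V")
    return found


# A shortened, made-up sample so the example runs anywhere.
sample = "processor\t: 0\nflags\t\t: fpu vme de pse msr vmx sse2\n"
print(hardware_virt_support(sample))   # prints: {'Intel VT-x'}

# On a real x86 Linux host you could instead run:
# with open("/proc/cpuinfo") as f:
#     print(hardware_virt_support(f.read()))
```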
Operating system virtualization is another important technique that provides a higher level abstraction than what we've explored so far. As the name implies, the abstraction is the operating system itself, instead of the platform.
In this way, the operating system provides a set of user-spaces that are isolated from one another but that offer the abstraction necessary for applications to believe that they are part of the single user-space on the host (see Figure 4). This type of virtualization is popular in virtual hosting environments where multiple independent users can transparently share an operating system.
Figure 4: Operating System Virtualization for Isolating User-Spaces.
OS virtualization relies on a Linux kernel that is capable of creating and isolating user-spaces (sometimes called containers or virtual private servers). The key benefit to OS virtualization is that there's practically no overhead since users simply share the OS and host and no other abstractions are necessary (such as a virtual machine).
The downside is that OS virtualization lacks the flexibility of the solutions that we've already covered. Instead of being able to run an arbitrary operating system, guests share the host operating system's kernel and must be compatible with its version. But even with this restriction, OS virtualization is quite common and useful.
Linux includes a number of OS virtualization solutions that are highly featured and configurable. OpenVZ, Linux-VServer, and FreeVPS are three of the most popular. All offer the ability to configure quotas for CPU, memory, network, I/O, and storage. OpenVZ even allows live migration of VPSes between hosts.
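The essential properties described above (a single shared kernel, isolated user-spaces, and per-container quotas) can be captured in a very small model. The Python sketch below is a conceptual illustration only: the `Host` and `Container` classes, the `uname` method, and the example kernel version string are hypothetical and are not the interface of OpenVZ, Linux-VServer, or FreeVPS.

```python
# Conceptual model of OS virtualization: containers share one kernel,
# keep private process tables, and are held to a memory quota.
# All names and values are hypothetical.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Container:
    name: str
    memory_limit_mb: int
    memory_used_mb: int = 0
    processes: Dict[int, str] = field(default_factory=dict)  # private PIDs

    def spawn(self, command: str, memory_mb: int) -> int:
        if self.memory_used_mb + memory_mb > self.memory_limit_mb:
            raise MemoryError(f"{self.name}: memory quota exceeded")
        pid = len(self.processes) + 1   # numbering restarts in each container
        self.processes[pid] = command
        self.memory_used_mb += memory_mb
        return pid


@dataclass
class Host:
    kernel_version: str
    containers: Dict[str, Container] = field(default_factory=dict)

    def create_container(self, name: str, memory_limit_mb: int) -> Container:
        container = Container(name, memory_limit_mb)
        self.containers[name] = container
        return container

    def uname(self, container_name: str) -> str:
        # Every container sees the host's kernel; there is no guest kernel.
        if container_name not in self.containers:
            raise KeyError(container_name)
        return self.kernel_version


host = Host(kernel_version="2.6.32")
web = host.create_container("web", memory_limit_mb=256)
db = host.create_container("db", memory_limit_mb=512)

web_pid = web.spawn("httpd", memory_mb=100)
db_pid = db.spawn("postgres", memory_mb=200)

try:
    web.spawn("worker", memory_mb=200)   # 100 + 200 exceeds the 256 MB quota
    quota_enforced = False
except MemoryError:
    quota_enforced = True

print(web_pid, db_pid, quota_enforced, host.uname("web") == host.uname("db"))
# prints: 1 1 True True
```

Each container gets its own view of process IDs and its own limits, yet both report the same kernel, which is exactly the trade-off between low overhead and flexibility discussed above.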
In this article, we've explored emulation, platform virtualization and then operating system virtualization. While those three methods are some of the most used techniques, there are a number of other styles of virtualization that can meet your needs. Let's now look at a few other styles that don't exactly fit into the prior virtualization categories.
CoLinux, or cooperative Linux, is a form of virtualization that utilizes what's called the cooperative virtual machine. In this context, a Linux guest runs on top of a Microsoft Windows OS, and both cooperatively share the underlying hardware resources (see the left side of Figure 5). CoLinux requires that the guest operating system (CoLinux itself) be modified to be aware that it is running on another operating system.
In this way, it's para-virtualized, but in a very specific model that assumes that Windows is the host operating system (and that only one CoLinux instance exists for a given host). Because of these restrictions, CoLinux is defined as a different technique.
Figure 5: Special Para-virtualized Schemes for Windows and Linux.
User-Mode Linux, or UML, has some similarities to CoLinux but is much more flexible in its approach. UML permits the execution of Linux guests on top of a normal Linux host system (see the right side of Figure 5). The UML guest kernel is compiled specifically to run as a guest (para-virtualized), which is done to achieve better performance. One interesting aspect of UML is that it's possible to nest UMLs, so that a UML guest kernel running on a Linux host can itself host a higher-level UML guest.
Wine and Cygwin are another set of interesting virtualization solutions, but they move even higher up the stack (see Figure 6). Wine, a recursive acronym for "Wine Is Not an Emulator," is a way to run Windows applications on a Linux host. Wine doesn't represent a full emulation layer for Windows applications, but instead a layer of DLLs (dynamic-link libraries) that implement the Windows APIs. This gives the Windows application the ability to make what appear to be Windows calls on a Linux host.
Figure 6: Abstraction Layers for Wine and Cygwin.
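The idea of an API layer that stands in for the Windows libraries can be sketched in a few lines. The Python example below is only an analogy and is not how Wine is built: the `Win32Shim` class is hypothetical, it borrows three Windows API names purely for flavor, and it ignores their real argument lists and details such as path translation. It shows an "application" written against Windows-style calls running on whatever host the shim was written for.

```python
# Analogy for an API compatibility layer: Windows-style entry points
# implemented with host OS calls. Win32Shim is hypothetical and greatly
# simplified; it is not Wine's implementation.
import os
import tempfile


class Win32Shim:
    def GetCurrentProcessId(self) -> int:
        return os.getpid()

    def CreateDirectoryA(self, path: str) -> bool:
        try:
            os.mkdir(path)
            return True
        except OSError:
            return False

    def RemoveDirectoryA(self, path: str) -> bool:
        try:
            os.rmdir(path)
            return True
        except OSError:
            return False


def windows_style_app(api: Win32Shim, base_dir: str) -> str:
    """Application code that only knows the Windows-style API surface."""
    workdir = os.path.join(base_dir, f"app-{api.GetCurrentProcessId()}")
    created = api.CreateDirectoryA(workdir)
    removed = api.RemoveDirectoryA(workdir)
    return "ok" if created and removed else "failed"


base = tempfile.mkdtemp()          # scratch area for the demonstration
result = windows_style_app(Win32Shim(), base)
os.rmdir(base)                     # clean up the scratch area
print(result)                      # prints: ok
```

The application never calls the host operating system directly; it only sees the API it was written for, which is the essence of the layer shown in Figure 6.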
The reverse solution (to run Linux applications on Windows) is also available through a package called Cygwin. Cygwin, now developed by Red Hat, is a pseudo Unix environment that permits developing Unix applications on Windows (meaning access to POSIX, and other Unix-like facilities).
Cygwin requires that the application be rebuilt for the target environment, whereas Wine runs existing Windows binaries against its own implementation of the Windows APIs. Either way, these approaches are less like virtualization and more similar to host-aware emulation.
Linux has seen increasing growth in the virtualization space, not just with the development of a core set of hypervisors (based on the Linux OS), but also with tools and other parts of the ecosystem, such as a para-virtualized driver architecture, management applications, and more. Virtual Linux provides solutions over the entire spectrum of virtualization techniques (as illustrated by this article) and will continue to broaden and deepen its virtualization arsenal.
Virtual machine: an entity that sits on the hypervisor and encapsulates the OS, applications, and metadata.
IBM M44: an experimental IBM computer system on which virtualization was first explored, well before Linux existed.
x86 platform: a popular processor architecture often used in virtual Linux solutions.
CoLinux, or cooperative Linux, is a form of virtualization that makes use of what's referred to as a cooperative virtual machine; a Linux guest runs on Windows, with both sharing hardware resources.
User-Mode Linux, or UML, enables the execution of Linux guests on top of a Linux host system.
About the Author
M. Tim Jones is a senior Architect with Emulex Corporation in Longmont, CO. His background ranges from the development of software for geosynchronous satellites to the architecture and development of storage and virtualization solutions.