20 September 2009

UML Virtualization Technology By Jeff Dike (its creator)


Intro


What Is UML?
User Mode Linux (UML) is a virtual Linux machine that runs on Linux. In other words, UML is a port of Linux to Linux.

Linux has been ported to many different processors, including the ubiquitous x86, Sun's SPARC, IBM and Motorola's PowerPC, DEC's (then Compaq's and HP's) Alpha, and a variety of others. UML is a port of Linux in exactly the same sense as these. The difference is that it is a port to the software interface defined by Linux rather than the hardware interface defined by the processor and the rest of the physical computer.

UML has manifold uses for system administrators, users, and developers. UML virtual machines are useful for:
  • test environments that can be set up quickly and thrown away when no longer needed
  • production environments that efficiently use the available hardware
  • development setups that make it much more convenient to test software
  • honeypots
  • teaching and research
  • ... and a surprising number of other things.


Comparison with Other Virtualization Technologies
UML differs from other virtualization technologies in being more of a virtual operating system (OS) than a virtual machine. In spite of this, I will stick to the common terminology and call UML a virtual machine technology rather than a virtual OS, which would be somewhat more accurate.

Technologies such as VMWare really are virtual machines. They emulate a physical platform, from the CPU to the peripherals, well enough that any OS that runs on the physical platform also runs on the emulated platform provided by VMWare. This has the advantage of being fairly OS-agnostic: in principle, any OS that runs on the platform can boot under VMWare. In contrast, UML can be only a Linux guest. On the other hand, being a virtual OS rather than a virtual machine allows UML to interact more fully with the host OS, which has advantages we will see later.

Other virtualization technologies such as Xen, BSD jail, Solaris zones, and chroot are integrated into the host OS, as opposed to UML, which runs in a process. This gives UML the advantage of being independent from the host OS version, at the cost of some performance. However, a lot (maybe all) of this performance can be regained without losing the flexibility and manageability that UML gains from being in userspace.

As we will see later, the benefits of virtualization accrue largely from the degree of isolation between users and processes inside the virtual machine or jail and those outside it. Most of these technologies (excluding Xen and VMWare) provide only partial virtualization and, thus, partial isolation.

The least complete virtualization is provided by chroot, which only jails processes into a directory. In all other respects, the processes are unconfined. Even then, on Linux, chroot can't confine a process with root privileges, since its design allows superuser processes to escape.

BSD jail and vserver (a Linux-based project with roughly the same properties) provide stronger confinement. They confine processes to a subset of the filesystem and don't allow them to see processes outside the jail. A jail is also restricted to using a single IP address, and it can't manipulate its firewall rules. Jailed processes are not restricted in their use of CPU time or I/O. The jails on a system are implemented within the system's kernel and therefore share the kernel, along with the bugs and security holes it contains. The inability to change firewall rules is a consequence of incomplete virtualization, as is the requirement to share the kernel with the host.

Solaris zones are much closer to full-blown virtual machines and complete isolation. Processes within a zone can't see outside files or processes, as is the case with a jail. Zones have their own logical devices, with some restrictions on their access to the network. For example, raw access to packets isn't allowed. A zone can be assigned a certain number of shares within the global fair share scheduler, limiting the share of CPU that the processes within a zone can consume. We will see this concept later in the form of virtual processors in a multiprocessor virtual machine. Zones, like the other technologies described so far, are implemented within the kernel and share the kernel version and configuration with each other and the host.
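
The effect of fair-share scheduling can be illustrated with a little arithmetic. The sketch below assumes the simplest proportional model, in which each zone competing for the CPU receives a fraction equal to its shares divided by the total shares held by all competing zones. The zone names and share counts are made up, and the real Solaris scheduler is more involved than this.

```python
from fractions import Fraction


def cpu_fractions(shares):
    """Map each zone name to its fraction of the CPU under a simple
    proportional-share model (an illustrative assumption, not the
    actual Solaris algorithm)."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one zone must hold a positive number of shares")
    return {zone: Fraction(count, total) for zone, count in shares.items()}


# Hypothetical zones and share counts.
zones = {"web": 3, "db": 2, "batch": 1}
for zone, fraction in cpu_fractions(zones).items():
    print(f"{zone}: {float(fraction):.0%} of the CPU when every zone is busy")
```

Under this model a zone's limit only matters when there is contention: the fractions describe how the CPU is divided when every zone has runnable work.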

Finally, technologies such as VMWare, Xen, and UML implement full virtualization and isolation. They all have fully virtualized devices with no restrictions on how they may be used. They also confine their processes with respect to CPU consumption by virtue of having a certain number of virtual processors they may use. They also all run separate instances of the OS, which may be different versions (and even a completely different OS in the case of VMWare) than the host.



Why Virtual Machines?
  • A UML instance is a full-fledged Linux machine running on the host Linux. It runs all the software and services that any other Linux machine does. The difference is that UML instances can be conjured up on demand and then thrown away when not needed.
  • In addition to the flexibility of being able to create and destroy virtual machines within seconds, the instances themselves can be dynamically reconfigured. Virtual peripherals, processors, and memory can be added and removed arbitrarily to and from a running UML instance. There are also much looser limits on hardware configurations for UML instances than for physical machines. In particular, they are not limited to the hardware they are running on. A UML instance may have more memory, more processors, and more network interfaces, disks, and other devices than its host, or even any possible host. This makes it possible to test software for hardware you don't own, but have to support, or to configure software for a network before the network is available.
Here, I will describe the many uses of UML and provide step-by-step instructions for using it. As the original author and current maintainer of UML, I have seen UML mature from its decidedly cheesy beginnings to its current state where it can do basically everything that any other Linux machine can do (see Table 1.1).



A Bit of History

I started working on UML in earnest in February 1999 after having the idea that porting Linux to itself might be practical. I tossed the idea around in the back of my head for a few months in late 1998 and early 1999. I was thinking about what facilities it would need from the host and whether the system call interface provided by Linux was rich enough to provide those facilities. Ultimately, I decided it probably was, and in the cases where I wasn't sure, I could think of workarounds.

So, around February, I pulled a copy of the 2.0.32 kernel tree off of a Linux CD (probably a Red Hat source CD) because it was too painful to try to download it through my dialup. Within the resulting kernel tree, I created the directories my new port was going to need without putting any files in them. This is the absolute minimum amount of infrastructure you need for a new port. With the directories present, the kernel build process can descend into them and try to build what's there.

Needless to say, with nothing in those directories, the build didn't even start to work. I needed to add the necessary build infrastructure, such as Makefiles. So, I added the minimal set of things needed to get the kernel build to continue and looked at what failed next. Missing were a number of header files used by the generic (hardware-independent) portions of the kernel that the port needs to provide. I created them as empty files, so that the #include preprocessor directives would at least succeed, and proceeded onward.

At this point, the kernel build started complaining about missing macros, variables, and functions: the things that should have been present in my empty header files and nonexistent C source files. This told me what I needed to think about implementing. I did so in the same way as before: For the most part, I implemented the functions as stubs that didn't do anything except print an error message. I also started adding real headers, mostly by copying the x86 headers into my include directory and removing the things that had no chance of compiling.

After defining many of these useless procedures, I got the UML build to "succeed." It succeeded in the sense that it produced a program I could run. However, running it caused immediate failures due to the large number of procedures I defined that didn't do what they were supposed to; they did nothing at all except print errors. The utility of these errors is that they told me in what order I had to implement these things for real.

So, for the most part, I plodded along, implementing whatever function printed its name first, making small increments of progress through the boot process with each addition. In some cases, I needed to implement a subsystem, resulting in a related set of functions.

Implementation continued in this vein for a few months, interrupted by about a month of real, paying work. In early June, I got UML to boot a small filesystem up to a login prompt, at which point I could log in and run commands. This may sound impressive, but UML was still bug-ridden and full of design mistakes. These would be rooted out later, but at the time, UML was not much more than a proof of concept.

Because of design decisions made earlier, such fundamental things as shared libraries and the ability to log in on the main console didn't work. I worked around the first problem by compiling a minimal set of tools statically, so they didn't need shared libraries. This minimal set of tools was what I populated my first UML filesystem with. At the time of my announcement, I made this filesystem available for download since it was the only way anyone else was going to get UML to boot.

Because of another design decision, UML, in effect, put itself in the background, making it impossible for it to accept input from the terminal. This became a problem when you tried to log in. I worked around this by writing what amounted to a serial line driver, allowing me to attach to a virtual serial line on which I could log in.

These are two of the most glaring examples of what didn't work at that point. The full list was much longer and included other things such as signal delivery and process preemption. They didn't prevent UML from working convincingly, even though they were fairly fundamental problems, and they would get fixed later.

At the time, Linus was just starting the 2.3 development kernel series. My first "UML-ized" kernel was 2.0.32, which, even at the time, was fairly old. So, I bit the bullet and downloaded a "modern" kernel, which was 2.3.5 or so. This started the process, which continues to this day, of keeping in close touch with the current development kernels (and as of 2.4.0, the stable ones as well).

Development continued, with bugs being fixed, design mistakes rectified (and large pieces of code rewritten from scratch), and drivers and filesystems added. UML spent a longer than usual amount of time being developed out of tree, that is, not integrated into Linus' mainline kernel tree. In part, this was due to laziness. I was comfortable with the development methodology I had fallen into and didn't see much point in changing it.

However, pressure mounted from various sources to get UML into the main kernel tree. Many people wanted to be able to build UML from the kernel tree they downloaded or received with their distribution. Others, wanting the best for the UML project, saw inclusion in Linus' kernel as being a way of getting some public recognition or as a stamp of approval from Linus, thus attracting more users to UML. More pragmatically, some people, who were largely developers, noted that inclusion in the official kernel would cause updates and bug fixes to happen in UML "automatically." This would happen as someone made a pass over the kernel sources, for example, to change an interface or fix a family of bugs, and would cover UML as part of that pass. This would save me the effort of looking through the patch representing a new kernel release, finding those changes, figuring out the equivalent changes needed in UML, and making them. This had become my habit over the roughly four years of UML development before it was merged by Linus. It had become a routine part of UML development, so I didn't begrudge the time it took, but there is no denying that it did take time that would have been better spent on other things.

So, roughly in the spring of 2002, I started sending updated UML patches to Linus, requesting that they be merged. These were ignored for some months, and I was starting to feel a bit discouraged, when out of the blue, he merged my 2.5.34 patch on September 12, 2002. I had sent the patch earlier to Linus as well as the kernel mailing list and one of my own UML lists, as usual, and had not thought about it further. That day, I was idling on an Internet Relay Chat (IRC) channel where a good number of the kernel developers hang around and talk. Suddenly, Arnaldo Carvalho de Melo (a kernel contributor from Brazil and the CTO of Conectiva, the largest Linux distribution in South America) noticed that Linus had merged my patch into his tree.

The response to this from the other kernel hackers, and a little later, from the UML community and wider Linux community, was gratifyingly positive. A surprisingly (to me) large number of people were genuinely happy that UML had been merged and had, in being merged, got the recognition they thought it deserved.

At this writing (April 2006), it is three years later, and UML is still under very active development. There have been ups and downs. Some months after UML was merged, I started finding it hard to get Linus to accept updated patches. After a number of ignored patches, I started maintaining UML out of tree again, with the effect that the in-tree version of UML started to bit-rot. It stopped compiling because no one was keeping it up to date with changes to internal kernel interfaces, and of course bugs stopped being fixed because my fixes weren't being merged by Linus.

Late in 2004, an energetic young Italian hacker named Paolo Giarrusso got Andrew Morton, Linus' second-in-command, to include UML in his tree. The so-called "-mm" tree is a sort of purgatory for kernel patches. Andrew merges patches that may or may not be suitable for Linus' kernel in order to give them some wider exposure and see if they are suitable. Andrew took patches representing the current UML at the time from Paolo, and I followed that up with some more patches. Presently, Andrew forwarded those patches, along with many others, to Linus, who included them in his tree. All of a sudden, UML was up to date in the official kernel tree, and I had a reliable conduit for UML updates.

I fed a steady stream of patches through this conduit, and by the time of the 2.6.9 release, you could build a working UML from the official tree, and it was reasonably up to date. Throughout this period, I had been working on UML on a volunteer basis. I took enough contracting work to keep the bills paid and the cats fed. Primarily, this was spending a day a week at the Institute for Security Technology Studies at Dartmouth College, in northern New Hampshire, about an hour from my house. This changed around May and June of 2004, when, nearly simultaneously, I got job offers from Red Hat and Intel. Both were very generous, offering to have me spend my time on UML, with no requirements to move. I ultimately accepted Intel's offer and have been an Intel employee in the Linux OS group since.

Coincidentally, the job offers came on the fifth anniversary of UML's first public announcement. So, in five years, UML went from nothing to a fully supported part of the official Linux kernel.



What Is UML Used For?
During the five years since UML began, I have seen steady growth in the UML user base and in the number and variety of applications and uses for UML. My users have been nothing if not inventive, and I have seen uses for UML that I would never have thought of.

Server Consolidation
Naturally, the most common applications of UML are the obvious ones. Virtualization has become a hot area of the computer industry, and UML is being used for the same things as other virtualization technologies. Server consolidation (where multiple servers are consolidated using platform virtualization) is a major one, both internally within organizations and externally between them.
  • Internal consolidation usually takes the form of moving several physical servers into the same number of virtual machines running on a single physical host.
  • External consolidation is usually an ISP or hosting company offering to rent UML instances to the public just as they rent physical servers. Here, multiple organizations end up sharing physical hardware with each other.
The main attraction is cost savings. Computer hardware has become so powerful and so cheap that the old model of one service, or maybe two, per machine now results in hardware that is almost totally idle. There is no technical reason that many services, and their data and configurations, couldn't be copied onto a single server. However, it is easier in many cases to copy each entire server into a virtual machine and run them all unchanged on a single host. It is less risky since the configuration of each is the same as on the physical server, so moving it poses no chance of upsetting an already-debugged environment.

In other cases, virtual servers may offer organizational or political benefits. Different services may be run by different organizations, and putting them on a single physical server would require giving the root password to each organization. The owner of the hardware would naturally tend to feel queasy about this, as would any given organization with respect to the others. A virtual server neatly solves this by giving each service its own virtual machine with its own root password. Having root privileges in a virtual machine in no way requires root privileges on the host. Thus, the services are isolated from the physical host, as well as from each other. If one of them gets messed up, it won't affect the host or the other services.

Moving from development to production, UML virtual machines are commonly used to set up and test environments before they go live in production. Any type of environment from a single service running on a single machine to a network running many services can be tested on a single physical host. In the latter case, you would set up a virtual network of UMLs on the host, run the appropriate services on the virtual hosts, and test the network to see that it behaves properly.
In a complex situation like this, UML shines because of the ease of setting up and shutting down a virtual network. This is simply a matter of running a set of commands, which can be scripted. Doing this without using virtual machines would require setting up a network of physical machines, which is vastly more expensive in terms of time, effort, space, and hardware. You would have to find the hardware, from systems to network cables, find some space to put it in, hook it all together, install and configure software, and test it all. In addition to the extra time and other resources this takes, compared to a virtual test environment, none of this can be automated.

In contrast, with a UML testbed, this can be completely automated. It is possible, and fairly easy, to fully automate the configuration and booting of a virtual network and the testing of services running on that network. With some work, this can be reduced to a single script that can be run with one command. In addition, you can make changes to the network configuration by changing the scripts that set it up, rather than rewiring and rearranging hardware. Different people can also work independently on different areas of the environment by booting virtual networks on their own workstations. Doing this in a physical environment would require separate physical testbeds for each person working on the project.
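
As a rough idea of what such a script might look like, the sketch below generates one UML command line per virtual host. Everything specific in it is an assumption made for illustration: the host names, the ./linux binary path, the root_fs image name, and the particular umid=, ubd0=, mem=, and eth0= settings. It only builds and prints the commands, leaving the launching and the service tests to the surrounding script.

```python
import shlex


def uml_command(name, root_fs, memory_mb=128, network="mcast"):
    """Build the argument list for one UML instance.

    Assumed for illustration: the UML binary is ./linux, and the
    umid=, ubd0=, mem=, and eth0= options are spelled as shown here.
    """
    return [
        "./linux",
        f"umid={name}",
        f"ubd0={name}.cow,{root_fs}",  # private COW file over a shared image
        f"mem={memory_mb}M",
        f"eth0={network}",
    ]


def network_commands(hostnames, root_fs):
    """Return one command line per virtual host in the test network."""
    return [uml_command(name, root_fs) for name in hostnames]


# Hypothetical three-host test network sharing one root filesystem image.
for argv in network_commands(["router", "web", "client"], "root_fs"):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Because the network is described by a list of names, changing its shape means editing the list rather than moving cables.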

Implementing this sort of testbed using UML systems instead of physical ones results in the near-elimination of hardware requirements, much greater parallelism of development and testing, and greatly reduced turnaround time on configuration changes. This can reduce the time needed for testing and improve the quality of the subsequent deployment by increasing the amount and variety of testing that's possible in a virtual environment.

A number of open source projects, and certainly a much larger number of private projects, use UML in this way. Here are a couple that I am aware of:

  • Openswan, the open source IPsec project, uses a UML network for nightly regression testing and its kernel development.
  • BusyBox, a small-footprint set of Linux utilities, uses UML for its testing.

Education
Consider moving the sort of UML setup I just described from a corporate environment to an educational one. Instead of having a temporary virtual staging environment, you would have a permanent virtual environment in which students will wreak havoc and, in doing so, hopefully learn something.

Now, the point of setting up a complicated network with interrelated services running on it is simply to get it working in the virtual environment, rather than to replicate it onto a physical network once it's debugged. Students will be assigned to make things work, and once they do (or don't), the whole thing will be torn down and replaced with the next assignment.

The educational uses of UML are legion, including courses that involve any sort of system administration and many that involve programming. System administration requires the students to have root privileges on the machines they are learning on. Doing this with physical machines on a physical network is problematic, to say the least.

As root, a student can completely destroy the system software (and possibly damage the hardware). With the system on a physical network, a student with privileges can make the network unusable by, wittingly or unwittingly, spoofing IP addresses, setting up rogue DNS or DHCP servers, or poisoning ARP (Address Resolution Protocol) caches [1] on other machines on the network.
[1] ARP is used on an Ethernet network to convert IP addresses to Ethernet addresses. Each machine on an Ethernet network advertises what IP addresses it owns, and this information is stored by the other machines on the network in their ARP caches. A malicious system could advertise that it owns an IP address that really belongs to a different machine, in effect, hijacking the address. For example, hijacking the address of the local name server would result in name server requests being sent to the hijacking machine rather than the legitimate name server. Nearly all Internet operations begin with a name lookup, so hijacking the address of the name server gives an enormous amount of control of the local network to the attacker.
These problems all have solutions in a physical environment. Machines can be completely reimaged between boots to undo whatever damage was done to the system software. The physical network can be isolated from any other networks on which people are trying to do real work. However, all this takes planning, setup, time, and resources that just aren't needed when using a UML environment.
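
The ARP hijacking described in the footnote above can be modelled in a few lines. This is a toy sketch, not a real network stack: the ArpCache class and the addresses are hypothetical, and real ARP caches also deal with timeouts and request/reply traffic. It shows only the essential weakness, which is that the most recent advertisement for an IP address is believed.

```python
class ArpCache:
    """Toy ARP cache: remembers the most recent Ethernet address
    advertised for each IP address, with no authentication."""

    def __init__(self):
        self._table = {}

    def advertise(self, ip, mac):
        # A later advertisement silently replaces an earlier one.
        self._table[ip] = mac

    def resolve(self, ip):
        return self._table.get(ip)


# Hypothetical addresses for the name server and a rogue machine.
NAME_SERVER_IP = "192.168.0.1"
cache = ArpCache()
cache.advertise(NAME_SERVER_IP, "00:11:22:33:44:55")  # legitimate owner
cache.advertise(NAME_SERVER_IP, "66:77:88:99:aa:bb")  # hijacker claims the same IP
print(cache.resolve(NAME_SERVER_IP))
```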

The boot disk of a UML instance is simply a file in the host's filesystem. Instead of reimaging the disk of a physical machine between boots, the old UML root filesystem file can be deleted and replaced with a copy of the original. As will be described in later sections, UML has a technology called COW (Copy-On-Write) files, which allow changes to a filesystem to be stored in a host file separate from the filesystem itself. Using this, undoing changes to a filesystem is simply a matter of deleting the file that contains the changes. Thus, reimaging a UML system takes a fraction of a second, rather than the minutes that reimaging a disk can take.
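
The idea behind COW files can be sketched with an in-memory model. The CowDisk class below is hypothetical and says nothing about UML's actual on-disk format; it assumes only what the paragraph above states, namely that changes are kept apart from the original filesystem and can be thrown away in one step.

```python
class CowDisk:
    """Toy copy-on-write disk: reads fall through to a read-only backing
    image unless the block has been written, in which case the private
    copy in the overlay is returned."""

    def __init__(self, backing_blocks):
        self._backing = tuple(backing_blocks)  # never modified
        self._overlay = {}                     # block number -> changed data

    def read(self, block):
        return self._overlay.get(block, self._backing[block])

    def write(self, block, data):
        if not 0 <= block < len(self._backing):
            raise IndexError("block out of range")
        self._overlay[block] = data

    def reset(self):
        """Discard every change, like deleting the COW file on the host."""
        self._overlay.clear()


disk = CowDisk([b"boot", b"etc", b"home"])
disk.write(1, b"broken config")
print(disk.read(1))   # the changed block comes from the overlay
disk.reset()
print(disk.read(1))   # the original block is visible again
```

Resetting costs the same no matter how large the backing image is, which is why undoing a student's damage this way is so much faster than copying a whole disk.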

Looking at the network, a virtual network of UMLs is by default isolated from everything else. It takes effort, and privileges on the host, to allow a virtual network to communicate with a physical one. In addition, an isolated physical network is likely to have a group of students on it, so that one sufficiently malign or incompetent student could prevent any of the others from getting anything done. With a UML instance, it is feasible (and the simplest option) to give each student a private network. Then, an incompetent student can't mess up anyone else's network.

UML is also commonly used for learning kernel-level programming. For novice to intermediate kernel programming students, UML is a perfect environment in which to learn. It provides an authentic kernel to modify, with the development and debugging tools that should already be familiar. In addition, the hardware underneath this kernel is virtualized and thus better behaved than physical hardware. Failures will be caused by buggy software, not by misbehaving devices. So, students can concentrate on debugging the code rather than diagnosing broken or flaky hardware.

Obviously, dealing with broken, flaky, slightly out-of-spec, not-quite-standards-compliant devices is an essential part of an expert kernel developer's repertoire. To reach that exalted status, it is necessary to do development on physical machines. But learning within a UML environment can take you most of the way there.

Over the years, I have heard of educational institutions teaching many sorts of Linux administration courses using UML. Some commercial companies even offer system administration courses over the Internet using UML. Each student is assigned a personal UML, which is accessible over the Internet, and uses it to complete the coursework.


Development
Moving from system administration to development, I've seen a number of programming courses that use UML instances. Kernel-level programming is the most obvious place for UMLs. A system-level programming course is similar to a system administration course in that each student should have a dedicated machine. Anyone learning kernel programming is probably going to crash the machine, so you can't really teach such a course on a shared machine.

UML instances have all the advantages already described, plus a couple of bonuses. The biggest extra is that, as a normal process running on the host, a UML instance can be debugged with all the tools that someone learning system development is presumably already familiar with. It can be run under the control of gdb, where the student can set breakpoints, step through code, examine data, and do everything else you can do with gdb. The rest of the Linux development environment works as well with UML as with anything else. This includes gprof and gcov for profiling and test coverage and strace and ltrace for system call and library tracing.

Another bonus is that, for tracking down tricky timing bugs, the debugging tool of last resort, the print statement, can be used to dump data out to the host without affecting the timing of events within the UML kernel. With a physical machine, this ranges from extremely hard to impossible. Anything you do to store information for later retrieval can, and probably will, change the timing enough to obscure the bug you are chasing. With a UML instance, time is virtual, and it stops whenever the virtual machine isn't in the host's userspace, as it is when it enters the host kernel to log data to a file.

A popular use for UML is development for hardware that does not yet exist. Usually, this is for a piece of embedded hardware, an appliance of some sort that runs Linux but doesn't expose it. Developing the software inside UML allows the software and hardware development to run in parallel. Until the actual devices are available, the software can be developed in a UML instance that is emulating the hardware. Examples of this are hard to come by because embedded developers are notoriously close-lipped, but I know of a major networking equipment manufacturer that is doing development with UML. The device will consist of several systems hooked together with an internal network. This is being simulated by a script that runs a set of UML instances (one per system in the device) with a virtual network running between them and a virtual network to the outside. The software is controlling the instances in exactly the same way that it will control the systems within the final device.

Going outside the embedded device market, UML is used to simulate large systems. A UML instance can have a very large amount of memory, lots of processors, and lots of devices. It can have more of all these things than the host can, making it an ideal way to simulate a larger system than you can buy. In addition to simulating large systems, UML can also simulate clusters. A couple of open source clustering systems and a larger number of cluster components, such as filesystems and heartbeats, have been developed using UML and are distributed in a form that will run within a set of UMLs.


Disaster Recovery Practice
A fourth area of UML use, which is sort of a combination of the previous two, is disaster recovery practice. It's a combination in the sense that this would normally be done in a corporate environment, but the UML virtual machines are used for training.

The idea is that you make a virtual copy of a service or set of services, mess it up somehow, and figure out how to fix it. There will likely be requirements beyond simply fixing what is broken. You may require that the still-working parts of the service not be shut down or that the recovery be done in the least amount of time or with the smallest number of operations.

The benefits of this are similar to those mentioned earlier. Virtual environments are far more convenient to set up, so these sorts of exercises become far easier when virtual machines are available. In many cases, they simply become possible since hardware can't be dedicated to disaster recovery practice. The system administration staff can practice separately at their desks, and, given a well-chosen set of exercises, they can be well prepared when disaster strikes.


The Future
Among the plans is a project to port UML into the host kernel so that it runs inside the kernel rather than in a process. With some restructuring of UML, breaking it up into independent subsystems that directly use the resources provided by the host kernel, this in-kernel UML can be used for a variety of resource limitation applications such as resource control and jailing.

This will provide highly customizable jailing, where a jail is constructed by combining the appropriate subsystems into a single package. Processes in such a jail will be confined with respect to the resources controlled by the jail, and otherwise unconfined. This structure of layering subsystems on top of each other has some other advantages as well. It allows them to be nested, so that a user confined within a jail could construct a subjail and put processes inside it. It also allows the nested subsystems to use different algorithms than the host subsystems. So, a workload with unusual scheduling or memory needs could be run inside a jail with algorithms suitable for it.

However, the project I'm most excited about is using UML as a library, allowing other applications to link against it and thereby gain a captive virtual machine. This would have a great number of uses:

  • Managing an application or service from the inside, by logging in to the embedded UML
  • Running scripts inside the embedded UML to control, monitor, and extend the application
  • Using clustering technology to link multiple embedded UMLs into a cluster and use scripts running on this cluster to integrate the applications in ways that are currently not possible

A Quick Look at UML


I will concentrate on the relationship between the UML and the host. For many people, encountering a virtual machine for the first time can be confusing because it may not be clear where the host ends and the virtual machine starts. For example, the virtual machine obviously is part of the host since it can't exist without the host. However, it is totally separate from the host in other ways. You can be root inside the UML and have no privileges whatsoever on the host [1].
[1] In order to run a process, you obviously need some level of privilege on the system. However, a UML host can be set up such that the user that owns the UML processes on the host can do nothing but run the UML process.
When UML is run, it is provided some host resources to use as its own. The root user within UML has absolute control over those, but no control over, or even access to, anything else on the host. It's this extremely sharp distinction between what the UML has access to and what it doesn't that makes UML useful for a large number of applications.

A second common source of confusion is the duality of UML. It is both a Linux kernel and a Linux process. It is useful, and instructive, to look at UML from both perspectives. However, to many people, a kernel and a process are two completely different things, and there can be no overlap between them. So, we will look at a UML from both inside and outside, on the host, in order to compare the two views to each other. We will see different views of the same things. They will look different but will both be correct in their own ways. Hopefully, by the end of the section, it will be clear how something can be both a Linux kernel and a Linux process.

Figure 2.1 shows the relationship among a UML instance, the host kernel, and UML processes. To the host kernel, the UML instance is a normal process. To the UML processes, the UML instance is a kernel. Processes interact with the kernel by making system calls, which are like procedure calls except that they request the kernel do something on their behalf.

Like all other processes on the host, UML makes system calls to the host kernel in order to do its work. Unlike the other host processes, UML has its own system call interface for its processes to use. This is the source of the duality of UML. It makes system calls to the host, which makes it a process, and it implements system calls for its own processes, making it a kernel.


Building UML kernel from source


UML is kept up to date in the mainline kernel, so get the kernel of your choice from kernel.org and build UML from it. The step-by-step procedure is given below:
  1. Get the source: You need to start by getting a kernel tree (generally, the more recent, the better)
    harrykar@harrysas:~$ wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.31.tar.bz2
    --2009-09-21 23:37:11-- http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.31.tar.bz2
    Resolving www.kernel.org... 204.152.191.37, 149.20.20.133
    Connecting to www.kernel.org|204.152.191.37|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 61494822 (59M) [application/x-bzip2]
    Saving to: `linux-2.6.31.tar.bz2'

    100%[=============================================================================>] 61,494,822 175K/s in 8m 27s

    2009-09-21 23:45:39 (118 KB/s) - `linux-2.6.31.tar.bz2' saved [61494822/61494822]

  2. Now that linux-2.6.31.tar.bz2 is in your home directory, unpack the tree, which will end up in a linux-2.6.31 directory (30823 items, totalling 333.5 MB)
    # tar -xvjf linux-2.6.31.tar.bz2 ; cd linux-2.6.31
  3. Configuration: Start with the UML default configuration (defconfig), which will compile and boot. If you need to make changes, do that later using menuconfig or one of the graphical front ends, xconfig or gconfig.
    If you don't start with a defconfig, the kernel build will use the host's configuration (it will find a config file in /boot), which is very wrong for UML and will produce a UML that lacks vital drivers and won't boot:
    harrykar@harrysas:~/works/UML_virt/linux-2.6.31$ make defconfig ARCH=um
    HOSTCC scripts/basic/fixdep
    HOSTCC scripts/basic/docproc
    HOSTCC scripts/basic/hash
    HOSTCC scripts/kconfig/conf.o
    scripts/kconfig/conf.c: In function ‘conf_askvalue’:
    scripts/kconfig/conf.c:105: warning: ignoring return value of ‘fgets’, declared with attribute warn_unused_result
    scripts/kconfig/conf.c: In function ‘conf_choice’:
    scripts/kconfig/conf.c:307: warning: ignoring return value of ‘fgets’, declared with attribute warn_unused_result
    HOSTCC scripts/kconfig/kxgettext.o
    SHIPPED scripts/kconfig/zconf.tab.c
    SHIPPED scripts/kconfig/lex.zconf.c
    SHIPPED scripts/kconfig/zconf.hash.c
    HOSTCC scripts/kconfig/zconf.tab.o
    HOSTLD scripts/kconfig/conf
    scripts/kconfig/conf -d arch/um/Kconfig.x86
    #
    # configuration written to .config
    #
    Note: it is vitally important to put "ARCH=um" on every make command while building UML (or to "export ARCH=um" to put ARCH in your environment). This causes the kernel build to build UML, which is a separate Linux architecture. Not doing so will cause the kernel build to build or configure a native kernel.
    If you should forget, clean the source tree like this to get rid of all traces of whatever building you did, and start over:
    harrykar@harrysas:~/works/UML_virt/linux-2.6.31$ make mrproper
    CLEAN /home/harrykar/works/UML_virt/linux-2.6.31
    CLEAN init
    CLEAN kernel
    CLEAN lib
    CLEAN .tmp_versions
    CLEAN vmlinux System.map .tmp_kallsyms1.o .tmp_kallsyms1.S .tmp_kallsyms2.o .tmp_kallsyms2.S .tmp_kallsyms3.o .tmp_kallsyms3.S .tmp_vmlinux1 .tmp_vmlinux2 .tmp_vmlinux3 .tmp_System.map
    CLEAN scripts/basic
    CLEAN scripts/kconfig
    CLEAN scripts/mod
    CLEAN scripts
    CLEAN include/config
    CLEAN .config .config.old include/asm .version include/linux/autoconf.h include/linux/version.h include/linux/utsrelease.h include/linux/bounds.h include/asm/asm-offsets.h include/asm-um/asm-offsets.h Module.symvers
    harrykar@harrysas:~/works/UML_virt/linux-2.6.31$ make mrproper ARCH=um
    CLEAN arch/um/kernel
    CLEAN linux arch/um/include/shared/user_constants.h arch/um/include/shared/kern_constants.h
    Now we can run menuconfig to set options (see below) in order to build a fully functioning kernel:
    harrykar@harrysas:~/works/UML_virt/linux-2.6.31$ make menuconfig ARCH=um
    HOSTCC scripts/kconfig/lxdialog/checklist.o
    HOSTCC scripts/kconfig/lxdialog/inputbox.o
    HOSTCC scripts/kconfig/lxdialog/menubox.o
    HOSTCC scripts/kconfig/lxdialog/textbox.o
    HOSTCC scripts/kconfig/lxdialog/util.o
    HOSTCC scripts/kconfig/lxdialog/yesno.o
    HOSTCC scripts/kconfig/mconf.o
    HOSTLD scripts/kconfig/mconf
    scripts/kconfig/mconf arch/um/Kconfig.x86
    #
    # configuration written to .config
    #


    *** End of Linux kernel configuration.
    *** Execute 'make' to build the kernel or try 'make help'.
    • Loadable module support
      ->Enable loadable module support - Not Required
      Note: We can't debug these, so having this option enabled would be a waste
    • UML Specific Options
      -> Host Processor type features
      -> Generic x86 support - Disable this
      -> Processor family (386)
      -> Choose your processor; I chose PIII, which corresponded to my laptop.
    • Networking
      -> Amateur Radio - Not Required
      -> IRDA (infrared) Subsystem Support - Not Required
      -> Bluetooth Subsystem Support - Not Required
    • Character devices
      -> stderr console - Enable
      -> virtual serial line - Enable
      -> port channel support - Enable
      -> pty channel support - Enable
      -> tty channel support - Enable
      -> xterm channel support - Enable
    • Block Devices
      -> Virtual block devices - Enable
    • UML Network Devices
      - You may or may not want to turn on the different network options under
      these settings. Consult the UML website for more information on these devices.
    • File systems
      - To make my kernel build faster I only enabled the couple of filesystems
      that I knew that I would need (ext3 was one of them)
    • SCSI support
      -> SCSI support - Disable
    • Multi-device support (RAID and LVM)
      -> Multiple devices driver support (RAID and LVM) - Disable
    • Memory Technology Devices (MTD)
      -> Memory Technology Device (MTD) support - Disable
    • Kernel Hacking
      -> Show timing information on printks - Enable
      -> Kernel debugging - Enable
      -> Compile the Kernel with Debug Info - Enable
  4. Building:
    harrykar@harrysas:~/works/UML_virt/linux-2.6.31$ make ARCH=um
    When this finishes, you will have a UML binary called "linux". It's so large because of the debugging symbols built in to it. Removing those will shrink the UML binary to roughly the size of a native kernel.
  5. Now you are ready to boot your new UML. Alternatively, you can readily obtain a precompiled kernel from here
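
Since forgetting ARCH=um on a single make invocation is enough to spoil the tree, one option is a small wrapper function. This is only a sketch: the name umake is hypothetical and not part of the kernel build system.

```shell
# Hypothetical helper (not part of the kernel tree): run make with ARCH=um
# always set, so a native kernel is never configured or built by accident.
umake() {
    make ARCH=um "$@"
}

# Intended usage, from the top of the kernel tree:
#   umake defconfig
#   umake menuconfig
#   umake
```

Exporting ARCH=um in the environment achieves the same thing; the wrapper just keeps the setting from leaking into unrelated kernel builds in the same shell.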

Let's take a look at the UML binary, which is normally called linux:

harrykar@harrysas:~/works/UML_virt/linux-2.6.31$ ls -l linux
-rwxr-xr-x 2 harrykar harrykar 27393131 2009-09-22 00:17 linux

This is a normal Linux ELF binary, as you can see by running file on it:

harrykar@harrysas:~/works/UML_virt/linux-2.6.31$ file linux
linux: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked
(uses shared libs), for GNU/Linux 2.6.15, not stripped

It is also a Linux kernel, so it may be instructive to compare it to the kernel running on the host machine:

harrykar@harrysas:~$ ls -l /boot/vmlinuz*
-rw-r--r-- 1 root root 3522336 2009-04-17 05:34 /boot/vmlinuz-2.6.28-11-generic
-rw-r--r-- 1 root root 3511008 2009-07-25 04:48 /boot/vmlinuz-2.6.28-14-generic
-rw-r--r-- 1 root root 3511296 2009-08-18 22:54 /boot/vmlinuz-2.6.28-15-generic

The UML binary is quite a bit larger than the kernel on the host, but it has a full symbol table, as you can see from the output of file above. So, let's strip it and see what that does:

harrykar@harrysas:~/works/UML_virt/linux-2.6.31$ strip -sv linux
copy from `linux' [elf64-x86-64] to `stXSAuxQ' [elf64-x86-64]

harrykar@harrysas:~/works/UML_virt/linux-2.6.31$ ls -l linux
-rwxr-xr-x 2 harrykar harrykar 2667976 2009-09-22 00:48 linux
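
To put a number on what stripping removed, here is a small sketch that works out the saving from the two sizes reported by ls -l. The function name is made up for this example, and the arithmetic is integer-only, so the result is rounded down.

```shell
# Percentage of a file's size removed by stripping, given sizes in bytes.
# Integer arithmetic only, so the result is rounded down.
strip_savings_percent() {
    before=$1
    after=$2
    echo $(( (before - after) * 100 / before ))
}

# Sizes taken from the two ls -l listings above.
strip_savings_percent 27393131 2667976
```

With the sizes above this prints 90: roughly nine tenths of the unstripped binary was symbol and debugging information.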

If the stripped binary is a bit more than twice as large as the host kernel (it is not in our example), that's possibly because the configurations are different. I tend to build options into UML that are modules on the host. To check this, we can add up the sizes of the modules loaded on the host:

harrykar@harrysas:~/works/UML_virt/linux-2.6.31$ lsmod
Module Size Used by
isofs 43688 0
udf 92712 0
crc_itu_t 10496 1 udf
xt_limit 11140 8
xt_tcpudp 11776 26
iptable_mangle 11520 1
ipt_LOG 14468 8
ipt_MASQUERADE 11520 0
nf_nat 30100 1 ipt_MASQUERADE
xt_DSCP 12032 5
ipt_REJECT 11776 1
nf_conntrack_irc 14648 0
nf_conntrack_ftp 17592 0
nf_conntrack_ipv4 24216 8 nf_nat
nf_defrag_ipv4 10496 1 nf_conntrack_ipv4
xt_state 10624 6
nf_conntrack 84752 6 ipt_MASQUERADE,nf_nat,nf_conntrack_irc,nf_conntrack_ftp,nf_conntrack_ipv4,xt_state
iptable_filter 11392 1
ip_tables 28304 2 iptable_mangle,iptable_filter
x_tables 31624 8 xt_limit,xt_tcpudp,ipt_LOG,ipt_MASQUERADE,xt_DSCP,ipt_REJECT,xt_state,ip_tables
binfmt_misc 18572 1
ppdev 16904 0
bridge 63776 0
stp 11140 1 bridge
bnep 22912 2
input_polldev 12688 0
video 29204 0
output 11648 1 video
lp 19588 0
parport 49584 2 ppdev,lp
snd_hda_intel 557492 5
snd_pcm_oss 52352 0
snd_mixer_oss 24960 1 snd_pcm_oss
snd_pcm 99464 3 snd_hda_intel,snd_pcm_oss
snd_seq_dummy 11524 0
snd_seq_oss 41984 0
snd_seq_midi 15744 0
snd_rawmidi 33920 1 snd_seq_midi
snd_seq_midi_event 16512 2 snd_seq_oss,snd_seq_midi
snd_seq 66272 6 snd_seq_dummy,snd_seq_oss,snd_seq_midi,snd_seq_midi_event
snd_timer 34064 2 snd_pcm,snd_seq
snd_seq_device 16276 5 snd_seq_dummy,snd_seq_oss,snd_seq_midi,snd_rawmidi,snd_seq
cfi_cmdset_0002 37248 1
snd 78920 18 snd_hda_intel,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_seq_oss,snd_rawmidi,snd_seq,snd_timer,snd_seq_device
jedec_probe 22656 0
soundcore 16800 1 snd
psmouse 64028 0
cfi_probe 12288 0
gen_probe 11904 2 jedec_probe,cfi_probe
cfi_util 15360 2 cfi_cmdset_0002,cfi_probe
snd_page_alloc 18704 2 snd_hda_intel,snd_pcm
serio_raw 14468 0
pcspkr 11136 0
i2c_nforce2 16136 0
ck804xrom 14212 0
mtd 25100 3 cfi_cmdset_0002,ck804xrom
chipreg 11652 3 jedec_probe,cfi_probe,ck804xrom
map_funcs 10368 1 ck804xrom
nvidia 8123768 36
usb_storage 115520 0
forcedeth 68368 0
ohci1394 42164 0
ieee1394 108288 1 ohci1394
floppy 75816 0
fbcon 49792 0
tileblit 11264 1 fbcon
font 17024 1 fbcon
bitblit 14464 1 fbcon
softcursor 10368 1 bitblit

Adding those module sizes to the file size of the host's vmlinuz gives us something close to the size of the UML binary after the symbol table has been stripped off.
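
The addition itself is easy to automate. The following is a sketch that sums the Size column of lsmod-style output; the function name is hypothetical, and it assumes the usual three-column layout with a single header line, as in the listing above.

```shell
# Sum the Size column (second field) of lsmod-style output read from stdin.
# The first line is the "Module Size Used by" header and is skipped.
sum_module_sizes() {
    awk 'NR > 1 { total += $2 } END { printf "%d\n", total }'
}

# Intended usage on a Linux host:
#   lsmod | sum_module_sizes
```

The result is in bytes, so it can be added directly to the size that ls -l reports for the kernel image.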

What is the point of this comparison? It is to introduce the fact that UML is both a Linux kernel and a Linux process. As a Linux process, it can be run just like any other executable on the system, such as bash or ls.


Booting UML (Unsuccessfully) for the First Time

Let's boot UML now:

harrykar@harrysas:~/works/UML_virt/linux-2.6.31$ ./linux
Core dump limits :
soft - 0
hard - NONE
Checking that ptrace can change system call numbers...OK
Checking syscall emulation patch for ptrace...OK
Checking advanced syscall emulation patch for ptrace...OK
Checking for tmpfs mount on /dev/shm...OK
Checking PROT_EXEC mmap in /dev/shm/...OK
Checking for the skas3 patch in the host:
- /proc/mm...not found: No such file or directory
- PTRACE_FAULTINFO...not found
- PTRACE_LDT...not found
UML running in SKAS0 mode
Adding 6508544 bytes to physical memory to account for exec-shield gap
[ 0.000000] Linux version 2.6.31 (harrykar@harrysas) (gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4) ) #1 Tue Sep 22 00:17:03 CEST 2009
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 9647
[ 0.000000] Kernel command line: root=98:0
[ 0.000000] PID hash table entries: 256 (order: 8, 2048 bytes)
[ 0.000000] Dentry cache hash table entries: 8192 (order: 4, 65536 bytes)
[ 0.000000] Inode-cache hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] Memory: 28860k available
[ 0.000000] NR_IRQS:15
[ 0.000000] Calibrating delay loop... 681.57 BogoMIPS (lpj=3407872)
[ 0.250000] Mount-cache hash table entries: 256
[ 0.250000] Checking that host ptys support output SIGIO...Yes
[ 0.250000] Checking that host ptys support SIGIO on close...No, enabling workaround
[ 0.250000] Using 2.6 host AIO
[ 0.250000] NET: Registered protocol family 16
[ 0.250000] bio: create slab at 0
[ 0.250000] NET: Registered protocol family 2
[ 0.250000] IP route cache hash table entries: 512 (order: 0, 4096 bytes)
[ 0.250000] TCP established hash table entries: 2048 (order: 3, 32768 bytes)
[ 0.250000] TCP bind hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.250000] TCP: Hash tables configured (established 2048 bind 2048)
[ 0.250000] TCP reno registered
[ 0.250000] NET: Registered protocol family 1
[ 0.250000] IRQ 9/mconsole: IRQF_DISABLED is not guaranteed on shared IRQs
[ 0.250000] mconsole (version 2) initialized on /home/harrykar/.uml/OqH5tu/mconsole
[ 0.250000] Checking host MADV_REMOVE support...OK
[ 0.260000] VFS: Disk quotas dquot_6.5.2
[ 0.260000] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 0.260000] msgmni has been set to 56
[ 0.260000] io scheduler noop registered
[ 0.260000] io scheduler anticipatory registered (default)
[ 0.260000] io scheduler deadline registered
[ 0.260000] io scheduler cfq registered
[ 0.260000] TCP cubic registered
[ 0.260000] NET: Registered protocol family 17
[ 0.260000] Initialized stdio console driver
[ 0.260000] Console initialized on /dev/tty0
[ 0.260000] console [tty0] enabled
[ 0.260000] Initializing software serial port version 1
[ 0.260000] console [mc-1] enabled
[ 0.260000] Couldn't stat "root_fs" : err = 2
[ 0.260000] Failed to initialize ubd device 0 :Couldn't determine size of device's file
[ 0.260000] VFS: Cannot open root device "98:0" or unknown-block(98,0)
[ 0.260000] Please append a correct "root=" boot option; here are the available partitions:
[ 0.260000] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(98,0)
[ 0.260000] Call Trace:
[ 0.260000] 6201fd88: [<601875f2>] panic+0xd3/0x174
[ 0.260000] 6201fda8: [<60009149>] printk_all_partitions+0x1c8/0x1da
[ 0.260000] 6201fdf8: [<600582c5>] __free_pages+0x1a/0x23
[ 0.260000] 6201fe18: [<6008ae5d>] sys_mount+0xb6/0xcd
[ 0.260000] 6201fe28: [<60021b21>] set_signals+0x1c/0x2e
[ 0.260000] 6201fe78: [<600019af>] mount_block_root+0x252/0x270
[ 0.260000] 6201fee8: [<60001a1d>] mount_root+0x50/0x54
[ 0.260000] 6201ff08: [<60001b3a>] prepare_namespace+0x119/0x13e
[ 0.260000] 6201ff18: [<6000119e>] kernel_init+0xca/0xd8
[ 0.260000] 6201ff48: [<60020b9d>] run_kernel_thread+0x41/0x4a
[ 0.260000] 6201ff58: [<600010d4>] kernel_init+0x0/0xd8
[ 0.260000] 6201ff98: [<60020b84>] run_kernel_thread+0x28/0x4a
[ 0.260000] 6201ffc8: [<60012e9f>] new_thread_handler+0x72/0x9c
[ 0.260000]
[ 0.260000]
[ 0.260000] Modules linked in:
[ 0.260000] Pid: 1, comm: swapper Not tainted 2.6.31
[ 0.260000] RIP: 0033:[<00007f99e46112a7>]
[ 0.260000] RSP: 00007fffecd70ec8 EFLAGS: 00000202
[ 0.260000] RAX: 0000000000000000 RBX: 0000000000001a2f RCX: ffffffffffffffff
[ 0.260000] RDX: 0000000000000000 RSI: 0000000000000013 RDI: 0000000000001a2f
[ 0.260000] RBP: 00007fffecd70f00 R08: 00007fffecd70e10 R09: 0000000000000000
[ 0.260000] R10: 00007fffecd70c50 R11: 0000000000000202 R12: 0000000000001a2b
[ 0.260000] R13: 00007f99e4d6d698 R14: 00007fffecd70fe8 R15: 00007fffecd711a8
[ 0.260000] Call Trace:
[ 0.260000] 6201fd00: [<6004c4e1>] __module_text_address+0xd/0x5b
[ 0.260000] 6201fd18: [<60015155>] panic_exit+0x2f/0x45
[ 0.260000] 6201fd38: [<60043457>] notifier_call_chain+0x33/0x5b
[ 0.260000] 6201fd78: [<60043499>] atomic_notifier_call_chain+0xf/0x11
[ 0.260000] 6201fd88: [<60187603>] panic+0xe4/0x174
[ 0.260000] 6201fda8: [<60009149>] printk_all_partitions+0x1c8/0x1da
[ 0.260000] 6201fdf8: [<600582c5>] __free_pages+0x1a/0x23
[ 0.260000] 6201fe18: [<6008ae5d>] sys_mount+0xb6/0xcd
[ 0.260000] 6201fe28: [<60021b21>] set_signals+0x1c/0x2e
[ 0.260000] 6201fe78: [<600019af>] mount_block_root+0x252/0x270
[ 0.260000] 6201fee8: [<60001a1d>] mount_root+0x50/0x54
[ 0.260000] 6201ff08: [<60001b3a>] prepare_namespace+0x119/0x13e
[ 0.260000] 6201ff18: [<6000119e>] kernel_init+0xca/0xd8
[ 0.260000] 6201ff48: [<60020b9d>] run_kernel_thread+0x41/0x4a
[ 0.260000] 6201ff58: [<600010d4>] kernel_init+0x0/0xd8
[ 0.260000] 6201ff98: [<60020b84>] run_kernel_thread+0x28/0x4a
[ 0.260000] 6201ffc8: [<60012e9f>] new_thread_handler+0x72/0x9c
[ 0.260000]
remove_umid_dir - remove_files_and_dir failed with err = -39
Segmentation fault

Notice two obvious things about the results shown above:
  1. The output resembles the boot output of a normal Linux machine.
  2. The boot was not very successful, as you can see from the panic and stack dump at the end.
It's worth comparing this to the boot output of a Linux system, which is normally available by running dmesg. You'll see a lot of similarities: many of the messages, such as the ones from the filesystem and network subsystems, are identical. Much of the rest is totally different, although it should seem similar in purpose. This is largely due to hardware drivers initializing. UML doesn't have the same hardware or drivers as the host, so their bootup messages will be different. If you have access to Linux on several different architectures, such as x86 and x86_64 or ppc, you'll see the same sorts of differences between their boot output. In fact, this is a very apt comparison because UML is a different architecture from the Linux kernel running on the host. Let's look at the output in more detail:

Checking for the skas3 patch in the host:
- /proc/mm...not found: No such file or directory
Checking PROT_EXEC mmap in /dev/shm/...OK

These are checking the environment on the host to see whether UML can run at all (the executable /dev/shm/ check) and whether the host kernel has capabilities that allow UML to run more efficiently. These particular checks need to be done very early.

Checking for host processor cmov support...Yes
Checking for host processor xmm support...No
Checking that ptrace can change system call numbers...OK

These are checking some more capabilities of the host. The first two (which were not printed in the x86-64 run shown above) check processor capabilities, and the last checks whether the host has a feature that's absolutely needed for UML to run (which all modern hosts do).

mconsole (version 2) initialized on /home/harrykar/.uml/OqH5tu/mconsole
...
Initialized stdio console driver
...
Initializing software serial port version 1

Here, UML is initializing its drivers. A UML boot has much less output of this sort compared with a boot of a physical Linux system. This is because UML uses resources on the host to support its virtual hardware, and there are many fewer types of these resources than there are different types of devices on a physical system. For example, every possible sort of block device within UML can be accessed as a host file, so block devices require a single UML driver. In contrast, the host has multitudes of block drivers, for IDE disks, SCSI disks, SATA disks, and so on. Because of the uniform interface provided by the host, UML requires many fewer drivers in order to access these devices and the data on them.

The first driver is the mconsole driver. MConsole stands for "Management Console"; it is a mechanism for controlling and monitoring a UML instance from the host. This has no hardware equivalent on most Linux systems. The last two are the console and serial line drivers, which obviously do have hardware equivalents, except that the UML drivers communicate using virtual devices such as pseudo-terminals rather than physical devices such as a graphics card or serial line.

[    0.260000] Failed to initialize ubd device 0 :Couldn't determine size of device's file
[ 0.260000] VFS: Cannot open root device "98:0" or unknown-block(98,0)
[ 0.260000] Please append a correct "root=" boot option; here are the available partitions:
[ 0.260000] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(98,0)

Here is the panic that killed off this attempted run of UML. The problem is that we didn't provide UML with a root device, so it couldn't mount its root filesystem. This is fatal and causes the panic and the stack trace. You can make a physical Linux machine do exactly the same thing by putting a bogus "root=" option on the kernel command line using LILO or GRUB. Unlike the host, which needs LILO or GRUB, UML needs no bootloader. Because it is run from the command line, you can think of the host as being the UML bootloader.
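
The log shows that, with no options given, UML looked for a file called root_fs in the current directory and failed to stat it. A launcher script could test for the image first and fail with a clearer message. The following is a minimal sketch; the function name is hypothetical and is not part of UML.

```shell
# Hypothetical pre-flight check: confirm that the root filesystem image
# exists and is readable before starting UML, to avoid the panic above.
check_root_image() {
    image=$1
    if [ ! -f "$image" ]; then
        echo "root image not found: $image" >&2
        return 1
    fi
    if [ ! -r "$image" ]; then
        echo "root image not readable: $image" >&2
        return 1
    fi
    return 0
}

# Intended usage, from the directory holding the UML binary:
#   check_root_image root_fs && ./linux
```

This only catches a missing or unreadable file. An image that exists but holds no valid filesystem would still make the boot fail later.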

Finally, an important point is that we just panicked a UML kernel, and the only result was that we were dropped back to the shell prompt. The host system itself, and everything else on the system, was totally unaffected by the crash. This demonstrates the basis of many of the advantages of UML over a physical system: it can be used in ways that may cause system crashes or other software malfunctions, but the damage is limited to the virtual machine. As we will see later, even this damage can be undone quite easily.

That may have been interesting, but not very useful. Now, we will boot UML successfully and see how it looks inside.


Booting UML Successfully
The problem previously was that we didn't tell UML what its root device was. This is an important special case of a more general property of UML: its hardware is configured on the fly. In contrast to a physical system, whose hardware is fixed, a virtual system can be different every time it is booted. So, it expects to be told, either on the command line or later via the mconsole interface, what hardware it possesses.

Here, we will configure UML on the command line. The first order of business is to give it a proper root device so that it has something it can boot. As I mentioned earlier, UML devices are virtual and constructed from host resources. Specifically, UML's disks are generally (but not always, as we will see later) files in the host's filesystem. For example, here is the filesystem we will boot:
harrykar@harrysas:~/works/UML_virt/roots$ ls -l Debian-4.0-AMD64-root_fs
-rw-r--r-- 1 harrykar harrykar 1073741824 2009-09-23 11:46 Debian-4.0-AMD64-root_fs

One obvious thing here is that the filesystem image is very large (1 GiB). file will tell us a bit more about it:
harrykar@harrysas:~/works/UML_virt/roots$ file Debian-4.0-AMD64-root_fs
Debian-4.0-AMD64-root_fs: Linux rev 1.0 ext3 filesystem data, UUID=804365dd-bf4-40a9-a67c-ab4be900efda, volume name "ROOT" (large files)


This tells us that the data in this file is an ext3 filesystem image. In other words, we can loopback-mount it and see that it contains a full filesystem:

harrykar@harrysas:~# mount ~/works/UML_virt/roots/Debian-4.0-AMD64-root_fs ~/mnt -o loop
harrykar@harrysas:~$ ls ~/mnt
bfs boot dev floppy initrd lib mnt root tmp var bin cdrom etc home kernel lost+found proc sbin usr

In fact, when mounting this as its root filesystem, UML will do something very similar to a loopback mount. The UML block driver operates by calling read and write on this file on the host, analogous to a block driver on the host doing reads and writes on a physical disk. The loopback driver on the host is doing exactly the same thing, except from within the host kernel, rather than from a process, where the UML block driver is. So, in order to provide this file to UML as its root device, we need to tell the UML block driver (the ubd or UML Block Device driver) to attach itself to it. This is done with this option:
ubda=/home/harrykar/works/UML_virt/roots/Debian-4.0-AMD64-root_fs
This is the easiest way to initialize a UML block device, and it simply says that the first UML block device (ubda; the trailing a marks the first device) is to be attached to the file /home/harrykar/works/UML_virt/roots/Debian-4.0-AMD64-root_fs. Internally, UML tells the kernel initialization code to use the ubda device as its default root device (this can be overridden by specifying a different device with the root= switch, as the panic message suggested). I'm going to add one more option to the command line to make the virtual machine's configuration more explicit:
mem=128M
This makes UML believe it has 128MB of physical memory but does not actually allocate 128MB on the host. Rather, this creates a 128MB sparse file on the host. Being sparse, this file will occupy very little space until data starts being written to it. As the UML instance uses its memory, it will start putting data in the memory backed by this file. As that happens, the host will start allocating memory to hold that data. Since the file is fixed in size, the UML instance is limited to that amount of memory. Its memory consumption will approach this limit asymptotically as it reads file data from its own disks and caches it in its memory. Since the host will be allocating memory for the UML instance dynamically, as needed, the actual consumption will be less than the maximum for a time. This conserves memory, making it possible to run a greater number of not-too-active UML instances than would be possible otherwise.
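
The behaviour of a sparse file is easy to see on an ordinary file, without involving UML. The sketch below is an analogy, not UML's actual memory file handling; the function name is hypothetical and GNU coreutils (truncate, stat, du) are assumed.

```shell
# Create a sparse file and report its apparent size versus the space
# actually allocated on disk. Assumes GNU coreutils (truncate, stat, du).
demo_sparse_file() {
    path=$1
    size=$2
    truncate -s "$size" "$path"
    echo "apparent bytes: $(stat -c %s "$path")"
    echo "allocated KiB:  $(du -k "$path" | cut -f1)"
}

# Demonstrate with a 128M file in a temporary location, then clean up.
tmpfile=$(mktemp)
demo_sparse_file "$tmpfile" 128M
rm -f "$tmpfile"
```

The apparent size is the full 128MB (134217728 bytes), while the allocated figure should stay near zero on filesystems that support sparse files, because nothing has been written to the file yet.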

The host memory consumption will, in this case, be at most 128MB. Even if the UML instance is fully using its memory, the host memory consumption may be less, as it may have swapped out some of the UML memory. The UML instance, like any other process that has been swapped out, will be unaware of this and will use its memory as though it is present in the host's memory. The host kernel is responsible for swapping data back in as needed in order to maintain this illusion.

The UML instance will also swap if its workload exceeds its physical memory. This is entirely independent from the host swapping the UML instance's memory. Each system will swap when it needs more memory, so if the host is short of memory and the UML instance has plenty, the host will swap and the UML instance won't. Conversely, if the UML instance is short of memory and the host isn't, the UML instance will swap and the host won't. The case where both are swapping at the same time is interesting and can lead to pathological performance problems.
Consider the case where both the host and the UML instance are swapping at the same time. They may both choose the same page to swap out. If the host swaps it out first, then when the UML instance swaps it, the host will need to read it back from disk so that the UML instance can write it to its own swap device. This will cause the page to be read and written a total of three times, when only once was desirable. This will increase the I/O load on the host at a time when it is already under stress. Solutions for this sort of situation are under investigation and will be described later.

So, the UML command ends up looking like this:

harrykar@harrysas:~/works/UML_virt/linux-2.6.31$ ./linux mem=128M ubda=/home/harrykar/works/UML_virt/roots/Debian-4.0-AMD64-root_fs
Core dump limits :
soft - 0
hard - NONE
Checking that ptrace can change system call numbers...OK
Checking syscall emulation patch for ptrace...OK
Checking advanced syscall emulation patch for ptrace...OK
Checking for tmpfs mount on /dev/shm...OK
Checking PROT_EXEC mmap in /dev/shm/...OK
Checking for the skas3 patch in the host:
- /proc/mm...not found: No such file or directory
- PTRACE_FAULTINFO...not found
- PTRACE_LDT...not found
UML running in SKAS0 mode
Adding 19984384 bytes to physical memory to account for exec-shield gap
[ 0.000000] Linux version 2.6.31 (harrykar@harrysas) (gcc version 4.3.3 (Ubuntu 4.3.3-5ubuntu4) ) #1 Tue Sep 22 00:17:03 CEST 2009
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 37132
[ 0.000000] Kernel command line: mem=128M ubda=/home/harrykar/works/UML_virt/roots/Debian-4.0-AMD64-root_fs root=98:0
[ 0.000000] PID hash table entries: 1024 (order: 10, 8192 bytes)
[ 0.000000] Dentry cache hash table entries: 32768 (order: 6, 262144 bytes)
[ 0.000000] Inode-cache hash table entries: 16384 (order: 5, 131072 bytes)
[ 0.000000] Memory: 123704k available
[ 0.000000] NR_IRQS:15
[ 0.000000] Calibrating delay loop... 681.57 BogoMIPS (lpj=3407872)
[ 0.250000] Mount-cache hash table entries: 256
[ 0.250000] Checking that host ptys support output SIGIO...Yes
[ 0.250000] Checking that host ptys support SIGIO on close...No, enabling workaround
[ 0.250000] Using 2.6 host AIO
[ 0.250000] NET: Registered protocol family 16
[ 0.250000] bio: create slab at 0
[ 0.250000] NET: Registered protocol family 2
[ 0.250000] IP route cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.250000] TCP established hash table entries: 8192 (order: 5, 131072 bytes)
[ 0.250000] TCP bind hash table entries: 8192 (order: 4, 65536 bytes)
[ 0.250000] TCP: Hash tables configured (established 8192 bind 8192)
[ 0.250000] TCP reno registered
[ 0.250000] NET: Registered protocol family 1
[ 0.250000] IRQ 9/mconsole: IRQF_DISABLED is not guaranteed on shared IRQs
[ 0.250000] mconsole (version 2) initialized on /home/harrykar/.uml/HP6qAF/mconsole
[ 0.250000] Checking host MADV_REMOVE support...OK
[ 0.250000] VFS: Disk quotas dquot_6.5.2
[ 0.250000] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 0.250000] msgmni has been set to 241
[ 0.250000] io scheduler noop registered
[ 0.250000] io scheduler anticipatory registered (default)
[ 0.250000] io scheduler deadline registered
[ 0.250000] io scheduler cfq registered
[ 0.250000] TCP cubic registered
[ 0.250000] NET: Registered protocol family 17
[ 0.250000] Initialized stdio console driver
[ 0.250000] Console initialized on /dev/tty0
[ 0.250000] console [tty0] enabled
[ 0.250000] Initializing software serial port version 1
[ 0.250000] console [mc-1] enabled
[ 0.250000] ubda: unknown partition table
[ 0.250000] kjournald starting. Commit interval 5 seconds
[ 0.250000] EXT3-fs: mounted filesystem with writeback data mode.
[ 0.250000] VFS: Mounted root (ext3 filesystem) readonly on device 98:0.
[ 0.260000] IRQ 3/console-write: IRQF_DISABLED is not guaranteed on shared IRQs
[ 0.260000] IRQ 2/console: IRQF_DISABLED is not guaranteed on shared IRQs
[ 0.260000] IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
[ 0.450000] IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
[ 0.450000] IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
INIT: [ 0.450000] IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
version 2.86 booting[ 0.450000] IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs

[ 0.450000] IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
Starting the hotplug events dispatcher: udevd.
Synthesizing the initial hotplug events...done.
Waiting for /dev to be fully populated...done.
[ 4.080000] line_ioctl: tty0: unknown ioctl: 0x541e
[ 4.080000] line_ioctl: tty0: unknown ioctl: 0x5603
Activating swap...done.
Checking root file system...fsck 1.40-WIP (14-Nov-2006)
ROOT: clean, 17233/131072 files, 143703/262144 blocks
done.
[ 5.060000] EXT3 FS on ubda, internal journal
Setting the system clock..
Cannot access the Hardware Clock via any known method.
Use the --debug option to see the details of our search for an access method.
Cleaning up ifupdown....
Loading kernel modules...FATAL: Could not load /lib/modules/2.6.31/modules.dep: No such file or directory
Loading device-mapper support.
Checking file systems...fsck 1.40-WIP (14-Nov-2006)
done.
Setting kernel variables...done.
Mounting local filesystems...done.
Activating swapfile swap...done.
Setting up networking....
Configuring network interfaces...Internet Systems Consortium DHCP Client V3.0.4
Copyright 2004-2006 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/

modprobe: FATAL: Could not load /lib/modules/2.6.31/modules.dep: No such file or directory

modprobe: FATAL: Could not load /lib/modules/2.6.31/modules.dep: No such file or directory

SIOCSIFADDR: No such device
modprobe: FATAL: Could not load /lib/modules/2.6.31/modules.dep: No such file or directory

eth0: ERROR while getting interface flags: No such device
modprobe: FATAL: Could not load /lib/modules/2.6.31/modules.dep: No such file or directory

eth0: ERROR while getting interface flags: No such device
modprobe: FATAL: Could not load /lib/modules/2.6.31/modules.dep: No such file or directory

Bind socket to interface: No such device
Failed to bring up eth0.
done.
Setting console screen modes and fonts.
[ 9.460000] line_ioctl: tty0: unknown ioctl: 0x541e
[ 9.460000] line_ioctl: tty0: unknown ioctl: 0x5603
[ 10.220000] IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
INIT: [ 10.220000] IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
Entering runlevel: 2[ 10.220000] IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs

[ 10.220000] IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
Starting system log daemon: syslogd.
Starting kernel log daemon: klogd.
* Not starting internet superserver: no services enabled.
Starting OpenBSD Secure Shell server: sshd.
Starting periodic command scheduler: crond.
[ 11.850000] IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
[ 12.140000] Serial line 0 assigned device '/dev/pts/3'
[ 12.140000] IRQ 6/ssl: IRQF_DISABLED is not guaranteed on shared IRQs
[ 12.140000] IRQ 7/ssl-write: IRQF_DISABLED is not guaranteed on shared IRQs

Debian GNU/Linux 4.0 debian tty0

debian login:


This is much more interesting than the last attempt. We get to see the filesystem booting. Note that it's almost exactly the same as it would be if the same filesystem were booted on the host. The underlying virtual machine shows through in only a couple of places:
  • One is when the root filesystem is checked: The fsck message refers to /dev/ubda rather than /dev/ubd0. Devices can be specified with either numbers or letters. Using letters is generally favored since it is similar to current practice with other drivers, such as naming IDE disks hda, hdb, and so on. It also makes the use of multiple ubd devices within UML less confusing. There's less expectation that ubdb on the command line corresponds to minor number 1 inside the UML instance, as the use of ubd1 does. In fact, ubdb has minor number 16 (to allow for partitions on ubda). The one case where numbers are needed is when you are plugging a large number of disks into a UML instance. There is no letter equivalent of ubd512, so you'd have to use a number to describe this device.
    Checking root file system...fsck 1.40-WIP (14-Nov-2006)
    ROOT: clean, 17233/131072 files, 143703/262144 blocks
    done.
    [ 5.060000] EXT3 FS on ubda, internal journal
    Here the kernel reports the UML device name, ubda (EXT3 FS on ubda), rather than hda1 or sda1 as on a physical machine.
  • The other is when the boot scripts try to synchronize the internal kernel clock with the system's hardware clock:
    Cannot access the Hardware Clock via any known method.
    Use the --debug option to see the details of our search for an access method.
    The UML serial line driver is complaining about an ioctl it doesn't implement, and the hwclock program inside UML is complaining that it tried to execute the iopl instruction and failed. These are both symptoms of hwclock trying different methods of accessing the hardware system clock and failing because the device doesn't exist in UML. The UML kernel does have access to a clock, but it is not one that hwclock will recognize. Rather, it is simply a call to the host's gettimeofday.
    After that, you'll notice that a relatively small number of services are started: here the system and kernel log daemons, sshd, and cron. All of these run just as they would on a physical machine. By the kernel timestamps in the log above, this boot reached the login prompt in about 12 seconds, demonstrating one of the conveniences of UML: the ability to quickly create and destroy virtual machines.
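Going back to the device naming in the first bullet above, the letter-to-minor-number rule can be sketched in a few lines of Python. This is only an illustration: the value of 16 minors per disk is inferred from the statement that ubdb has minor number 16, and the function name ubd_minor is hypothetical, not part of UML.

```python
import string

# Assumption taken from the text: each ubd disk reserves 16 minor numbers
# (the whole disk plus room for its partitions), so ubdb starts at minor 16.
MINORS_PER_DISK = 16


def ubd_minor(name):
    """Return the first minor number for a single-letter ubd name such as 'ubda'."""
    prefix, letter = name[:3], name[3:]
    if prefix != "ubd" or len(letter) != 1 or letter not in string.ascii_lowercase:
        raise ValueError(f"expected a name like 'ubda', got {name!r}")
    index = string.ascii_lowercase.index(letter)  # a -> 0, b -> 1, ...
    return index * MINORS_PER_DISK


for dev in ("ubda", "ubdb", "ubdc"):
    print(dev, ubd_minor(dev))
```

The sketch deliberately handles only single-letter names, since the text does not spell out how larger numeric names are laid out.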


Looking at a UML from the Inside and Outside

Finally, we'll see a login prompt. Sometimes you can see three of them on screen (you can set this up as you wish through /etc/inittab). One is in the xterm window in which I ran UML. The other two are in xterm windows run by UML in order to hold the second console and the first serial line, which are configured to have gettys running on them. We'll log in as root and get a shell:

Debian GNU/Linux 4.0 debian tty0

debian login: root
Last login: Thu Sep 24 11:32:32 2009 on tty0
Linux debian 2.6.31 #1 Thu Sep 24 11:59:30 CEST 2009 x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
debian:~#


Again, this is identical to what you'd see if you logged in to a physical machine booted on this filesystem. Now it's time to start poking around inside this UML and see what it looks like. First, we'll look at what processes are running:

debian:~#  ps uax
harrykar@harrysas:~/works/UML_virt/linux-2.6.31$ ps aux

What's interesting here is to look at the corresponding processes on the host. Each of the nameless host processes corresponds to an address space inside this UML instance. Except for application and kernel threads, there's a one-to-one correspondence between UML processes and these host processes.

Notice that the properties of the UML processes and the corresponding host processes don't have much in common. All of the host processes are owned by me, whereas the UML processes have various owners, including root. The process IDs are totally different, as are the virtual and resident memory sizes (the VSZ and RSS columns).

This is because the host processes are simply containers for UML address spaces. All of the properties visible inside UML are maintained by UML totally separate from the host. For example, the owner of the host processes will be whoever ran UML. However, many UML processes will be owned by root. These processes have root privileges inside UML, but they have no special privileges on the host. This important fact means that root can do anything inside UML without being able to do anything on the host. A user logged in to a UML as root has no special abilities on the host and, in fact, may not have any abilities at all on the host.

Now, let's look at the memory usage information in /proc/meminfo:

debian:~# cat /proc/meminfo
MemTotal: 123308 kB
MemFree: 105588 kB
Buffers: 1192 kB
Cached: 9896 kB
SwapCached: 0 kB
Active: 7256 kB
Inactive: 6144 kB
Active(anon): 2320 kB
Inactive(anon): 0 kB
Active(file): 4936 kB
Inactive(file): 6144 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 2332 kB
Mapped: 2300 kB
Slab: 3272 kB
SReclaimable: 1468 kB
SUnreclaim: 1804 kB
PageTables: 348 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 61652 kB
Committed_AS: 5344 kB
VmallocTotal: 534085616 kB
VmallocUsed: 0 kB
VmallocChunk: 534085616 kB

The total amount of memory shown, 123308K, is close to the 128MB we specified on the command line. It's not exactly 128MB because some memory allocated during early boot isn't counted in the total. Going back to the host ps output, notice that the linux processes have a virtual size (the VSZ column) of almost exactly 128MB. The difference of 50K is due to a small amount of memory in the UML binary, which isn't counted as part of its physical memory.
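As a quick check of the first comparison, here is a small Python sketch that pulls MemTotal out of /proc/meminfo-style text and compares it with the 128MB requested by mem=128M. The helper name meminfo_kb is hypothetical, and the sample text is just the first two lines of the output above.

```python
def meminfo_kb(text, field):
    """Return the value in kB of one field from /proc/meminfo-style text."""
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key.strip() == field:
            return int(rest.split()[0])
    raise KeyError(field)


sample = """MemTotal: 123308 kB
MemFree: 105588 kB
"""

requested_kb = 128 * 1024                 # mem=128M on the UML command line
visible_kb = meminfo_kb(sample, "MemTotal")
missing_kb = requested_kb - visible_kb    # memory not counted in the total
print(missing_kb, round(100 * missing_kb / requested_kb, 1))
```

Inside a running UML instance you could feed the function the real contents of /proc/meminfo instead of the sample string.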

Now, let's go back to the host ps output and pick one of the UML processes:

jdike    9938  0.1  3.1 131112 16264 pts/3  R    19:17   0:03 ./linux [ps]

We can look at its open files by looking at the /proc/9938/fd file descriptor directory, which shows an entry like this:

ls -l /proc/9938/fd
lrwx------ 1 jdike jdike 64 Jan 28 12:48 3 -> /tmp/vm_file-AwBs1z (deleted)

This is the host file that holds, and is the same size (128MB in our case) as, the UML "physical" memory. It is created in /tmp and then deleted. The deletion prevents something else on the host from opening it and corrupting it. However, this has the somewhat undesirable side effect that /tmp can become filled with invisible files, which can confuse people who don't know about this aspect of UML's behavior.
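The create-then-delete pattern described here is a general Unix idiom, and it can be tried from Python. The following is a minimal sketch of that idiom, not UML's actual code: the 1 MiB size is a stand-in for the 128MB memory file, and the vm_file- prefix merely mimics the name seen in the listing above.

```python
import mmap
import os
import tempfile

SIZE = 1 << 20  # 1 MiB stand-in for the 128MB "physical" memory file

# Create a file in the temporary directory, then delete its name right away.
fd, path = tempfile.mkstemp(prefix="vm_file-")
os.unlink(path)

# The open descriptor keeps the file alive even though no name refers to it.
os.ftruncate(fd, SIZE)
mem = mmap.mmap(fd, SIZE)      # shared read/write mapping by default on Unix
mem[:5] = b"hello"
data = bytes(mem[:5])

info = os.fstat(fd)
name_gone = not os.path.exists(path)
print(info.st_size, name_gone, data)

mem.close()
os.close(fd)                   # the space is released once the last user closes it
```

Because the name is gone, nothing else on the host can open the file by path, yet it still occupies space until the descriptor is closed, which is exactly why such files are invisible to ls.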

To make matters worse, it is recommended for performance reasons to use tmpfs on /tmp. UML performs noticeably better when its memory file is on tmpfs rather than on a disk-based filesystem such as ext3. However, a tmpfs mount is smaller than the disk-based filesystem /tmp would normally be on and thus more likely to run out of space when e.g. running multiple UML instances. This can be handled by making the tmpfs mount large enough to hold the maximum physical memories of all the UML instances on the host or by creating a tmpfs mount for each UML instance that is large enough to hold its physical memory.

Take a look at the root directory:

debian:~# ls /
bin cdrom etc initrd initrd.img.old lib64 media opt root srv tmp var vmlinuz.old
boot dev home initrd.img lib lost+found mnt proc sbin sys usr vmlinuz

This looks strikingly similar to the listing of the loopback mount earlier and somewhat different from the host. Here UML has done the equivalent of a loopback mount of the ~/roots/Debian-4.0-AMD64-root_fs file on the host.

Note that making the loopback mount on the host required root privileges, while I ran UML as my normal, non-root self and accomplished the same thing. You might think this demonstrates that either the requirement of root privileges on the host is unnecessary or that UML is some sort of security hole for not requiring root privileges to do the same thing. Actually, neither is true because the two operations, the loopback mount on the host and UML mounting its root filesystem, aren't quite the same thing. The loopback mount added a mount point to the host's filesystem, while the mount of / within UML doesn't. The UML mount is completely separate from the host's filesystem, so the ability to do this has no security implications.

However, from a different point of view, some security implications do arise. There is no access from the UML filesystem to the host filesystem. The root user inside the UML can do anything on the UML filesystem, and thus, to the host file that contains it, but can't do anything outside it[6]. So, inside UML, even root is jailed and can't break out.
[6] We will talk about this in greater detail later, but UML is secure against a breakout by the superuser only if it is configured properly. Most important, module support and the ability to write to physical memory must be disabled within the UML instance. The UML instance is owned by some user on the host, and the UML kernel has the same privileges as that user. So, the ability for root to modify kernel memory and inject code into it would allow doing anything on the host that the host user can do. Disallowing this ensures that even the superuser inside UML stays jailed.
This is a general property of UML: a UML is a full-blown Linux machine with its own resources. With respect to those resources, the root user within UML can do anything. But it can do nothing at all to anything on the host that's not explicitly provided to the UML. We've just seen this with disk space and files, and it's also true for networking, memory, and every other type of host resource that can be made accessible within UML.

Next, we can see some of UML's hardware support by looking at the mount table:

debian:~# mount
/dev/ubda on / type ext3 (rw,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)

Here we see the ubd device we configured on the command line now mounted as the root filesystem. The other mounts are normal virtual filesystems: procfs, sysfs, udev, and devpts, plus tmpfs mounts on /lib/init/rw and /dev/shm.

df will show us how much space is available on the virtual disk:

debian:~# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/ubda 1032088 558524 421136 58% /
tmpfs 61652 0 61652 0% /lib/init/rw
udev 10240 8 10232 1% /dev
tmpfs 61652 0 61652 0% /dev/shm

Compare the total size of /dev/ubda (1032088K) to that of the host file:

harrykar@harrysas:~/works/UML_virt/roots$ ls -l
-rw-r--r-- 1 harrykar harrykar 1073741824 2009-09-24 16:54 Debian-4.0-AMD64-root_fs

They are nearly the same: the 1073741824-byte host file is 1048576K, so the difference from 1032088K is about 1.6%, probably due to ext3 filesystem overhead. The entire UML filesystem exists in and is confined to that host file. This is another way in which users inside the UML are confined or jailed. A UML user has no way to consume more disk space than is in that host file.
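The size comparison is simple arithmetic, sketched below with the two figures from the df and ls -l output above. The variable names are only illustrative.

```python
host_file_bytes = 1073741824      # size of the host file from ls -l
df_total_kb = 1032088             # 1K-blocks reported by df inside UML

host_file_kb = host_file_bytes // 1024
overhead_kb = host_file_kb - df_total_kb
overhead_pct = 100 * overhead_kb / host_file_kb
print(host_file_kb, overhead_kb, round(overhead_pct, 2))
```

With these figures the gap comes to a little under 1.6% of the host file.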

However, on the host, it is possible to extend the filesystem file, and the extra space becomes available to UML. For now, it's just important to note that this is a good example of how much more flexible virtual hardware is in comparison to physical hardware. Try adding extra space to a physical disk or a physical disk partition. You can repartition the disk in order to extend a partition, but that's a nontrivial, angst-ridden operation that potentially puts all of the data on the disk at risk if you make a mistake. You can also add a new volume to the volume group you wish to increase, but this requires that the volume group be set up beforehand and that you have a spare partition to add to it. In comparison, extending a file using dd is a trivial operation that can be done as a normal user, doesn't put any data at risk except that in the file, and doesn't require any prior setup.
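To show how little is involved in growing a file as a normal user, here is a small Python sketch. Note that it uses os.truncate rather than the dd approach mentioned above, and it works on a throwaway 1 MiB temporary file rather than a real filesystem image.

```python
import os
import tempfile

MIB = 1024 * 1024

# A small stand-in for a filesystem image such as the root_fs file.
with tempfile.NamedTemporaryFile(delete=False) as image:
    image.write(b"\0" * MIB)
    image_path = image.name

before = os.path.getsize(image_path)
os.truncate(image_path, before + MIB)   # grow the file by 1 MiB; new bytes read as zeros
after = os.path.getsize(image_path)
print(before, after)

os.remove(image_path)
```

No special privileges or prior setup are needed, and the existing contents of the file are left untouched.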

We can poke around /proc some more to compare and contrast this virtual machine with the physical host it's running on. For some similarities, let's look at /proc/filesystems:

debian:~#  more /proc/filesystems
nodev sysfs
nodev rootfs
nodev bdev
nodev proc
nodev binfmt_misc
nodev sockfs
nodev pipefs
nodev anon_inodefs
nodev tmpfs
nodev inotifyfs
nodev devpts
reiserfs
ext3
ext2
ext4
nodev ramfs
iso9660
nodev autofs
nodev hostfs
nodev mqueue

There's no sign of any UML oddities here at all. The reason is that the filesystems are not hardware dependent. Anything that doesn't depend on hardware will be exactly the same in UML as on the host. This includes things such as virtual devices (e.g., pseudo-terminals, loop devices, and TUN/TAP[8] network interfaces) and network protocols, as well as the filesystems.
[8] The TUN/TAP driver is a virtual network interface that allows packets to be handled by a process, in order to create a tunnel (the origin of "TUN") or a virtual Ethernet device ("TAP").
So, in order to see something different from the host, we have to look at hardware-specific stuff. For example, /proc/interrupts contains information about all interrupt sources on the system. On the host, it contains information about devices such as the timer, keyboard, and disks. In UML, it looks like this:

debian:~#  more /proc/interrupts
CPU0
0: 1291210 SIGVTALRM timer
2: 130 SIGIO console
3: 0 SIGIO console-write
4: 908 SIGIO ubd
6: 0 SIGIO ssl
7: 0 SIGIO ssl-write
9: 0 SIGIO mconsole
10: 0 SIGIO winch
11: 66 SIGIO write sigio
14: 1 SIGIO random

The timer, keyboard, and disks are here (entries 0, 2 and 6, and 4, respectively), as are a bunch of mysterious-looking entries. The -write entries stem from a weakness in the host Linux SIGIO support. SIGIO is a signal generated when input is available, or output is possible, on a file descriptor. A process wishing to do interrupt-driven I/O would set up SIGIO support on the file descriptors it's using. An interrupt when input is available on a file descriptor is obviously useful. However, an interrupt when output is possible is also sometimes needed.

If a process is writing to a descriptor, such as one belonging to a pipe or a network socket, faster than the process on the other side is reading it, then the kernel will buffer the extra data. However, only a limited amount of buffering is available. When that limit is reached, further writes will fail, returning EAGAIN. It is necessary to know when some of the data has been read by the other side and writes may be attempted again. Here, a SIGIO signal would be very handy. The trouble is that support of SIGIO when output is possible is not universal. Some IPC mechanisms support SIGIO when input is available, but not when output is possible.

In these cases, UML emulates this support with a separate thread that calls poll to wait for output to become possible on these descriptors, interrupting the UML kernel when this happens. The interrupt this generates is represented by one of the -write interrupts.
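The situation can be reproduced on an ordinary pipe. The Python sketch below fills a non-blocking pipe until the write fails with EAGAIN, then uses poll to learn when output is possible again. It is a single-process illustration of the idea only. UML, as described above, does the polling in a separate thread and turns the result into an interrupt.

```python
import os
import select

read_end, write_end = os.pipe()
os.set_blocking(write_end, False)

# Fill the pipe until the kernel refuses more data (EAGAIN).
written = 0
try:
    while True:
        written += os.write(write_end, b"x" * 4096)
except BlockingIOError:
    pass  # the pipe buffer is full

poller = select.poll()
poller.register(write_end, select.POLLOUT)
writable_when_full = bool(poller.poll(0))      # no room yet, so nothing is reported

# The reader drains the data, so output becomes possible again.
drained = os.read(read_end, written)
writable_after_read = bool(poller.poll(1000))  # poll now reports POLLOUT

print(written, writable_when_full, writable_after_read)
os.close(read_end)
os.close(write_end)
```

The point where poll returns is the moment at which a -write interrupt would be delivered in the scheme described above.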

The other mysterious entry is the winch interrupt. This appears because UML wants to detect when one of its consoles changes size, as when you resize the xterm in which you ran UML. Obviously this is not a concern for the host, but it is for a virtual machine. Because of the interface for registering for SIGWINCH on a host device, a separate thread is created to receive SIGWINCH, and it interrupts UML itself whenever one comes in. Thus, SIGWINCH looks like a separate device from the point of view of /proc/interrupts.

/proc/cpuinfo is interesting:

debian:~#  more /proc/cpuinfo
processor : 0
vendor_id : User Mode Linux
model name : UML
mode : skas
host : Linux harrysas 2.6.28-15-generic #49-Ubuntu SMP Tue Aug 18 19:25:34 UTC 2009 x86_64
bogomips : 444.00

Much of the information in the host's /proc/cpuinfo makes no sense in UML. It contains information about the physical CPU, which UML doesn't have. So, I just put in some information about the host, plus some about the UML itself.


Conclusion
A UML is both very similar to and very different from a physical machine. It is similar as long as you don't look at its hardware. When you do, it becomes clear that you are looking at a virtual machine with virtual hardware. However, as long as you stay away from the hardware, it is very hard to tell that you are inside a virtual machine.

Both the similarities and the differences have advantages. Obviously, having a UML run applications in exactly the same way as on the host is critical for it to be useful. In this section we glimpsed some of the advantages of virtual hardware. Soon we will see that virtualized hardware can be plugged, unplugged, extended, and managed in ways that physical hardware can't. The next section begins to show you what this means.


Exploring UML


Logging In as a Normal User
In this section we will explore a UML instance in more detail, looking at how it is similar to and how it differs from a physical Linux machine. While doing a set of fairly simple, standard system administration chores in the instance, we will see some UML twists to them. For example, we will add swap space and mount filesystems. The twist is that we will do these things by plugging the required devices into the UML at runtime, from the host, without rebooting the UML.
First, let's log in to the UML instance, as we just did. When the UML boots, we see a login prompt in the window in which we started it. Some xterm windows pop up on the screen, which we ignore. They also contain login prompts. We could log in as root, but let's log in as a normal user. First we must (as root) create a brand-new user account with the adduser command:

debian:~# adduser harry
Adding user `harry' ...
Adding new group `harry' (1000) ...
Adding new user `harry' (1000) with group `harry' ...
Creating home directory `/home/harry' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for harry
Enter the new value, or press ENTER for the default(optional info can used with e.g.fingerd)
Full Name []: Harry Kar
Room Number []: 987006
Work Phone []: 3336450459
Home Phone []: 987006
Other []: Aprilia 1000 V4
Is the information correct? [y/N] y

debian:~# exit
logout
[ 7671.380000] IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs
[ 7671.380000] IRQ 10/winch: IRQF_DISABLED is not guaranteed on shared IRQs

Debian GNU/Linux 4.0 debian tty0

debian login: harry
Password:
Linux debian 2.6.31 #1 Thu Sep 24 11:59:30 CEST 2009 x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
harry@debian:~$ pwd
/home/harry

This is basically the same as a physical system. In this window, we are a normal, unprivileged user, in a normal home directory. We can test our lack of privileges by trying to do something nasty:

harry@debian:~$ rm -f /bin/ls
rm: cannot remove `/bin/ls': Permission denied


Consoles and Serial Lines
In addition to the xterm consoles that made themselves visible, some others have attached themselves less visibly to other host resources. You can attach UML consoles to almost any host device that can be used for that purpose. For example, they can be (and some, by default, are) attached to host pseudo-terminals. (In practice, pseudo-terminals are used for implementing terminal emulators such as xterm(1), in which data read from the pseudo-terminal master is interpreted by the application in the same way a real terminal would interpret the data, and for implementing remote-login programs such as sshd(8), in which data read from the pseudo-terminal master is sent across the network to a client program that is connected to a terminal or terminal emulator.) Devices attached this way announce themselves in the kernel log, which we can see by running dmesg:

harry@debian:~$ dmesg | grep "Serial line"
[ 10.610000] Serial line 0 assigned device '/dev/pts/2'
harry@debian:~$ ls /dev/ptmx
/dev/ptmx
harry@debian:~$ tty
/dev/tty0

/dev/ptmx is the "pseudo-terminal master multiplexer" which, when opened, causes a slave node /dev/pts/N to appear (with N being an integer).
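You can watch a slave node appear from Python. The sketch below uses the standard pty.openpty() call, which on a typical Linux host ends up opening /dev/ptmx through the C library. That detail, and the /dev/pts/N form of the name, are assumptions about the host rather than something the code checks.

```python
import os
import pty

# pty.openpty() allocates a new pseudo-terminal pair and returns both ends.
master_fd, slave_fd = pty.openpty()
slave_name = os.ttyname(slave_fd)   # e.g. /dev/pts/N on a Linux host
is_tty = os.isatty(slave_fd)
print(slave_name, is_tty)

os.close(slave_fd)
os.close(master_fd)
```

A terminal program attached to the printed slave name talks to whatever process holds the master side, which is the role UML plays for its pts consoles and serial lines.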

This tells us that one UML serial line has been configured in /etc/inittab (last line here) to have a login prompt on it:
harry@debian:~$ cat /etc/inittab
# /etc/inittab: init(8) configuration.
# $Id: inittab,v 1.91 2002/01/25 13:35:21 miquels Exp $
...
...
# Note that on most Debian systems tty7 is used by the X Window System,
# so if you want to add more getty's go ahead but skip tty7 if you run X.
#
0:2345:respawn:/sbin/getty 38400 tty0
#1:2345:respawn:/sbin/getty 38400 tty1
#2:23:respawn:/sbin/getty 38400 tty2
#3:23:respawn:/sbin/getty 38400 tty3
#4:23:respawn:/sbin/getty 38400 tty4
#5:23:respawn:/sbin/getty 38400 tty5
#6:23:respawn:/sbin/getty 38400 tty6

# Example how to put a getty on a serial line (for a terminal)
#
#T0:23:respawn:/sbin/getty -L ttyS0 9600 vt100
#T1:23:respawn:/sbin/getty -L ttyS1 9600 vt100
s0:1235:respawn:/sbin/getty 115200 ttyS0 linux


The serial line has been configured at the "hardware" level to be attached to a host pseudo-terminal, and it has allocated the host's /dev/pts/2.

Now we can run a terminal program, such as screen or minicom, on the host, attach it to /dev/pts/2, and log in to UML on its one serial line. After running (on host):

harrykar@harrysas:~/works/UML_virt/linux-2.6.31$ screen /dev/pts/2

we see a blank screen session. Hitting return gives us another UML login prompt, as advertised:

Debian GNU/Linux 4.0 debian ttyS0

debian login:

Notice the ttyS0 in the banner, in comparison to the tty0 we saw while logging in as root and as user. The tty0 and ttyx devices are UML consoles, while ttyS0 is the first serial line. On a physical machine, the consoles are devices that are displayed on the screen, and the serial lines are ports coming out of the back of the box. There's a clear difference between them.
In contrast, there is almost no difference between the consoles and serial lines in UML. They plug themselves into the console and serial line infrastructures, respectively, in the UML kernel. This is the cause of the different device names. However, in all other ways, they are identical in UML. They share essentially all their code, they can be configured to attach to exactly the same host devices, and they behave in the same ways.
In fact, the serial line driver in UML owes its existence to a UML historical quirk. Because of a limitation in the first implementation of UML, it was impossible to log in on a console in the window in which you ran it. To allow logging in to UML at all, I implemented the serial line driver to connect itself to a host device, and you would attach to this using something like screen.
As time went on and limitations disappeared, I implemented a real console driver. After a while, it dawned on me that there was no real difference between it and the serial line driver, so I started merging the two drivers, making them share more and more code. Now almost the only differences between them are that they plug themselves into different parts of the kernel.
UML consoles and serial lines can be attached to the same devices on the host, and we've seen a console attached to stdin and stdout of the linux process, consoles appearing in xterms, and a serial line attached to a host pseudo-terminal. They can also be attached to host ports, allowing you to telnet to the specified port on the host and log in to the UML from there. This is a convenient way to make a UML accessible from the network without enabling the network within UML.

Finally, UML consoles and serial lines can be attached to host terminals, which can be host consoles, such as /dev/tty*, or the slave side of pseudo-terminals (pts devices). Attaching a UML console to a host virtual console has the interesting effect of putting the UML login prompt on the host console, making it appear (to someone not paying sufficient attention) to be the host login.

Let's look at some examples. First, let's attach a console to a host port. We need to find an unused console to work with, so let's use the UML management console tool to query the UML configuration:

harrykar@harrysas:~/works/UML_virt/linux-2.6.31$ uml_mconsole qQhPkE config con0
OK fd:0,fd:1
harrykar@harrysas:~/works/UML_virt/linux-2.6.31$ uml_mconsole qQhPkE config con1
OK xterm
harrykar@harrysas:~/works/UML_virt/linux-2.6.31$ uml_mconsole qQhPkE config con2
OK xterm
harrykar@harrysas:~/works/UML_virt/linux-2.6.31$ uml_mconsole qQhPkE config con3
OK xterm

We will cover the full capabilities of uml_mconsole in a later section, but this gives us an initial look at it. The first argument, qQhPkE, specifies which UML we wish to talk to. A UML can be named by giving it a unique machine ID (umid). For example, adding the umid=debian option to the UML command line gives the instance the name debian, and uml_mconsole knows how to use this name to communicate with the debian UML.

If you didn't specify the umid on the command line, UML gives itself a random umid. There are a couple of ways to tell what it chose.
  • First, look through the boot output or output from dmesg for a line that looks like this:
    harry@debian:~$ dmesg | grep "mconsole (version 2)"
    [ 0.200000] mconsole (version 2) initialized on /home/harrykar/.uml/qQhPkE/mconsole
    In this case, the umid is qQhPkE. You can communicate with this instance by using that umid on the uml_mconsole command line.
  • Second, UML puts a directory with the same name as the umid in a special parent directory, by default, ~/.uml. So, you could also look at the subdirectories of your ~/.uml directory (the first time, there should be only one) for the umid to use.
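Both ways of finding the umid are easy to script. The Python sketch below is a rough illustration: the regular expression is modeled on the single mconsole log line shown above, and the function names umid_from_log and umids_in are hypothetical helpers, not part of the UML tools.

```python
import re
from pathlib import Path

# Pattern modeled on the "mconsole (version 2) initialized on ..." log line.
MCONSOLE_LINE = re.compile(r"mconsole \(version \d+\) initialized on (\S+)/mconsole")


def umid_from_log(line):
    """Return the umid named in an mconsole log line, or None if it is absent."""
    match = MCONSOLE_LINE.search(line)
    return Path(match.group(1)).name if match else None


def umids_in(uml_dir="~/.uml"):
    """List umids as the names of the subdirectories of uml_dir."""
    root = Path(uml_dir).expanduser()
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


line = "[ 0.200000] mconsole (version 2) initialized on /home/harrykar/.uml/qQhPkE/mconsole"
print(umid_from_log(line))
print(umids_in())
```

Either result can then be passed as the first argument to uml_mconsole.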
The rest of the uml_mconsole command line is the command to send to the specified UML. In this case, we are asking for the configurations of the first few consoles. Console names start with con; serial line names begin with ssl. I will describe as much of the output format as needed here; the paragraphs that follow give a more complete and careful description.

A UML console or serial line configuration can consist of separate input and output configurations, or a single configuration for both (input and output). If both are present, they are separated by a comma (the first is input and the second is output). For example, fd:0,fd:1 specifies that console input comes from UML's file descriptor 0 and that output goes to file descriptor 1. In contrast, fd:3 specifies that both input and output are attached to file descriptor 3, which should have been set up on the UML command line with something like 3<>filename.

A single device configuration consists of a device type (fd in the examples above) and device-specific information separated by a colon. The possible device types and additional information are as follows.
  • fd A host file descriptor belonging to the UML process; specify the file descriptor number after the colon.
  • pty A BSD pseudo-terminal; specify the /dev/ptyxx name of the pseudo-terminal you wish to attach the console to. To access it, you will attach a terminal program, such as screen or minicom, to the corresponding /dev/ttyxx file.
  • pts A devpts pseudo-terminal; there is no pts-specific data you need to add. In order to connect to it, you will need to find which pts device it allocated by reading the UML kernel log through dmesg or by using uml_mconsole to query the configuration.
  • port A host port; specify the port number. You access the port by telnetting to it. If you're on the host, you will telnet to localhost:
    host% telnet localhost port-number
    You can also telnet to that port from another machine on the network:
    host% telnet uml-host port-number

  • xterm No extra information needed. This will display an xterm on your screen with the console in it. UML needs a valid DISPLAY environment variable and xterm installed on the host, so this won't work on headless servers. This is the default for consoles other than console 0, so for headless servers, you will need to change this.
  • null No extra information needed. This makes the console available inside UML, but output is ignored and there is never any input. This would be very similar to attaching the console to the host's /dev/null.
  • none No extra information needed. This removes the device from UML, so that attempts to access it will fail with "No such device."
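The configuration syntax above can be pulled apart mechanically. The following is a minimal sketch, not part of UML: describe_channel and describe_console are hypothetical helper names, and the sketch only splits strings, it does not talk to a UML instance.

```shell
#!/bin/sh
# Sketch: split a console/serial-line configuration into its parts.
# describe_channel and describe_console are hypothetical helper names.

# Print the device type and device-specific data of one channel, e.g. "fd:0".
describe_channel() {
    label=$1
    spec=$2
    case $spec in
        *:*) type=${spec%%:*}; data=${spec#*:} ;;
        *)   type=$spec;       data= ;;
    esac
    # A "-" stands for "no device-specific data".
    printf '%s type=%s data=%s\n' "$label" "$type" "${data:--}"
}

# A comma separates input from output; without one, both share the config.
describe_console() {
    cfg=$1
    case $cfg in
        *,*) describe_channel input  "${cfg%%,*}"
             describe_channel output "${cfg#*,}" ;;
        *)   describe_channel both "$cfg" ;;
    esac
}

describe_console fd:0,fd:1
describe_console pts:/dev/pts/10
describe_console xterm
```

The first call reports separate input and output channels; the other two report a single channel used for both directions.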
When requesting configuration information through uml_mconsole for pts consoles, it will report the actual device that it allocated after the colon, as follows:

host% uml_mconsole debian config con2
OK pts:/dev/pts/10

The syntax for specifying console and serial line configurations is the same on the UML and uml_mconsole (on the host) command lines, except that the UML command line allows giving all devices the same configuration. A specific console or serial line is specified as either conN or sslN, where N is the device number.

On the UML command line, all consoles or serial lines may be given the same configuration with just con=configuration or ssl=configuration. Any specific device configurations that overlap this will override it. So

con=pts con0=fd:0,fd:1

attaches all consoles to pts devices, except for the first one, which is attached to stdin and stdout respectively.
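The override rule can be sketched as a small lookup. This is an illustration only: console_config is a hypothetical helper, and it does not model UML's built-in defaults, so it prints an empty line when neither form is given.

```shell
#!/bin/sh
# Sketch: which configuration does console N end up with?
# console_config is a hypothetical helper, not part of UML.
# Usage: console_config N arg...   (args as given on the UML command line)
console_config() {
    n=$1
    shift
    general=
    specific=
    for arg in "$@"; do
        case $arg in
            con=*)     general=${arg#con=} ;;
            con"$n"=*) specific=${arg#con"$n"=} ;;
        esac
    done
    # A device-specific setting overrides the blanket con= setting.
    echo "${specific:-$general}"
}

console_config 0 con=pts con0=fd:0,fd:1   # prints fd:0,fd:1
console_config 2 con=pts con0=fd:0,fd:1   # prints pts
```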

Console input and output can be specified separately. They are completely independent; the host device types don't even need to match. For example,

ssl2=pts,xterm

will attach the second serial line's input to a host pts device and the output to an xterm. The effect of this is that when you attach screen or another terminal program to the host pts device, that's the input to the serial line. No output will appear in screen; that will all be directed to the xterm. Most input will also appear in the xterm because that is echoed in the shell.

This can have unexpected effects. Repeating a configuration for both the input and output will, in some cases, attach them to distinct host devices of the same type. For example,

con2=xterm,xterm

will create two xterms: one will accept console input, and the other will display the console's output. The same is true for pts.

If we have:

host% uml_mconsole debian config con0
OK fd:0,fd:1
host% uml_mconsole debian config con1
OK none
host% uml_mconsole debian config con2
OK pts:/dev/pts/10
host% uml_mconsole debian config con3
OK pts

  • Looking at the output about the UML configuration, we see an OK on each response, which means that the command succeeded in communicating with the UML and getting a response. The con0 response says that console 0 is attached to stdin and stdout. This bears some explaining, so let's pull apart that response. There are two pieces to it, fd:0 and fd:1, separated by a comma. In a comma-separated configuration like this, the first part refers to input to the console (or serial line), and the second part refers to output from it. The fd:0 part also has two pieces, fd and 0, separated by a colon. fd says that the console input is to be attached to a file descriptor of the linux process (the UML instance's process on the host), and 0 says that file descriptor will be stdin (file descriptor zero). Similarly, the output is specified to be file descriptor one (stdout).
  • When the console input and output go to the same device, as we can see with con2 being attached to pts:/dev/pts/10, input and output are not specified separately. There is only a single colon-separated device description. As you might have guessed, pts refers to a devpts pseudo-terminal, and /dev/pts/10 tells you specifically which pseudo-terminal the console is attached to.
  • The con1 configuration is one we haven't seen before. It simply says that the console doesn't exist; there is no such device.
The configuration for con3 is the one we are looking for. pts says that this is a pts console, and there's no specific pts device listed, so it has not yet been activated by having a UML getty process running on it. We will reconfigure this one to be attached to a host port:

host% uml_mconsole debian config con3=port:9000
OK

port:9000 says that the console should be attached to the host's port 9000, which we will access by telnetting to that port. We can double-check that the change actually happened:

host% uml_mconsole debian config con3
OK port:9000

So far, so good. Let's try telnetting there now:

host% telnet localhost 9000
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused

This failed because UML hasn't run a getty on its console 3. We can fix this by editing its /etc/inittab. Looking there on my machine, I see:

#3:2345:respawn:/sbin/getty 38400 tty3

I had enabled this one in the past but since disabled it. You may not have a tty3 entry at all. You want to end up with a line that looks like the one above, but uncommented. I'll just uncomment mine; you may have to add the line in its entirety, so fire up your favorite editor on /etc/inittab and fix it. Now, tell init it needs to reread the inittab file:

UML# kill -HUP 1
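If you prefer to script the inittab edit rather than use an editor, something like the following could work. It is a sketch under assumptions: enable_getty is a hypothetical helper, it only handles an entry that is already present but commented out, and it is shown here against a scratch copy rather than the real /etc/inittab.

```shell
#!/bin/sh
# Sketch: enable a commented-out respawn entry in an inittab-style file.
# enable_getty FILE ID prints FILE with entry ID uncommented; FILE is untouched.
enable_getty() {
    sed "s/^#\($2:[^:]*:respawn:.*\)\$/\1/" "$1"
}

# Scratch copy with sample content mirroring the line shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2:2345:respawn:/sbin/getty 38400 tty2
#3:2345:respawn:/sbin/getty 38400 tty3
EOF

enable_getty "$sample" 3
```

Inside the UML you would write the result back to /etc/inittab (after checking it) before sending init the HUP signal.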

Let's go back to the host and try the telnet again:

host% telnet localhost 9000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Fedora Core release 1 (Yarrow)
Kernel 2.4.27 on an i686
Debian GNU/Linux 2.2 usermode tty3
usermode login:

Here we have the UML's console, as advertised. Notice the discrepancy between the telnet banner and the login banner. Telnet is telling us that we are attaching to a Fedora Core 1 (FC1) system running a 2.4.27 kernel, while login is saying that we are attaching to a Debian system. This is because the host is the FC1 system, and telnetd running on the host and attaching us to the host's port 9000 is telling us about the host. There is some abuse of telnetd's capabilities going on in order to allow the redirection of traffic between the host port and UML, and this is responsible for the confusion.

Now, let's stick a UML console on a host console. First, we need to make sure there's no host getty or login running on the chosen console. Looking at my host's /etc/inittab, I see:

6:2345:respawn:/sbin/mingetty tty6

for the last console, and hitting Ctrl-Alt-F6 to switch to that virtual console confirms that a getty is running on it. I'll comment it out, so it looks like this:

#6:2345:respawn:/sbin/mingetty tty6

I tell init to reread inittab:

host# kill -HUP 1

and switch back to that console to make sure it is not being used by the host any more. I now need to make sure that UML can open it:

host% ls -l /dev/tty6
crw------- 1 root root 4, 6 Feb 17 16:26 /dev/tty6

This not being the case, I'll change the permissions so that UML has both read and write access to it:

host# chmod 666 /dev/tty6

After you make any similar changes needed on your own machine, we can tell UML to take over the console. We used the UML tty3 for the host port console, so let's look at tty4:

host% uml_mconsole debian config con4
OK pts

So, let's assign con4 to the host's /dev/tty6 in the usual way:

host% uml_mconsole debian config con4=tty:/dev/tty6
OK

After enabling tty4 in the UML /etc/inittab and telling init to reread the file, we should be able to switch to the host's virtual console 6 and see the UML login prompt. Taken to extremes, this can be somewhat mind bending. Applying this technique to the other virtual consoles results in them all displaying UML, not host, login prompts.

For the security conscious, this sort of redirection and fakery can be valuable. It allows potential attacks on the host to be redirected to a jail, where they can be contained, logged, and analyzed. For the rest of us, it serves as an example of the flexibility of the UML consoles.
Now that we've seen all the ways to access our UML console, it's time to stay logged in on the console and see what we can do inside the UML.


Adding Swap Space
UML is currently running everything in the memory that it has been assigned since it has no swap space. Normal Linux machines have some swap, so let's fix that now. We need some sort of disk to swap onto, and since UML disks are generally host files, we need to make a file on the host to be the swap device:

harrykar@harrysas:~/works/UML_virt$ dd if=/dev/zero of=swap bs=1024 seek=$[ 1024 * 1024 ] count=1
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 3.1332e-05 s, 32.7 MB/s
harrykar@harrysas:~/works/UML_virt$ ls -l swap
-rw-r--r-- 1 harrykar harrykar 1073742848 2009-09-28 02:00 swap


This technique uses dd to create a 1GB sparse file on the host by seeking past 1 million 1K blocks (bs=1024, seek=1024*1024) and then writing a single 1K block of zeros there (count=1). The use of sparse files is pretty standard with UML since it allows host disk space to be allocated only when it is needed. So, this swap device file consumes only a few kilobytes of disk space, even though it is technically 1GB in length. We can see the true size, that is, the actual disk space consumption, of the file by adding -s to the ls command line:

harrykar@harrysas:~/works/UML_virt$ ls -ls swap
4 -rw-r--r-- 1 harrykar harrykar 1073742848 2009-09-28 02:00 swap

The 4 in the first column is the amount of disk space actually occupied by the file. GNU ls reports this in 1K units by default, so this file that looks like it's 1GB in length is taking only 4K of disk space.
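To see the difference between a sparse file's length and its real disk usage without creating a 1GB file, the same dd technique can be tried at a smaller scale. This is a sketch: the scratch directory and the file name sparse-demo are made up for the example, and the allocated figure depends on the host filesystem.

```shell
#!/bin/sh
# Sketch: create a small sparse file and compare its two sizes.
# "sparse-demo" is a scratch file name chosen for this example.
dir=$(mktemp -d)
file=$dir/sparse-demo

# Seek past 1024 blocks of 1K, then write a single 1K block of zeros.
dd if=/dev/zero of="$file" bs=1024 seek=1024 count=1 2>/dev/null

apparent=$(( $(wc -c < "$file") ))       # length in bytes, holes included
allocated=$(du -k "$file" | cut -f1)     # space actually used, in 1K units

echo "apparent size: $apparent bytes"
echo "allocated:     $allocated KB"
```

On a filesystem that supports sparse files, the allocated figure should be far smaller than the apparent size of just over 1MB.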

Now, we need to plug this new file into the UML as an additional block device, which we will do with the management console:

harrykar@harrysas:~/works/UML_virt$ uml_mconsole debian config ubdb=swap
OK

We can check this by asking for the configuration of ubdb in the same way we asked about consoles earlier:

harrykar@harrysas:~/works/UML_virt$ uml_mconsole debian config ubdb
OK /home/harrykar/works/UML_virt/swap


Now, back in the UML, we have a brand-new second block device, so let's set it up for swapping, then swap on it, and look at /proc/meminfo to check our work:

debian:/home/harry# mkswap /dev/ubdb          
Setting up swapspace version 1, size = 1073737 kB
no label, UUID=b0aba245-80ee-4d27-967a-c5714a3a6e3f

debian:/home/harry# swapon /dev/ubdb
[43401.270000] Adding 1048568k swap on /dev/ubdb. Priority:-1 extents:1 across:1048568k

debian:/home/harry# grep Swap /proc/meminfo
SwapCached: 0 kB
SwapTotal: 1048568 kB
SwapFree: 1048568 kB


Let's further check our work by forcing the new swap device to be used. The following command creates a large amount of data by repeatedly converting the contents of /dev/mem (the UML's memory) into readable hex and feeds that into a little perl script that turns it into a very large string. We will use this string to fill up the system's memory and force it into swap.

debian:/home/harry# while true; do od -x /dev/mem ; done | perl -e 'my $s ; while(<STDIN>){ $s .= $_; } print length($s);'

At the same time, let's log in on a second console and watch the free memory disappear:

UML# while true; do free; sleep 10; done

You'll see the system start with almost all of its memory free:

             total       used       free     shared    buffers     cached
Mem:        126696      21624     105072          0        536       7808
-/+ buffers/cache:      13280     113416
Swap:      1048568          0    1048568

The free memory will start disappearing, until we see a nonzero entry under used for the Swap row:

             total       used       free     shared    buffers     cached
Mem:        126696     124548       2148          0         76       7244
-/+ buffers/cache:     121823       9468
Swap:      1048568       6524    1042044

Here UML is behaving exactly as any physical system would: it is swapping when it is out of memory. Note that the host may have plenty of free memory, but the UML instance is confined to the memory we gave it.
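If you only care about the swap figure, it can be read straight from /proc/meminfo instead of watching the whole free output. The helper below is a sketch (swap_used_kb is a hypothetical name); it is demonstrated on sample values copied from the output above, and inside the UML you would point it at /proc/meminfo.

```shell
#!/bin/sh
# Sketch: compute swap in use from a meminfo-style file.
# swap_used_kb FILE prints SwapTotal - SwapFree, in kB.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END           { print total - free }' "$1"
}

# Inside the UML:  swap_used_kb /proc/meminfo
# Here it runs against sample values taken from the free output above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
SwapCached:         0 kB
SwapTotal:    1048568 kB
SwapFree:     1042044 kB
EOF

swap_used_kb "$sample"   # prints 6524
```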


Partitioned Disks
You may have noticed another difference between the way we're using disks in UML and the way they are normally used on a physical machine. We haven't been partitioning them and putting filesystems and swap space on the partitions. This is a consequence of the ease of creating and adding new virtual disks to a virtual machine. With a physical disk, it's much less convenient, and sometimes impossible, to add more disks to a system. Therefore, you want to make the best of what you have, and that means being able to slice a physical disk into partitions that can be treated separately.

When UML was first released, there was no partition support for exactly this reason. I figured there was no need for partitions, given that if you want more disk space in your UML, you just create a new host file for it, and away you go.
This was a mistake. I underestimated the desire of my users to treat their UMLs exactly like their physical machines. In part, this meant they wanted to be able to partition their virtual disks. So, partition support for UML block devices ultimately appeared, and everyone was happy.

However, my original mistake resulted in some naming conventions that can be extremely confusing to a UML newcomer. Initially, UML block devices were referred to by number, for example, ubd0, ubd1, and so on. At first, these numbers corresponded to their minor device numbers, so when you made a device node for ubd1, the command was:

 mknod [OPTION]... NAME TYPE [MAJOR MINOR]
UML# mknod /dev/ubd1 b 98 1

  1. When partition support appeared, this style of device naming was wrong in a couple of respects. First, you want to refer to the partition by number, as with /dev/hda1 or /dev/sdb2. But does ubd10 refer to block device 10 or partition 0 on device 1?
  2. Second, there is support for 16 partitions per device, so each block device gets a chunk of 16 device minor numbers to refer to them. For example, block device 0 has minor numbers 0 through 15, device 1 has minors 16 through 31, and so on. This breaks the previous convention that device numbers correspond to minor numbers, leading people to specify ubd1 on the UML command line and not realize that it has minor device number 16 inside UML.
These two problems led to a naming convention that should have been present from the start. We name ubd devices in the same way as hd or sd devices: the disk number is specified with a letter (a, b, c, and so on), and the partition is a number. So, partition 1 on virtual disk 1 is ubdb1. When you add a second disk on the UML command line or via mconsole, it is ubdb, not ubd1. This eliminates the ambiguity of multidigit device numbers and the naming confusion. Here, I will adhere to this convention, although my fingers still use ubd0, ubd1, and so on when I boot UML. In addition, the filesystems I'm using have references to ubd0, so commands such as mount and df will refer to names such as ubd0 rather than ubda.
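The relationship between a letter-style name and its minor number follows directly from the 16-minors-per-disk rule. As a sketch (ubd_minor is a hypothetical helper, not a UML tool), the arithmetic looks like this:

```shell
#!/bin/sh
# Sketch: minor number for a ubd name such as ubdb or /dev/ubdc1,
# assuming 16 minors per disk as described above.
ubd_minor() {
    name=${1#/dev/}
    rest=${name#ubd}
    letter=${rest%%[0-9]*}          # disk letter: a, b, c, ...
    part=${rest#"$letter"}          # partition number, empty for whole disk
    disk=$(( $(printf '%d' "'$letter") - 97 ))   # a=0, b=1, c=2, ...
    echo $(( disk * 16 + ${part:-0} ))
}

ubd_minor ubda    # prints 0
ubd_minor ubdb    # prints 16
ubd_minor ubdc1   # prints 33
```

The result is the last argument you would give to mknod, after the block-device major number 98 used in the example above.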

So, let's partition a ubd device just to see that it's the same as on a physical machine. First, let's make another host file to hold the device and plug it into the UML:

harrykar@harrysas:~/works/UML_virt$ dd if=/dev/zero of=partitioned bs=1024 seek=$[ 1024 * 1024 ] count=1
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 3.0528e-05 s, 33.5 MB/s

harrykar@harrysas:~/works/UML_virt$ uml_mconsole debian config ubdc=partitioned
OK



Now, inside the UML, let's use fdisk to chop this into partitions. Figure 3.2 shows my dialog with fdisk to create two equal-size partitions on this disk.
Figure 3.2. Using fdisk to create two partitions on a virtual disk

usermode:~# fdisk /dev/ubdc
Device contains neither a valid DOS partition table, nor Sun, SGI, or OSF
disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.
Command (m for help): p
Disk /dev/ubdc: 128 heads, 32 sectors, 512 cylinders
Units = cylinders of 4096 * 512 bytes
Device Boot Start End Blocks Id System
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-512, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-512, default 512): 256
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (257-512, default 257):
Using default value 257
Last cylinder or +size or +sizeM or +sizeK (257-512, default 512):
Using default value 512
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
WARNING: If you have created or modified any DOS 6.x
partitions, please see the fdisk manual page for additional
information.
Syncing disks.
usermode:~#

Now, I don't happen to have device nodes for these partitions, so I'll create them:

UML# mknod /dev/ubdc1 b 98 33
UML# mknod /dev/ubdc2 b 98 34

For some variety, let's make one a swap partition and the other a filesystem:

UML# mkswap /dev/ubdc1
Setting up swapspace version 1, size = 536850432 bytes
UML# mke2fs /dev/ubdc2
mke2fs 1.18, 11-Nov-1999 for EXT2 FS 0.5b, 95/08/09
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
131072 inodes, 262144 blocks
13107 blocks (5.00%) reserved for the super user
First data block=0
8 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376
Writing inode tables: done

And let's put them into action to see that they work as advertised:

UML# swapon /dev/ubdc1
UML# free
total used free shared buffers \
cached
Mem: 125128 69344 55784 0 448 \
49872
-/+ buffers/cache: 19024 106104
Swap: 1572832 0 1572832
UML# mount /dev/ubdc2 /mnt
UML# df
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/ubd0 1032056 259444 720132 26% /
none 62564 0 62564 0% /tmp
/dev/ubdc2 507748 13 481521 0% /mnt

So, we do, in fact, have another 512MB of swap space and a brand-new empty 512MB filesystem.
Rather than calling swapon by hand whenever we want to add some swap space to our UML, we can also just add the device to the UML's /etc/fstab. In our case, the relevant lines would be:

/dev/ubdb swap swap defaults 0 0
/dev/ubdc1 swap swap defaults 0 0

However, if you do this, you must remember to configure the devices on the UML command line since they must be present early in boot when the filesystems are mounted.


UML Disks as Raw Data
Normally, when you add a new block device to a UML, it will be used as either a filesystem or a swap device. However, some other possibilities are also useful with a UML. These work equally well on a physical machine but aren't used because of the lower flexibility of physical disks.
For example, you can copy files into a UML by creating a tar file on the host that contains them, plug that tar file into the UML as a virtual disk, and, inside the UML, untar the files directly from that device. So, on the host, let's create a tar file with some useful files in it:

harrykar@harrysas:~/works/UML_virt$ tar cf etc.tar /etc
tar: Removing leading `/' from member names
tar: /etc/group-: Cannot open: Permission denied
tar: /etc/ppp/pap-secrets: Cannot open: Permission denied
tar: /etc/ppp/chap-secrets: Cannot open: Permission denied
tar: /etc/ppp/peers: Cannot open: Permission denied
tar: /etc/wicd/wireless-settings.conf: Cannot open: Permission denied
tar: /etc/wicd/manager-settings.conf: Cannot open: Permission denied
tar: /etc/wicd/wired-settings.conf: Cannot open: Permission denied
tar: /etc/postgresql/8.3/main/pg_ident.conf: Cannot open: Permission denied
tar: /etc/postgresql/8.3/main/pg_hba.conf: Cannot open: Permission denied
tar: /etc/X11/Xwrapper.config: Cannot open: Permission denied
tar: /etc/ssl/private: Cannot open: Permission denied
tar: /etc/qt3/.qt_plugins_3.3rc.lock: Cannot open: Permission denied
tar: /etc/qt3/.qtrc.lock: Cannot open: Permission denied
tar: /etc/at.deny: Cannot open: Permission denied
tar: /etc/sudoers: Cannot open: Permission denied
tar: /etc/shadow-: Cannot open: Permission denied
tar: /etc/gshadow-: Cannot open: Permission denied
tar: /etc/chatscripts: Cannot open: Permission denied
tar: /etc/fuse.conf: Cannot open: Permission denied
tar: /etc/.pwd.lock: Cannot open: Permission denied
tar: /etc/default/cacerts: Cannot open: Permission denied
tar: /etc/passwd-: Cannot open: Permission denied
tar: /etc/apt/secring.gpg: Cannot open: Permission denied
tar: /etc/apt/trustdb.gpg: Cannot open: Permission denied
tar: /etc/shadow: Cannot open: Permission denied
tar: /etc/firestarter/events-filter-ports: Cannot open: Permission denied
tar: /etc/firestarter/firewall: Cannot open: Permission denied
tar: /etc/firestarter/inbound: Cannot open: Permission denied
tar: /etc/firestarter/user-post: Cannot open: Permission denied
tar: /etc/firestarter/firestarter.sh: Cannot open: Permission denied
tar: /etc/firestarter/sysctl-tuning: Cannot open: Permission denied
tar: /etc/firestarter/events-filter-hosts: Cannot open: Permission denied
tar: /etc/firestarter/configuration: Cannot open: Permission denied
tar: /etc/firestarter/outbound: Cannot open: Permission denied
tar: /etc/firestarter/user-pre: Cannot open: Permission denied
tar: /etc/cups/ssl: Cannot open: Permission denied
tar: /etc/NetworkManager/system-connections/Auto eth1: Cannot open: Permission denied
tar: /etc/gshadow: Cannot open: Permission denied
tar: /etc/security/opasswd: Cannot open: Permission denied
tar: /etc/samba/smbpasswd: Cannot open: Permission denied
tar: Error exit delayed from previous errors


When I did this on my machine, I got a bunch of errors about files that I, as a normal user, couldn't read. Since this is just a demo, that's OK, but if you were really trying to copy your host's /etc into a UML, you'd want to become root in order to get everything.

harrykar@harrysas:~/works/UML_virt$ ls -l etc.tar
-rw-r--r-- 1 harrykar harrykar 12533760 2009-09-28 09:30 etc.tar

I did get about 12MB worth of files, so let's plug this tar file into the UML as the fourth device, ubdd:

harrykar@harrysas:~/works/UML_virt$ uml_mconsole debian config ubdd=etc.tar
OK

Now we can untar directly from the device:

harry@debian:~$ su
debian:/home/harry# tar xf /dev/ubdd

debian:/home/harry# ls -l
total 16
drwxr-xr-x 178 root root 12288 Sep 28 08:38 etc
-rw-r--r-- 1 harry harry 1647 Sep 27 10:02 testvim

This technique can also be used to copy a single file into a UML. Simply configure that file as a UML block device and use dd to copy it from the device to a normal file inside the UML filesystem. The drawback of this approach is that the block device will be a whole multiple of the device block size, which is 512 bytes. So, a file whose size is not a whole multiple of 512 bytes will have some padding added to it. If this matters, that excess will have to be trimmed in order to make the UML file the same size as the host file.
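Trimming the padding is straightforward once the original length is known. The sketch below runs entirely on ordinary files: a dd with conv=sync stands in for reading the file back through a block device, and the length of 1000 bytes is an arbitrary value chosen for the example. In a real session you would read from the ubd device instead and obtain the length from the host (for example, from ls -l).

```shell
#!/bin/sh
# Sketch: recover a file's exact contents from a zero-padded copy.
dir=$(mktemp -d)

# A 1000-byte file stands in for the host file.
head -c 1000 /dev/urandom > "$dir/original"

# Stand-in for reading it back through a block device: conv=sync pads
# the final short block with zeros, giving a multiple of 512 bytes.
dd if="$dir/original" of="$dir/padded" bs=512 conv=sync 2>/dev/null

# Trim to the original length, which has to be known from the host side.
size=1000
head -c "$size" "$dir/padded" > "$dir/copy"

cmp "$dir/original" "$dir/copy" && echo "copy matches original"
```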

UML block devices can be attached to anything on the host that can be accessed as a file. Formally, the underlying host file must be seekable. This rules out UNIX sockets, character devices, and named pipes but includes block devices. Devices such as physical disks, partitions, CD-ROMs, DVDs, and floppies can be passed to UML as block devices and accessed from inside as ubd devices. If there is a filesystem on the host block device, it can be mounted inside UML in exactly the same way as on the host, except for the different device name.

The UML must have the appropriate filesystem, either built-in or available as a module. For example, in order to mount a host CD-ROM inside a UML, it must have ISO-9660 (the standard filesystem for a CD) filesystem support.

The properties of the host file show through to the UML device to a great extent. We have already seen that the host file's size determines the size of the UML block device. Permissions also control what can be done inside UML. If the UML user doesn't have write access to the host file, the resulting device can only be mounted read-only.
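Before handing a host path to UML, you can make a rough guess at how it will behave. The helper below is only a sketch (ubd_backing_check is a hypothetical name): it checks the file type and the current user's permissions with the standard test operators, and it assumes the UML runs as that same user. UML itself does the real checks when it opens the file.

```shell
#!/bin/sh
# Sketch: rough pre-check of a host path as backing for a ubd device.
# Regular files and block devices are seekable; sockets, character
# devices, and named pipes are not.
ubd_backing_check() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ -f "$1" ] || [ -b "$1" ]; then
        if [ -r "$1" ] && [ -w "$1" ]; then
            echo "read-write"
        elif [ -r "$1" ]; then
            echo "read-only"
        else
            echo "unreadable"
        fi
    else
        echo "unsuitable"
    fi
}

scratch=$(mktemp -d)
: > "$scratch/disk"
mkfifo "$scratch/pipe"

ubd_backing_check "$scratch/disk"   # a fresh regular file: read-write
ubd_backing_check "$scratch/pipe"   # a named pipe: unsuitable
ubd_backing_check /dev/null         # a character device: unsuitable
```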


Networking
Let's take a quick look at networking with UML. This large subject gets much more coverage later, but here, we will put our UML instance on the network and demonstrate its basic capabilities.

As with all other UML devices, network interfaces are virtual. They are formed from some host network interface that allows processes to send packets either to the host network stack or to another UML instance without involving the host network. Here, we will do the former and communicate with the host.

Processes can send and receive frames from the host in a variety of ways, including TUN/TAP, Ethertap, SLIP, and PPP [3]. All of these, except for PPP, are supported by UML. We will use TUN/TAP since it is intended for this purpose and doesn't have the limitations of the others. TUN/TAP is a driver on the host that creates a pipe, which is essentially a strand of Ethernet, between a process and the host networking system. The host end of this pipe is a network interface, typically named tapN (tap0 here), which can be seen using ifconfig just like the system's normal Ethernet device:
[3] SLIP (Serial Line IP) and PPP (Point-to-Point Protocol) are protocols used for dialup Internet access. PPP has largely supplanted SLIP for this purpose. They are useful for UML because they provide virtual network interfaces that allow processes to send and receive network frames.


host% ifconfig tap0
tap0 Link encap:Ethernet HWaddr 00:FF:9F:DF:40:D3
inet addr:192.168.0.254 Bcast:192.168.0.255 \
Mask:255.255.255.255
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:61 errors:0 dropped:0 overruns:0 frame:0
TX packets:75 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:10931 (10.6 Kb) TX bytes:8198 (8.0 Kb)
RX bytes:15771 (15.4 Kb) TX bytes:13466 (13.1 Kb)

This output resulted from a short UML session in which I logged in to the UML from the host, ran a few commands, and logged back out. Thus, the packet counters reflect some network activity.

It looks just like a normal network interface, and, in most respects, it is. It is just not attached to a physical network card. Instead, it is attached to a device file, /dev/net/tun:

harrykar@harrysas:~/works/UML_virt$ ls -l /dev/net/tun
crw------- 1 root root 10, 200 2009-08-06 23:56 /dev/net/tun

This file and the tap0 interface are connected such that any packets routed to tap0 emerge from the /dev/net/tun file and can be read by whatever process has opened it. Conversely, any packets written to this file by a process will emerge from the tap0 interface and be routed to their destination by the host network system. Within UML, there is a similar pipe between this file and the UML Ethernet device. Here is the ifconfig output for the UML eth0 device corresponding to the same short network session as above:

UML# ifconfig eth0
eth0 Link encap:Ethernet HWaddr FE:FD:C0:A8:00:FD
inet addr:192.168.0.253 Bcast:192.168.0.255 \
Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:75 errors:0 dropped:0 overruns:0 frame:0
TX packets:61 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
Interrupt:5

Notice that the received and transmitted packet counts are mirror images of each other: the number of packets received by the host tap0 interface is the same as the number of packets transmitted by the UML eth0 device. This is because these two interfaces are hooked up to each other back to back, with the connection being made through the host's /dev/net/tun file.
With this bit of theory out of the way, let's put our UML instance on the network. If we look at the interfaces present in our UML, we see only a loopback device (plus, in this case, an unconfigured dummy0 device), which isn't going to be too useful for us:

debian:/home/harry# ifconfig -a
dummy0 Link encap:Ethernet HWaddr 2E:06:93:9F:7D:37
BROADCAST NOARP MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)


Clearly, this needs to be fixed before we can do any sort of real networking. As you might guess from our previous work, we can simply plug a network device into our UML from the host:

host% uml_mconsole debian config eth0=tuntap,,,192.168.0.254
OK

This uml_mconsole command is telling the UML to create a new eth0 device that will communicate with the host using its TUN/TAP interface, and that the IP address of the host side, the tap0 interface, will be 192.168.0.254. The repeated commas are for parameters we aren't supplying; they will be provided default values by the UML network driver.

My local network uses the 192.168.0.0 network, on which only about the first dozen IP addresses are in regular use. That leaves the upper addresses free for my UML instances. I usually use 192.168.0.254 for the host side of my TUN/TAP interface and 192.168.0.253 for the UML side. When I have multiple instances running, I use 192.168.0.252 and 192.168.0.251, respectively, and so on.
Here, and everywhere else that you put UML instances on the network, you will need to choose IP addresses that work on your local network. They can't already be in use, of course. If suitable IP addresses are in short supply, you may be looking askance at my use of two addresses per UML instance. You can cut this down to one, the UML IP address, by reusing an IP address for the host side of the TUN/TAP interface. You can reuse the IP address already assigned to your host's eth0 for this and everything will be fine.
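The counting-down scheme described above (254 and 253 for the first instance, 252 and 251 for the second) is easy to generate. This is a sketch of that personal convention only: uml_addresses is a hypothetical helper, and the 192.168.0 prefix must be replaced with whatever suits your own network.

```shell
#!/bin/sh
# Sketch: address pair for UML instance N (0-based), counting down
# from 192.168.0.254. Prints "host-side-address UML-side-address".
uml_addresses() {
    host=$(( 254 - 2 * $1 ))
    echo "192.168.0.$host 192.168.0.$(( host - 1 ))"
}

uml_addresses 0   # prints 192.168.0.254 192.168.0.253
uml_addresses 1   # prints 192.168.0.252 192.168.0.251
```

The first address of each pair is the one that goes at the end of the eth0=tuntap,,, configuration; the second is the one given to eth0 inside the UML.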

Now we can look at the UML network interfaces and see that we have an Ethernet device as well as the previous loopback interface:

UML# ifconfig -a
eth0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
Interrupt:5
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:6 errors:0 dropped:0 overruns:0 frame:0
TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0

The eth0 interface isn't running, nor is it configured with an IP address, so we need to fix that:

UML# ifconfig eth0 192.168.0.253 up
* modprobe tun
* ifconfig tap0 192.168.0.254 netmask 255.255.255.255 up
* bash -c echo 1 > /proc/sys/net/ipv4/ip_forward
* route add -host 192.168.0.253 dev tap0
* bash -c echo 1 > /proc/sys/net/ipv4/conf/tap0/proxy_arp
* arp -Ds 192.168.0.253 eth1 pub

This is more output than you normally expect to see from ifconfig, and in fact, it came from the kernel rather than ifconfig. This tells us exactly how the host side of the interface was set up and what commands were used to do it. If there had been any problems, the error output would have shown up here, and this would be the starting point for debugging the problem.
This setup enables the UML to communicate with the world outside the host and configures the host to route packets to and from the UML. In order to get UML on the network with the host, only the first two commands, modprobe and ifconfig, are needed. The modprobe command is precautionary since the host kernel may have TUN/TAP compiled in or the tun module already loaded. Once TUN/TAP is available, the tap0 interface is brought up and given an IP address, and it is ready to go.
The first bash command tells the host to route packets rather than just dropping packets it receives that aren't intended for it. The route command adds a route to the UML through the tap0 interface. This tells the host that any packet whose destination IP address is 192.168.0.253 (the address we gave to the UML eth0 interface) should be sent to the tap0 interface. Once there, it pops out of the /dev/net/tun file, which the UML network driver is reading, and from there to the UML eth0 interface.
The final two lines set up proxy arp on the host for the UML instance. This causes the instance to be visible, from an Ethernet protocol point of view, on the local LAN. Whenever one Ethernet host wants to send a packet to another, it starts by knowing only the destination IP address. If that address is on the local network, then the host needs to find out what Ethernet address corresponds to that IP address. This is done using Address Resolution Protocol (ARP). The host broadcasts a request on the Ethernet for any host that owns that IP address. The host in question will answer with its hardware Ethernet address, which is all the source host needs in order to build Ethernet frames to hold the IP packet it's trying to send.
Proxy arp tells the host to answer arp requests for the UML IP address just as though it were its own. Thus, any other machine on the network wanting to send a packet to the UML instance will receive an arp response from the UML host. The remote host will send the packet to the UML host, which will forward it through the tap0 interface to the UML instance.
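As a rough sketch of that behavior, the host's answer to an arp request can be modeled as a lookup against its own addresses plus the addresses it has published. The hardware address and the host's LAN IP below are made up for illustration. Only 192.168.0.253 comes from the setup above.

```python
HOST_MAC = "00:11:22:33:44:55"  # made-up hardware address of the host's eth1
OWN_IPS = {"192.168.0.2"}       # assumed LAN address of the host
PROXY_IPS = {"192.168.0.253"}   # published by: arp -Ds 192.168.0.253 eth1 pub

def arp_reply(requested_ip):
    """Return the MAC the host answers with, or None if it stays silent."""
    if requested_ip in OWN_IPS or requested_ip in PROXY_IPS:
        return HOST_MAC
    return None

if __name__ == "__main__":
    print(arp_reply("192.168.0.253"))  # answered with the host's own MAC
    print(arp_reply("192.168.0.10"))   # some other machine's address: no answer
```

The requester cannot tell the two cases apart: it gets the same hardware address for the host's IP and for the UML's IP.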
So, the host routing and the proxy arp work together to provide a network path from anywhere on the network to the UML, allowing it to participate on the network just like any other machine.
We can start to see this by using the simplest network tool, ping. First, let's make sure we can communicate with the host by pinging the tap0 interface IP, 192.168.0.254 :

UML# ping 192.168.0.254
PING 192.168.0.254 (192.168.0.254): 56 data bytes
64 bytes from 192.168.0.254: icmp_seq=0 ttl=64 time=2.7 ms
64 bytes from 192.168.0.254: icmp_seq=1 ttl=64 time=0.2 ms

This works fine. For completeness, let's go the other way and ping from the host to the UML:

host% ping 192.168.0.253
PING 192.168.0.253 (192.168.0.253) 56(84) bytes of data.
64 bytes from 192.168.0.253: icmp_seq=0 ttl=64 time=0.130 ms
64 bytes from 192.168.0.253: icmp_seq=1 ttl=64 time=0.069 ms

Now, let's try a different host on the same network:

UML# ping 192.168.0.10
PING 192.168.0.10 (192.168.0.10): 56 data bytes
64 bytes from 192.168.0.10: icmp_seq=0 ttl=63 time=753.2 ms
64 bytes from 192.168.0.10: icmp_seq=1 ttl=63 time=6.3 ms

Here the routing and arping that I described above are coming into play. The other system, 192.168.0.10, believes that the UML host owns the 192.168.0.253 address along with its regular IP and sends packets intended for the UML to it.
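The ping output itself hints at the extra hop. Replies from the host arrive with ttl=64, while replies from 192.168.0.10 arrive with ttl=63. A small sketch that pulls the numbers out of the reply lines shown above, assuming both machines start their replies at a TTL of 64 (a common default, but an assumption here):

```python
import re

def ping_ttls(output):
    """Extract the ttl= value from each reply line of ping output."""
    return [int(m) for m in re.findall(r"ttl=(\d+)", output)]

host_reply = "64 bytes from 192.168.0.254: icmp_seq=0 ttl=64 time=2.7 ms"
lan_reply = "64 bytes from 192.168.0.10: icmp_seq=0 ttl=63 time=753.2 ms"

# Assumption: both responders send their replies with an initial TTL of 64.
hops = ping_ttls(host_reply)[0] - ping_ttls(lan_reply)[0]
print(hops)  # 1
```

A difference of one is consistent with the reply being forwarded once, by the UML host, on its way from the LAN to the UML.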
Now, let's try something real. Let's log in to the UML from that outside system:

host% ssh user@192.168.0.253
user@192.168.0.253's password:
Linux usermode 2.4.27-1um #6 Sun Jan 23 16:00:39 EST 2005 i686 unknown
Last login: Tue Feb 22 23:05:13 2005 from uml
UML%

Except for details such as the login name, the kernel version string, and the node name, we can't really tell that this isn't a physical machine. This UML is on the network in exactly the same way that all of the physical systems are, and it can participate on the network in all the same ways.







To be continued...

References


Jeff Dike, User Mode Linux. Prentice Hall. ISBN-13: 978-0-13-186505-1

UML home page
clarkson.edu
devloop.org
appunti di informatica(italian)
