This report is about my experience with virtual machines today.

UML (User-mode Linux)

It began this morning, when I tried to set up a new virtual machine with User-mode Linux. I could easily reuse an existing installation using copy-on-write with the following command:

linux umid=vm4 uml_dir=/home/nico/vm/uml con1=pts ubda=/home/nico/vm/cow/vm4,/home/nico/vm/images/debian eth0=tuntap,,,192.168.4.1 mem=4096M
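
To make the copy-on-write part of that command easier to see, here is a minimal sketch that assembles the same command line from three parameters. The helper name uml_cmdline and its arguments are hypothetical; the directory layout (uml/, cow/ and images/debian below one base directory) is taken from the command above. It only prints the command and does not start anything.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a UML command line for a
# copy-on-write guest. Arguments: guest name, base directory, tuntap IP.
uml_cmdline() {
    name=$1
    base=$2
    tap_ip=$3
    # ubda=<COW file>,<backing image>: the guest writes to its own COW
    # file, while the shared backing image is reused as it is.
    printf 'linux umid=%s uml_dir=%s/uml con1=pts ubda=%s/cow/%s,%s/images/debian eth0=tuntap,,,%s mem=4096M\n' \
        "$name" "$base" "$base" "$name" "$base" "$tap_ip"
}

uml_cmdline vm4 /home/nico/vm 192.168.4.1
```

Called with vm4, /home/nico/vm and 192.168.4.1 it reproduces the command above; another guest would only need a different name and address.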

After I issued

apt-get update && apt-get dist-upgrade

in the virtual machine, it hung. It did not react to new SSH connections. I have seen this behaviour quite often with User-mode Linux when there is "a lot" of disk input/output. OK, I wanted to use some kind of framework for my virtual machines anyway, so for the time being, let's forget about UML and try libvirt+KVM.

Libvirt

The libvirt project looks quite promising from its documentation, especially in combination with virt-manager. Creating a new virtual machine with virt-manager is somewhat strange, because it insists on having an installation medium. Locating the Debian live CD is not difficult, though. But then came the big problem: when I tried to create a new disk image, virt-manager just hung for several minutes while the host system was otherwise idle. Some time ago I already had massive problems with virt-manager after selecting a different pool for the images, which caused several errors when trying to start the VM.

But well, let's give virsh, the command-line utility for managing libvirt, a try. Creating a new disk image with virsh is pretty easy:

vol-create-as default jr.img 8G

A bit confusing is the fact that the vol-create command without the -as suffix expects an XML file as input. A look at the other create commands confirms this guess:

ikn:~% LANG=C LC_ALL=C virsh help | grep create
   create          create a domain from an XML file
   net-create      create a network from an XML file
   nodedev-create  create a device defined by an XML file on the node
   pool-create     create a pool from an XML file
   pool-create-as  create a pool from a set of args
   vol-create      create a vol from an XML file
   vol-create-from create a vol, using another volume as input
   vol-create-as   create a volume from a set of args
ikn:~% virsh --version
0.7.4

Some commands cannot create anything from command-line arguments at all, only from an XML file, which makes virsh useless for interactive and scripted use.
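
For illustration, this is roughly what a script has to do to feed vol-create. It is a minimal sketch, assuming the libvirt storage volume XML format with name, capacity (in bytes) and a raw target format; the helper name volume_xml is hypothetical, and I have not checked which elements virsh 0.7.4 actually requires.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal storage volume definition.
# Arguments: volume name, size in GiB.
volume_xml() {
    vol_name=$1
    size_gib=$2
    # Capacity is written in bytes (assumed to be the default unit).
    bytes=$((size_gib * 1024 * 1024 * 1024))
    cat <<EOF
<volume>
  <name>$vol_name</name>
  <capacity>$bytes</capacity>
  <target>
    <format type='raw'/>
  </target>
</volume>
EOF
}

volume_xml jr.img 8
```

The output would be redirected to a file and handed to vol-create together with the pool name, for example "vol-create default jr.xml". That is a lot of ceremony compared to the one-line vol-create-as above.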

This brings me to the new kid on the block: ganeti.

Ganeti

When I first experienced problems with libvirt, some people pointed me to ganeti (to tell the truth, it was one of the ganeti developers). Until today I had put this idea off, but after the problems with libvirt I decided to give ganeti-2.0.5-1 (Debian package) a try. First of all I tried to follow the installation tutorial referenced on the homepage, which is heavily oriented towards Xen and LVM, neither of which I plan to use. While trying to get ganeti running, I ran into some interesting problems:

[11:26] tee:root# gnt-cluster init ganeti.schottelius.org
Failure: prerequisites not met for this operation:
This host's IP resolves to the private range (127.0.1.1). Please fix DNS or /etc/hosts.

This is described in the ganeti manual and easily fixed by commenting out the relevant entry in /etc/hosts:

[11:27] tee:root# grep tee.schottelius.org /etc/hosts        
#127.0.1.1      tee.schottelius.org     tee
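
A check like this can be done before running gnt-cluster init. The following is a minimal sketch, not something ganeti ships: the helper name maps_to_loopback is hypothetical, and it only looks at a hosts file, not at DNS.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE contains an active (uncommented)
# entry that maps NAME to an address in 127.0.0.0/8.
maps_to_loopback() {
    hosts_file=$1
    name=$2
    awk -v name="$name" '
        /^[[:space:]]*#/ { next }          # skip commented-out entries
        $1 ~ /^127\./ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break       # rest of the line is a comment
                if ($i == name) found = 1
            }
        }
        END { exit !found }
    ' "$hosts_file"
}

# Example call (prints nothing, only sets the exit status):
#   maps_to_loopback /etc/hosts tee.schottelius.org
```

With the entry commented out as shown above, the function returns a non-zero status for tee.schottelius.org, which is the state ganeti wants.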

After that I was a bit confused by ganeti not finding its cluster name:

[11:27] tee:root# gnt-cluster init ganeti.schottelius.org
Failure: can't resolve hostname 'ganeti.schottelius.org'
[11:28] tee:root# ping ganeti.schottelius.org
PING ganeti.schottelius.org (77.109.138.195) 56(84) bytes of data.
64 bytes from ganeti.schottelius.org (77.109.138.195): icmp_seq=1 ttl=64 time=0.026 ms
^C                
--- ganeti.schottelius.org ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.026/0.026/0.026/0.000 ms

Retrying twice "solved" the problem, which confuses me a bit, as ganeti and ping both use the same resolver library. After that I ran into the "no-lvm-problem":

[11:38] tee:root# gnt-cluster init -b br0 ganeti.schottelius.org
Failure: prerequisites not met for this operation:
Error: volume group 'xenvg' missing
specify --no-lvm-storage if you are not using lvm

Specifying the required parameter led to a new problem:

[11:38] tee:root# gnt-cluster init -b br0 --no-lvm-storage ganeti.schottelius.org
Failure: prerequisites not met for this operation:
Invalid master netdev given (xen-br0): 'Device "xen-br0" does not exist.'

Interestingly, ganeti seems to ignore the bridge parameter -b. So, to use ganeti, I renamed the bridge from br0 to xen-br0 in /etc/network/interfaces:

auto xen-br0
iface xen-br0 inet manual
   bridge_ports eth1

And finally I was able to initialise the ganeti cluster:

[15:06] tee:root# gnt-cluster init -b br0 --no-lvm-storage ganeti.schottelius.org

Then I tried to add the host to the cluster, which failed; retrieving status information failed as well:

[15:06] tee:root# gnt-node add tee.schottelius.org
Node tee.schottelius.org already in the cluster (as tee.schottelius.org) - please retry with '--readd'
[15:07] tee:root# gnt-node list                   
Node                DTotal DFree MTotal MNode MFree Pinst Sinst
tee.schottelius.org      ?     ?      ?     ?     ?     0     0

Trying to re-add it results in a failure without any error message and does not fix the problem:

[15:07] tee:root# gnt-node add --readd tee.schottelius.org
The authenticity of host 'tee.schottelius.org (77.109.138.222)' can't be established.
RSA key fingerprint is c7:d0:a8:32:ad:f0:9b:fa:1e:77:d5:1f:64:d8:9b:db.
Are you sure you want to continue connecting (yes/no)? yes
Thu Dec 31 15:08:23 2009  - INFO: Readding a node, the offline/drained flags were reset
Thu Dec 31 15:08:23 2009  - INFO: Node will be a master candidate
Failure: command execution error:
[15:08] tee:root# 
[15:32] tee:root# gnt-node list
Node                DTotal DFree MTotal MNode MFree Pinst Sinst
tee.schottelius.org      ?     ?      ?     ?     ?     0     0
[15:34] tee:root# 

At that point I was pointed to the more recent documentation of ganeti and began from scratch:

[16:22] tee:vm# gnt-cluster destroy --yes-do-it
[16:23] tee:vm# gnt-cluster init --no-lvm-storage ganeti.schottelius.org
[16:26] tee:vm# gnt-node list
Node                DTotal DFree MTotal MNode MFree Pinst Sinst
tee.schottelius.org      ?     ?      ?     ?     ?     0     0

After double-checking that the needed daemons were running (/etc/init.d/ganeti restart), I got a good hint: one has to specify the hypervisor to use during initialisation:

[16:34] tee:vm# gnt-cluster destroy --yes-do-it
[16:35] tee:vm# gnt-cluster init --no-lvm-storage -t kvm ganeti.schottelius.org
[16:36] tee:vm# gnt-node list                                                  
Node                DTotal DFree MTotal MNode MFree Pinst Sinst                
tee.schottelius.org      ?     ?  19.6G  2.7G 17.6G     0     0
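
The difference between the two listings is the memory columns. A minimal sketch for spotting this in a script, assuming the default column layout shown above (MTotal is the fourth column); the helper name is hypothetical, and reading "?" as "the node got no data from its hypervisor" is my interpretation of what happened here:

```shell
#!/bin/sh
# Hypothetical helper: read "gnt-node list" output on stdin and print the
# names of all nodes whose MTotal column (4th field) is "?".
nodes_without_memory_info() {
    # NR > 1 skips the header line.
    awk 'NR > 1 && $4 == "?" { print $1 }'
}

# Intended use: gnt-node list | nodes_without_memory_info
```

For the first listings in this report it prints tee.schottelius.org; for the listing after init with -t kvm it prints nothing.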

Now I tried to add a new virtual machine instance, which resulted in another error:

[16:51] tee:vm# gnt-instance add -t file -s 4G -o debootstrap -n tee.schottelius.org jr.nachtbrand.ch
Failure: prerequisites not met for this operation:
Hypervisor parameter validation failed on node tee.schottelius.org: Instance kernel '/boot/vmlinuz-2.6-kvmU' not found or not a file                        

This seems to be some kind of ganeti logic to keep the kernel outside the block device, which is similar to the User-mode Linux approach. After linking one of the host kernels and its initrd, adding an instance succeeded:

[16:59] tee:/boot# ln -s vmlinuz-2.6.30-2-amd64 vmlinuz-2.6-kvmU
[16:59] tee:/boot# ln -s initrd.img-2.6.30-2-amd64 initrd-2.6-kvmU
[17:00] tee:vm# gnt-instance add -t file -s 4G -o debootstrap -n tee.schottelius.org jr.nachtbrand.ch
[17:01] tee:/boot# gnt-instance  list
Instance         Hypervisor OS          Primary_node        Status  Memory
jr.nachtbrand.ch kvm        debootstrap tee.schottelius.org running   128M
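
The two ln calls can be wrapped so that they fail early when the kernel or initrd is missing. This is a minimal sketch; the helper name link_kvm_kernel is hypothetical, and the link names are simply the ones used in the transcript above (only the kernel name appeared in the error message).

```shell
#!/bin/sh
# Hypothetical helper: in BOOT_DIR, point vmlinuz-2.6-kvmU and
# initrd-2.6-kvmU at the kernel and initrd of the given VERSION.
link_kvm_kernel() {
    boot_dir=$1
    version=$2
    for f in "vmlinuz-$version" "initrd.img-$version"; do
        if [ ! -f "$boot_dir/$f" ]; then
            echo "missing: $boot_dir/$f" >&2
            return 1
        fi
    done
    # Relative link targets, as in the transcript; -f replaces old links.
    ln -sf "vmlinuz-$version" "$boot_dir/vmlinuz-2.6-kvmU" &&
    ln -sf "initrd.img-$version" "$boot_dir/initrd-2.6-kvmU"
}

# Example call: link_kvm_kernel /boot 2.6.30-2-amd64
```

After a kernel upgrade on the host, the same call with the new version string would move both links.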

The instance is also correctly connected to the bridge, the OS definition is reported as valid by gnt-os diagnose, and gnt-cluster verify looks good:

[17:14] tee:/boot# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.000000000000       no              
xen-br0         8000.0015176a26f7       no              eth1
                                                        tap4
[17:16] tee:/boot# gnt-os diagnose
OS: debootstrap [global status: valid]
  Node: tee.schottelius.org, status: VALID (path: /usr/share/ganeti/os/debootstrap)
[17:17] tee:/boot# gnt-cluster verify
Thu Dec 31 17:17:24 2009 * Verifying global settings
Thu Dec 31 17:17:24 2009 * Gathering data (1 nodes)
Thu Dec 31 17:17:24 2009 * Verifying node tee.schottelius.org (master)
Thu Dec 31 17:17:24 2009 * Verifying instance jr.nachtbrand.ch
Thu Dec 31 17:17:24 2009 * Verifying orphan volumes
Thu Dec 31 17:17:24 2009 * Verifying remaining instances
Thu Dec 31 17:17:24 2009 * Verifying N+1 Memory redundancy
Thu Dec 31 17:17:24 2009 * Other Notes
Thu Dec 31 17:17:24 2009   - NOTICE: 1 non-redundant instance(s) found.
Thu Dec 31 17:17:24 2009 * Hooks Results

As specified in the documentation, I tried to connect to the console:

[17:28] tee:/boot# gnt-instance console jr.nachtbrand.ch
[17:30] tee:/boot# gnt-instance console --show-cmd jr.nachtbrand.ch
ssh -q -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oHostKeyAlias=ganeti.schottelius.org -oBatchMode=yes -oStrictHostKeyChecking=yes -t root@tee.schottelius.org '/usr/bin/socat STDIO,echo=0,icanon=0 UNIX-CONNECT:/var/run/ganeti/kvm-hypervisor/ctrl/jr.nachtbrand.ch.serial'

The problem is that the newly debootstrapped system does not have a serial console set up.
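
One possible fix is to start a getty on the first serial port inside the guest. The following is a minimal sketch under several assumptions: the guest uses sysvinit with /etc/inittab, its root filesystem is mounted somewhere on the host (the mount point is passed as an argument), and the line speed and terminal type are guesses. The helper name enable_serial_getty is hypothetical, and I have not tested this against a ganeti instance.

```shell
#!/bin/sh
# Hypothetical helper: make sure ROOT/etc/inittab starts a getty on ttyS0.
enable_serial_getty() {
    inittab=$1/etc/inittab
    line='T0:23:respawn:/sbin/getty -L ttyS0 9600 vt100'
    # Nothing to do if an active (uncommented) T0 entry already exists.
    grep -q '^T0:' "$inittab" 2>/dev/null && return 0
    printf '%s\n' "$line" >> "$inittab"
}

# Example call, with /mnt/guest as an assumed mount point:
#   enable_serial_getty /mnt/guest
```

The function is safe to run more than once, since it appends the line only when no active T0 entry is present.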

As you can see, by the evening of this day I had a lot of new experiences, but no reliably running virtualisation framework. That brings me to the end of this report:

  • User-mode Linux does not work reliably under some I/O load.
  • Virt-manager is not able to change even the simplest parameters.
  • Virsh is unusable if you don't want to edit XML files.
  • Ganeti has a lot of unhandled problems and still relies very much on Xen + LVM.

As my vacation ends next Monday, I will have a look at the commercial virtualisation frameworks. To the folks behind the FOSS projects named above: guys, you have to improve a lot before one can call your software "good and clean software".