
As announced some days ago, I started to hack on a UI for ceofhack.

The first version of fui is now available. So far it can only print one line of text that you entered, but it does so with ncurses!

Posted Thu 04 Mar 2010 23:10:42 CET Tags:

For my software development class I have to create a nice project in some object-oriented language. After looking around for something that would interest me, I chose ruby, because it feels like it could be an interesting language.

The next question was what kind of software to write. As the EOF project, namely ceofhack, still needs a user interface, I decided to write one.

As I need a user interface that I would use myself, I wanted something I can use on the console, which led me to ncurses.

Eventually I found out that there is also support for ncurses in ruby, ncurses-ruby.

Perfect! The only thing missing was a name. As I am a very simple-minded person, I chose fui, as an abbreviation of fancy user interface.

The git repository has already been published, expect more news soon!

Posted Tue 02 Mar 2010 22:14:39 CET Tags:

Perhaps you were also searching for a way to tunnel the unix socket that qemu or kvm provides for vnc through ssh? When I was searching for an answer, I found ssvnc, which I did not try, because I wanted to solve the problem with socat and ssh.

I started kvm on the remote machine using the following command:

kvm -vnc unix:/home/services/vms/vnc-socket ...

Then I connected socat locally on the remote machine to test the settings:

socat STDIO UNIX-CONNECT:/home/services/vms/vnc-socket

This worked fine. On the local side we need a listener on a TCP port. I chose 5500, but any free port will do (VNC servers conventionally listen on 5900 plus the display number). It can be created like this:

socat STDIO TCP-LISTEN:5500

As a single pipe carries data in one direction only, one cannot simply pipe into socat like this:

ssh root@tee.schottelius.org "socat STDIO UNIX-CONNECT:/home/services/vms/vnc-socket" | socat STDIO TCP-LISTEN:5500

But socat has another nice option, the EXEC parameter, which solves the problem:

socat TCP-LISTEN:5500 EXEC:'ssh root@tee.schottelius.org "socat STDIO UNIX-CONNECT:/home/services/vms/vnc-socket"'
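
The command above can be wrapped in a small helper. This is only a minimal sketch: the function name vnc_tunnel_cmd is made up for illustration, and the socat options bind=127.0.0.1 and reuseaddr are additions that were not part of the command I actually used. The function just prints the command and does not run anything:

```shell
# Hypothetical helper: print the socat command that forwards a local TCP port
# to a UNIX socket on a remote host through ssh. It only prints the command.
# bind=127.0.0.1 and reuseaddr are assumptions added here, not part of the
# original command: they keep the listener on localhost and allow quick restarts.
vnc_tunnel_cmd() {
    remote="$1"          # e.g. root@tee.schottelius.org
    socket="$2"          # path of the vnc socket on the remote host
    port="${3:-5500}"    # local TCP port to listen on
    printf "socat TCP-LISTEN:%s,bind=127.0.0.1,reuseaddr EXEC:'ssh %s \"socat STDIO UNIX-CONNECT:%s\"'\n" \
        "$port" "$remote" "$socket"
}
```

To actually start the tunnel, the printed line can be copied into a shell or passed to eval, for example eval "$(vnc_tunnel_cmd root@tee.schottelius.org /home/services/vms/vnc-socket)".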

And now I can connect locally via tightvnc:

xtightvncviewer -encodings "copyrect tight hextile zlib corre rre raw" localhost:5500

I specify a different order for the encodings, because xtightvncviewer prefers the raw "encoding" when it connects to localhost, which is not desired here.

And that's it, vnc connected to a unix socket from kvm tunneled through ssh!

Posted Thu 07 Jan 2010 20:32:19 CET Tags:

This report is about my experience with virtual machines today.

UML (user-mode-linux)

It began this morning, when I tried to set up a new virtual machine with User mode Linux. I could easily reuse an existing installation using copy-on-write with the following command:

linux umid=vm4 uml_dir=/home/nico/vm/uml con1=pts ubda=/home/nico/vm/cow/vm4,/home/nico/vm/images/debian eth0=tuntap,,,192.168.4.1 mem=4096M

After I issued

apt-get update && apt-get dist-upgrade

in the virtual machine, it hung. It did not react to new ssh connections. I've seen this behaviour quite often with user mode Linux when there is "a lot" of disk input/output. Ok, I wanted to use some kind of framework for my virtual machines anyway, so for the time being, let's forget about uml and try libvirt+kvm.

Libvirt

The libvirt project looks quite promising from its documentation, especially in combination with virt-manager. Creating a new virtual machine with virt-manager is a bit strange, because it insists on having an installation medium, though locating the Debian live CD is not difficult. But then came the big problem: When I tried to create a new disk image, virt-manager just hung for several minutes, without the host system doing anything. Some time ago I already had massive problems with virt-manager after selecting a different pool for the images, which caused several errors when trying to start the VM.

But well, let's give virsh a try, the command line utility to manage libvirt. Creating a new disk image with virsh is pretty easy:

vol-create-as default jr.img 8G

A bit confusing is the fact that the vol-create command without the -as suffix expects an XML file as input. Having a look at the other create commands confirms this guess:

ikn:~% LANG=C LC_ALL=C virsh help | grep create
   create          create a domain from an XML file
   net-create      create a network from an XML file
   nodedev-create  create a device defined by an XML file on the node
   pool-create     create a pool from an XML file
   pool-create-as  create a pool from a set of args
   vol-create      create a vol from an XML file
   vol-create-from create a vol, using another volume as input
   vol-create-as   create a volume from a set of args
ikn:~% virsh --version
0.7.4

Some commands do not support creation from command line arguments at all, but only from an XML file, which makes virsh useless for interactive and scripting use.
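
If one is willing to generate the XML, a small shell function can at least hide it. A minimal sketch: the function name vol_xml is made up, and the XML is my assumption of the smallest volume description that libvirt accepts (name, capacity, raw format). I have not verified it against virsh 0.7.4, so check it against the libvirt storage documentation first:

```shell
# Hypothetical helper: print a minimal libvirt volume description.
# Assumption: name, capacity and a raw target format are enough for vol-create.
vol_xml() {
    name="$1"       # volume name, e.g. jr.img
    size_g="$2"     # capacity in gigabytes, e.g. 8
    cat <<EOF
<volume>
  <name>$name</name>
  <capacity unit="G">$size_g</capacity>
  <target>
    <format type="raw"/>
  </target>
</volume>
EOF
}
```

The output would be written to a file, which is then passed to virsh together with the pool name: vol_xml jr.img 8 > jr.xml, followed by virsh vol-create default jr.xml.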

This brings me to the new kid on the block: ganeti

Ganeti

When I first experienced problems with libvirt, some people pointed me to ganeti (to tell the truth, it was one of the ganeti developers). Until today I had put this idea off, but after the problems with libvirt I decided to give ganeti-2.0.5-1 (Debian package) a try. First of all I tried to follow the installation tutorial referenced on the homepage, which is heavily oriented towards Xen and LVM, neither of which I plan to use. While trying to get ganeti running, I ran into some interesting problems:

[11:26] tee:root# gnt-cluster init ganeti.schottelius.org
Failure: prerequisites not met for this operation:
This host's IP resolves to the private range (127.0.1.1). Please fix DNS or /etc/hosts.

This is described in the ganeti manual and easily fixed by commenting out the relevant entry in /etc/hosts:

[11:27] tee:root# grep tee.schottelius.org /etc/hosts        
#127.0.1.1      tee.schottelius.org     tee
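
To check for this before running gnt-cluster init, the hosts file can be searched for loopback entries of the hostname. A minimal sketch, with a made-up function name (loopback_entries); it only looks at the hosts file and does not ask DNS:

```shell
# Hypothetical helper: print the lines of a hosts file that map the given
# name to a loopback address (127.x.x.x). Commented-out lines are ignored.
loopback_entries() {
    name="$1"
    hosts_file="${2:-/etc/hosts}"
    awk -v name="$name" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        $1 ~ /^127\./ {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print; next }
        }
    ' "$hosts_file"
}
```

If loopback_entries tee.schottelius.org prints nothing, the hosts file no longer maps the hostname to the loopback range.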

After that I was a bit confused by ganeti not finding its cluster name:

[11:27] tee:root# gnt-cluster init ganeti.schottelius.org
Failure: can't resolve hostname 'ganeti.schottelius.org'
[11:28] tee:root# ping ganeti.schottelius.org
PING ganeti.schottelius.org (77.109.138.195) 56(84) bytes of data.
64 bytes from ganeti.schottelius.org (77.109.138.195): icmp_seq=1 ttl=64 time=0.026 ms
^C                
--- ganeti.schottelius.org ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.026/0.026/0.026/0.000 ms

Retrying twice "solved" the problem, which is a bit confusing for me, as ganeti and ping both use the same resolver library. After that I ran into the "no-lvm-problem":

[11:38] tee:root# gnt-cluster init -b br0 ganeti.schottelius.org
Failure: prerequisites not met for this operation:
Error: volume group 'xenvg' missing
specify --no-lvm-storage if you are not using lvm

Specifying the required parameter led to a new problem:

[11:38] tee:root# gnt-cluster init -b br0 --no-lvm-storage ganeti.schottelius.org
Failure: prerequisites not met for this operation:
Invalid master netdev given (xen-br0): 'Device "xen-br0" does not exist.'

This is interesting: ganeti seems to ignore the bridge parameter -b. So, to use ganeti, I renamed the bridge from br0 to xen-br0 in /etc/network/interfaces:

auto xen-br0
iface xen-br0 inet manual
   bridge_ports eth1

And finally I was able to initialise the ganeti cluster:

[15:06] tee:root# gnt-cluster init -b br0 --no-lvm-storage ganeti.schottelius.org

Then I tried to add the host to the cluster, which failed because it was already a member. Retrieving status information did not work either:

[15:06] tee:root# gnt-node add tee.schottelius.org
Node tee.schottelius.org already in the cluster (as tee.schottelius.org) - please retry with '--readd'
[15:07] tee:root# gnt-node list                   
Node                DTotal DFree MTotal MNode MFree Pinst Sinst
tee.schottelius.org      ?     ?      ?     ?     ?     0     0

Trying to re-add it results in a failure without an error message and does not fix the problem:

[15:07] tee:root# gnt-node add --readd tee.schottelius.org
The authenticity of host 'tee.schottelius.org (77.109.138.222)' can't be established.
RSA key fingerprint is c7:d0:a8:32:ad:f0:9b:fa:1e:77:d5:1f:64:d8:9b:db.
Are you sure you want to continue connecting (yes/no)? yes
Thu Dec 31 15:08:23 2009  - INFO: Readding a node, the offline/drained flags were reset
Thu Dec 31 15:08:23 2009  - INFO: Node will be a master candidate
Failure: command execution error:
[15:08] tee:root# 
[15:32] tee:root# gnt-node list
Node                DTotal DFree MTotal MNode MFree Pinst Sinst
tee.schottelius.org      ?     ?      ?     ?     ?     0     0
[15:34] tee:root# 

At that point I was pointed to the more recent documentation of ganeti and began from scratch:

[16:22] tee:vm# gnt-cluster destroy --yes-do-it
[16:23] tee:vm# gnt-cluster init --no-lvm-storage ganeti.schottelius.org
[16:26] tee:vm# gnt-node list
Node                DTotal DFree MTotal MNode MFree Pinst Sinst
tee.schottelius.org      ?     ?      ?     ?     ?     0     0

After double-checking that the needed daemons were running (/etc/init.d/ganeti restart), I got a good hint: One has to specify the hypervisor to use during initialisation:

[16:34] tee:vm# gnt-cluster destroy --yes-do-it
[16:35] tee:vm# gnt-cluster init --no-lvm-storage -t kvm ganeti.schottelius.org
[16:36] tee:vm# gnt-node list                                                  
Node                DTotal DFree MTotal MNode MFree Pinst Sinst                
tee.schottelius.org      ?     ?  19.6G  2.7G 17.6G     0     0
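
The difference between the broken and the working state is visible in the memory columns, so it can be checked in a script. A minimal sketch with a made-up function name (nodes_without_memory_info); it assumes the column layout shown above, with MTotal as the fourth column:

```shell
# Hypothetical helper: read "gnt-node list" output on stdin and print the
# names of nodes whose MTotal column (4th field) is "?". In my case this
# meant that no usable hypervisor had been configured for the cluster.
nodes_without_memory_info() {
    awk 'NR > 1 && $4 == "?" { print $1 }'
}
```

It would be used as gnt-node list | nodes_without_memory_info; empty output means that every node reported its memory.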

Now I tried to add a new virtual machine instance, which resulted in another error:

[16:51] tee:vm# gnt-instance add -t file -s 4G -o debootstrap -n tee.schottelius.org jr.nachtbrand.ch
Failure: prerequisites not met for this operation:
Hypervisor parameter validation failed on node tee.schottelius.org: Instance kernel '/boot/vmlinuz-2.6-kvmU' not found or not a file                        

This seems to be some kind of ganeti logic to have the kernel outside the block device, which is similar to the user mode Linux approach. After linking one of the host kernels and its initrd, adding an instance succeeded:

[16:59] tee:/boot# ln -s vmlinuz-2.6.30-2-amd64 vmlinuz-2.6-kvmU
[16:59] tee:/boot# ln -s initrd.img-2.6.30-2-amd64 initrd-2.6-kvmU
[17:00] tee:vm# gnt-instance add -t file -s 4G -o debootstrap -n tee.schottelius.org jr.nachtbrand.ch
[17:01] tee:/boot# gnt-instance  list
Instance         Hypervisor OS          Primary_node        Status  Memory
jr.nachtbrand.ch kvm        debootstrap tee.schottelius.org running   128M
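
The kernel check can be done by hand before adding an instance. A minimal sketch with a made-up function name (check_instance_kernel); the default path is the one from the error message above, and the test only mirrors the "not found or not a file" wording, not ganeti's actual implementation:

```shell
# Hypothetical pre-flight check: succeed only if the kernel path resolves
# to a regular file. "[ -f ]" follows symlinks, so a link to a real kernel
# passes and a dangling link fails.
check_instance_kernel() {
    kernel="${1:-/boot/vmlinuz-2.6-kvmU}"
    if [ -f "$kernel" ]; then
        echo "ok: $kernel"
    else
        echo "missing: $kernel" >&2
        return 1
    fi
}
```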

The instance is also correctly connected to the bridge, the OS definition is seen as valid by gnt-os, and gnt-cluster verify looks good:

[17:14] tee:/boot# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.000000000000       no              
xen-br0         8000.0015176a26f7       no              eth1
                                                        tap4
[17:16] tee:/boot# gnt-os diagnose
OS: debootstrap [global status: valid]
  Node: tee.schottelius.org, status: VALID (path: /usr/share/ganeti/os/debootstrap)
[17:17] tee:/boot# gnt-cluster verify
Thu Dec 31 17:17:24 2009 * Verifying global settings
Thu Dec 31 17:17:24 2009 * Gathering data (1 nodes)
Thu Dec 31 17:17:24 2009 * Verifying node tee.schottelius.org (master)
Thu Dec 31 17:17:24 2009 * Verifying instance jr.nachtbrand.ch
Thu Dec 31 17:17:24 2009 * Verifying orphan volumes
Thu Dec 31 17:17:24 2009 * Verifying remaining instances
Thu Dec 31 17:17:24 2009 * Verifying N+1 Memory redundancy
Thu Dec 31 17:17:24 2009 * Other Notes
Thu Dec 31 17:17:24 2009   - NOTICE: 1 non-redundant instance(s) found.
Thu Dec 31 17:17:24 2009 * Hooks Results

As specified in the documentation, I tried to connect to the console:

[17:28] tee:/boot# gnt-instance console jr.nachtbrand.ch
[17:30] tee:/boot# gnt-instance console --show-cmd jr.nachtbrand.ch
ssh -q -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oHostKeyAlias=ganeti.schottelius.org -oBatchMode=yes -oStrictHostKeyChecking=yes -t root@tee.schottelius.org '/usr/bin/socat STDIO,echo=0,icanon=0 UNIX-CONNECT:/var/run/ganeti/kvm-hypervisor/ctrl/jr.nachtbrand.ch.serial'

The problem is that the newly debootstrapped system does not have a serial console set up.

As you can see, by the evening of this day I had a lot of new experiences, but no reliably running virtualisation framework. That brings me to the end of this report:

  • User mode Linux does not work reliably under some I/O load.
  • Virt-manager is absolutely not able to change the simplest parameters.
  • Virsh is unusable if you don't want to edit XML files.
  • Ganeti has a lot of unhandled problems and still relies very much on Xen + LVM.

As my vacation ends next Monday, I will have a look at the commercial virtualisation frameworks. For the folks behind the FOSS projects named above: Guys, you have to improve a lot before one can call your software "good and clean software".

Posted Fri 01 Jan 2010 20:25:40 CET Tags:

After I updated one server today from Debian Lenny to Squeeze, puppetd stopped working and printed the following error:

sgssr240003:~# puppetd --server puppet.inf.ethz.ch  --test --ca_port 8141
warning: peer certificate won't be verified in this SSL session
err: Could not request certificate: Error 405 on SERVER: Method Not Allowed
Exiting; failed to retrieve certificate and watiforcert is disabled

I was a bit confused and did not find useful hints regarding that error message. In the IRC channel #puppet I was told that this can happen if the puppet client (puppetd) is newer than the puppetmaster.

And indeed, when I compared the versions, puppetmasterd was running version 0.24.8, whereas puppetd was 0.25.1.

After I upgraded puppetmasterd to 0.25.1, it runs fine again.
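
Such a mismatch can be detected before upgrading a client. A minimal sketch: the function name version_newer is made up, and it relies on sort -V, which is a GNU coreutils extension and may be missing elsewhere:

```shell
# Hypothetical helper: succeed if version $1 is strictly newer than version $2.
# Relies on "sort -V" (version sort), a GNU coreutils extension.
version_newer() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}
```

With the versions from above, version_newer 0.25.1 0.24.8 succeeds, which is exactly the situation that broke the certificate request. The two version strings would have to be collected by hand, for example with puppetd --version on the client and puppetmasterd --version on the server.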

If you have run into this problem as well, this article is for you!

Posted Tue 01 Dec 2009 23:07:48 CET Tags:

Version 0.3pre19 of cinit-0.3 contains a lot of cleanups for the final 0.3 release.

Posted Mon 30 Nov 2009 07:27:06 CET Tags:

As the old key 9885188C expired some time ago, I replaced it. The new key has the following fingerprint

7ED9 F7D3 6B10 81D7 0EC5  5C09 D7DC C8E4 3187 7DF0

and is signed by the previous one. Please re-sign the new one, if you signed the previous one.

Posted Sat 07 Nov 2009 21:10:07 CET Tags:

This version of cinit, 0.3pre16, contains the new utility cinit-conf.config.shell, which creates a minimal configuration that spawns a shell.

Additionally the Ubuntu migration script (cinit-conf.migrate.upstart.ubuntu.jaunty) is almost finished!

Posted Thu 05 Nov 2009 23:03:19 CET Tags:

Some weeks ago I got a good hint from Axel Stefan Beckert to try conkeror as an alternative to the Firefox browser.

Although I am not used to the emacs shortcuts, it is very usable with the keyboard alone.

In the last few days I have been missing one important feature, one of the most important features of a browser:

to be able to edit a textbox with an external editor

I often edit large wiki pages and rearrange them, which is a pain without a real editor.

Conkeror supports external editing, but defaults to

  • $VISUAL
  • $EDITOR
  • or emacs

None of them is usable for me, because $VISUAL and $EDITOR are set to vim, and vim requires a terminal.

After I was told on #conkeror to modify ~/.conkerorrc/init.js to include

editor_shell_command = "urxvt -e vim";

it worked like a charm (apart from debugging it for some more days, until I found out that there was always one instance of conkeror running, so it never re-read the configuration file). I can now edit textboxes in conkeror with vim!

But then I noticed that conkeror creates a temporary file below /tmp, which I do not like, because all my data should be stored in my encrypted home directory, not on the unencrypted root partition.

So I started to search for a configuration variable in the configuration window, but did not find any hint.

As I am running conkeror from the git source, I began to dig through it and started in modules/external-editor.js, where I found the function open_with_external_editor():

function open_with_external_editor (lspec) {
    keywords(arguments);
    let [file, temp] = yield download_as_temporary(lspec);
    yield open_file_with_external_editor(file, $line = arguments.$line, $temporary = temp);
}

Ok, what is download_as_temporary() doing? The file modules/save.js helped me:

function download_as_temporary (lspec) {
    // [...]
    var file = get_temporary_file(suggest_file_name(lspec));
    // [...]

Well, well, so what's about the get_temporary_file() function? The file modules/utils.js contains it:

function get_temporary_file (name) {
    if (name == null)
        name = "temp.txt";
    var file = file_locator.get("TmpD", Ci.nsIFile);
    file.append(name);
    // Create the file now to ensure that no exploits are possible
    file.createUnique(Ci.nsIFile.NORMAL_FILE_TYPE, 0600);
    return file;
}

Searching for Ci.nsIFile in conkeror's source did not reveal much information, so I went back to my seoc (search engine of choice) and found some hints on the mozilla developer center about nsIFile and TmpD, plus a reference to the IRC channel #extdev.

After I described my problem in that IRC channel, Michael Kaply told me the answer to the question "What defines or where is the TmpD variable defined?":

The temporary directory is OS specific and in my case (unix) defined by the following environment variables, tried in this order:

  • $TMPDIR
  • $TMP
  • $TEMP

After I set

TMP=~/.tmp

restarted conkeror and pressed C-i in a textbox, the file was finally saved in the temporary directory .tmp in my home directory!
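
The lookup order can be sketched in a few lines of shell. This is not the Mozilla code, just an illustration of the order I was told about: the function name resolve_tmpdir is made up, and the fallback to /tmp when none of the variables is set is my assumption:

```shell
# Sketch of the lookup order described above: the first of TMPDIR, TMP and
# TEMP that is set and non-empty wins. The /tmp fallback is an assumption.
resolve_tmpdir() {
    for candidate in "${TMPDIR:-}" "${TMP:-}" "${TEMP:-}"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    printf '%s\n' /tmp
}
```

This also shows why setting TMP only helps as long as TMPDIR is unset or empty. Note that the variable has to be exported (export TMP=~/.tmp) in the shell that starts conkeror, otherwise the browser never sees it.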

Posted Thu 29 Oct 2009 14:29:17 CET Tags:

Sometimes, when I try to log in to a node as root or as an LDAP user, I get this error:

user@myhost.example.org: ssh_exchange_identification: Connection closed by remote host

When I dig into /var/log/syslog and /var/log/auth.log I see that the users are not known to the system anymore:

Oct 26 06:17:01 myhost CRON[24310]: pam_unix(cron:session): session opened for user root by (uid=0)
Oct 26 06:17:01 myhost CRON[24310]: pam_unix(cron:session): session closed for user root
Oct 26 06:25:01 myhost CRON[24349]: pam_unix(cron:account): could not identify user (from getpwnam(root))
Oct 26 06:25:01 myhost CRON[24350]: pam_unix(cron:account): could not identify user (from getpwnam(root))
[...]
Oct 26 08:55:41 myhost sshd[25062]: fatal: Privilege separation user sshd does not exist
Oct 26 09:24:30 myhost sshd[25196]: fatal: Privilege separation user sshd does not exist
Oct 26 09:25:01 myhost CRON[25203]: pam_unix(cron:account): could not identify user (from getpwnam(root))
Oct 26 09:27:45 myhost login[4935]: pam_unix(login:auth): check pass; user unknown

Now comes the interesting part: If I try to log in locally as root, it still fails. But if I try it as an LDAP user, it works, and after that I can also log in locally as root and remotely as any user again! These are the logs I see when logging in locally as user nicosc:

Oct 26 09:27:45 myhost login[4935]: pam_unix(login:auth): check pass; user unknown
Oct 26 09:27:45 myhost login[4935]: pam_unix(login:auth): authentication failure; logname=LOGIN uid=0 euid=0 tty=tty1 ruser= rhost=
Oct 26 09:27:48 myhost login[4935]: FAILED LOGIN (1) on 'tty1' FOR `UNKNOWN', Authentication failure
Oct 26 09:27:50 myhost login[4935]: nss_ldap: could not connect to any LDAP server as cn=inf_proxy,ou=admins,ou=inf,ou=auth,o=ethz,c=ch - Can't contact LDAP server
Oct 26 09:27:50 myhost login[4935]: nss_ldap: failed to bind to LDAP server ldaps://ldaps01.ethz.ch: Can't contact LDAP server
Oct 26 09:27:50 myhost login[4935]: nss_ldap: reconnected to LDAP server ldaps://ldaps02.ethz.ch
Oct 26 09:27:53 myhost login[4935]: pam_env(login:session): Unable to open env file: /etc/default/locale: No such file or directory
Oct 26 09:27:53 myhost login[4935]: pam_unix(login:session): session opened for user nicosc by LOGIN(uid=0)
Oct 26 09:27:53 myhost -bash: nss_ldap: could not connect to any LDAP server as cn=inf_proxy,ou=admins,ou=inf,ou=auth,o=ethz,c=ch - Can't contact LDAP server
Oct 26 09:27:53 myhost -bash: nss_ldap: failed to bind to LDAP server ldaps://ldaps01.ethz.ch: Can't contact LDAP server
Oct 26 09:27:53 myhost -bash: nss_ldap: reconnected to LDAP server ldaps://ldaps02.ethz.ch
Oct 26 09:27:55 myhost login[4935]: pam_unix(login:session): session closed for user nicosc
Oct 26 09:28:02 myhost postfix/pickup[25235]: nss_ldap: could not connect to any LDAP server as cn=inf_proxy,ou=admins,ou=inf,ou=auth,o=ethz,c=ch - Can't contact LDAP server
Oct 26 09:28:02 myhost postfix/pickup[25235]: nss_ldap: failed to bind to LDAP server ldaps://ldaps01.ethz.ch: Can't contact LDAP server
Oct 26 09:28:03 myhost postfix/pickup[25235]: nss_ldap: reconnected to LDAP server ldaps://ldaps02.ethz.ch
Oct 26 09:28:03 myhost sshd[25236]: nss_ldap: could not connect to any LDAP server as cn=inf_proxy,ou=admins,ou=inf,ou=auth,o=ethz,c=ch - Can't contact LDAP server
Oct 26 09:28:03 myhost sshd[25236]: nss_ldap: failed to bind to LDAP server ldaps://ldaps01.ethz.ch: Can't contact LDAP server
Oct 26 09:28:03 myhost sshd[25236]: nss_ldap: reconnected to LDAP server ldaps://ldaps02.ethz.ch
Oct 26 09:28:03 myhost sshd[25236]: Accepted publickey for root from 129.132.130.3 port 52738 ssh2
Oct 26 09:28:03 myhost sshd[25236]: pam_env(sshd:setcred): Unable to open env file: /etc/default/locale: No such file or directory
Oct 26 09:28:03 myhost sshd[25236]: pam_unix(sshd:session): session opened for user root by (uid=0)

I see this happening on Debian Lenny with

  • libnss-ldap-261-2.1
  • libnss3-1d-3.12.3.1-0lenny1
  • libpam0g-1.0.1-5+lenny1
  • openssh-server-1:5.1p1-5

I posted this problem with details, but so far without getting any helpful hint.

If you have any hint on what could be wrong (e.g. configuration, libs etc.) or, even better, know the reason for this behaviour, please let me know.

Posted Mon 26 Oct 2009 19:27:34 CET Tags: