Introduction

For a few weeks I have been working on my pet project: creating a production-ready kubernetes cluster that runs in an IPv6 only environment.

As the complexity and challenges for this project are rather interesting, I decided to start documenting them in this blog post.

The ungleich-k8s repository contains all snippets and the latest code.

Objective

The kubernetes cluster should support the following workloads:

  • Matrix Chat instances (Synapse+postgres+nginx+element)
  • Virtual Machines (via kubevirt)
  • Provide storage to internal and external consumers using Ceph

Components

The following is a list of components that I am using so far. This might change along the way, but I want to document what I have selected so far and why.

OS: Alpine Linux

The operating system of choice to run the k8s cluster is Alpine Linux as it is small, stable and supports both docker and cri-o.

Container management: docker

Originally I started with cri-o. However using cri-o together with kubevirt and calico results in an overlayfs placed on / of the host, which breaks the full host functionality (see below for details).

Docker, although deprecated, at least gets kubevirt running for the most part.

Networking: IPv6 only, calico

I wanted to go with cilium first, because it goes down the eBPF route from the get go. However cilium does not yet contain native and automated BGP peering with the upstream infrastructure, so managing nodes / ip network peering becomes a tedious, manual and error prone task. Cilium is on the way to improve this, but is not there yet.

Calico on the other hand still relies on ip(6)tables and kube-proxy for forwarding traffic, but has had proper BGP support for a long time. Calico also aims to add eBPF support, however at the moment its eBPF mode does not support IPv6 (bummer!).

Storage: rook

Rook seems to be the first choice if you look at which storage providers are used in the k8s world. It looks rather proper, even though some knobs are not yet clear to me.

Rook, in my opinion, is a direct alternative to running cephadm, which requires systemd on your hosts. Given Alpine Linux, that will never be the case.

Virtualisation

Kubevirt seems to provide a good interface. Mid term, kubevirt is projected to replace OpenNebula at ungleich.

Challenges

cri-o + calico + kubevirt = broken host

So this is a rather funky one. If you deploy cri-o and calico, everything works. If you then deploy kubevirt, the virt-handler pod fails to come up with the error message

 Error: path "/var/run/kubevirt" is mounted on "/" but it is not a shared mount.

On the Internet there are two recommendations to fix this:

  • Fix the systemd unit for docker: as I am using neither systemd nor docker in this setup, this is not applicable.
  • Issue mount --make-shared /

The second command has a very strange side effect: after issuing it, the contents of a calico pod are mounted as an overlayfs on / of the host. This covers /proc, so tools like ps and mount fail and basically the whole system becomes unusable until reboot.

This is fully reproducible. I first suspected the tmpfs on / to be the issue, so I used local disks instead of booting over the network to check it, but even a regular ext4 on / causes the exact same problem.

docker + calico + kubevirt = other shared mounts

Now, given that cri-o + calico + kubevirt does not lead to the expected result, what does the same setup with docker look like? With docker, the calico node pods fail to come up if /sys is not mounted as shared, and the virt-handler pods fail if /run is not mounted as shared.

Two funky findings:

Issuing the following commands makes both work:

mount --make-shared /sys
mount --make-shared /run

The paths are totally different between docker and cri-o, even though the mapped hostpaths in the pod description are the same. And why is a non-shared /sys not a problem for calico under cri-o?
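To see which propagation mode a mount point currently has before reaching for mount --make-shared, one can read /proc/self/mountinfo. The following is a minimal Python sketch and not part of the ungleich-k8s code: the function names and the sample data are made up for illustration. It relies only on the documented mountinfo layout, where optional tags such as shared:N or master:N sit between the mount options and the "-" separator.

```python
# Hypothetical sample in /proc/self/mountinfo format (not taken from a real host).
SAMPLE = """\
21 1 0:19 / / rw,relatime - tmpfs tmpfs rw
22 21 0:20 / /sys rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw
23 21 0:21 / /run rw,nosuid,nodev master:5 - tmpfs tmpfs rw,mode=755
"""


def propagation(mountinfo_line):
    """Return the propagation mode encoded in one mountinfo line."""
    fields = mountinfo_line.split()
    # Optional fields start at index 6 and end at the "-" separator.
    separator = fields.index("-", 6)
    optional = fields[6:separator]
    # A mount can be shared and slave at once; it is reported as shared here.
    if any(tag.startswith("shared:") for tag in optional):
        return "shared"
    if any(tag.startswith("master:") for tag in optional):
        return "slave"
    if "unbindable" in optional:
        return "unbindable"
    return "private"


def mount_propagation(path, mountinfo_text):
    """Return the propagation mode of the mount at path, or None if not mounted."""
    result = None
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) > 6 and fields[4] == path:
            # Later entries are stacked on top of earlier ones, so the last match wins.
            result = propagation(line)
    return result


for path in ("/", "/sys", "/run"):
    print(path, mount_propagation(path, SAMPLE))
```

On a real host, the text of /proc/self/mountinfo would be passed in instead of SAMPLE.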

Log

Status 2021-06-07

Today I have updated the ceph cluster definition in rook to

  • check hosts every 10 minutes instead of 60m for new disks
  • use IPv6 instead of IPv4

The successful ceph -s output:

[20:42] server47.place7:~/ungleich-k8s/rook# kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
  cluster:
    id:     049110d9-9368-4750-b3d3-6ca9a80553d7
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim

  services:
    mon: 3 daemons, quorum a,b,d (age 75m)
    mgr: a(active, since 74m), standbys: b
    osd: 6 osds: 6 up (since 43m), 6 in (since 44m)

  data:
    pools:   2 pools, 33 pgs
    objects: 6 objects, 34 B
    usage:   37 MiB used, 45 GiB / 45 GiB avail
    pgs:     33 active+clean

The result is a working ceph cluster with RBD support. I also applied the cephfs manifest, however RWX volumes (ReadWriteMany) are not yet spinning up. It seems that test helm charts often require RWX instead of RWO (ReadWriteOnce) access.

Also the ceph dashboard does not come up, even though it is configured:

[20:44] server47.place7:~# kubectl -n rook-ceph get svc
NAME                       TYPE        CLUSTER-IP              EXTERNAL-IP   PORT(S)             AGE
csi-cephfsplugin-metrics   ClusterIP   2a0a:e5c0:13:e2::760b   <none>        8080/TCP,8081/TCP   82m
csi-rbdplugin-metrics      ClusterIP   2a0a:e5c0:13:e2::482d   <none>        8080/TCP,8081/TCP   82m
rook-ceph-mgr              ClusterIP   2a0a:e5c0:13:e2::6ab9   <none>        9283/TCP            77m
rook-ceph-mgr-dashboard    ClusterIP   2a0a:e5c0:13:e2::5a14   <none>        7000/TCP            77m
rook-ceph-mon-a            ClusterIP   2a0a:e5c0:13:e2::c39e   <none>        6789/TCP,3300/TCP   83m
rook-ceph-mon-b            ClusterIP   2a0a:e5c0:13:e2::732a   <none>        6789/TCP,3300/TCP   81m
rook-ceph-mon-d            ClusterIP   2a0a:e5c0:13:e2::c658   <none>        6789/TCP,3300/TCP   76m
[20:44] server47.place7:~# curl http://[2a0a:e5c0:13:e2::5a14]:7000
curl: (7) Failed to connect to 2a0a:e5c0:13:e2::5a14 port 7000: Connection refused
[20:45] server47.place7:~#

The ceph mgr is perfectly reachable though:

[20:45] server47.place7:~# curl -s http://[2a0a:e5c0:13:e2::6ab9]:9283/metrics | head

# HELP ceph_health_status Cluster health status
# TYPE ceph_health_status untyped
ceph_health_status 1.0
# HELP ceph_mon_quorum_status Monitors in quorum
# TYPE ceph_mon_quorum_status gauge
ceph_mon_quorum_status{ceph_daemon="mon.a"} 1.0
ceph_mon_quorum_status{ceph_daemon="mon.b"} 1.0
ceph_mon_quorum_status{ceph_daemon="mon.d"} 1.0
# HELP ceph_fs_metadata FS Metadata

Status 2021-06-06

Today is the first day of publishing the findings, so this blog article still lacks quite a bit of information. If you are curious and want to know more than what is published so far, you can find me on Matrix in the #hacking:ungleich.ch room.

What works so far

  • Spawning IPv6 only pods works
  • Spawning IPv6 only services works
  • BGP peering and ECMP routes with the upstream infrastructure work

Here's an output of the upstream bird process for the routes from k8s:

bird> show route
Table master6:
2a0a:e5c0:13:e2::/108 unicast [place7-server1 23:45:21.589] * (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:3554 on eth0
                     unicast [place7-server3 2021-06-05] (100) [AS65534i]
        via 2a0a:e5c0:13:0:224:81ff:fee0:db7a on eth0
                     unicast [place7-server4 2021-06-05] (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:3564 on eth0
                     unicast [place7-server2 2021-06-05] (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:38cc on eth0
2a0a:e5c0:13:e1:176b:eaa6:6d47:1c40/122 unicast [place7-server1 23:45:21.589] * (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:3554 on eth0
                     unicast [place7-server4 23:45:21.591] (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:3564 on eth0
                     unicast [place7-server3 23:45:21.591] (100) [AS65534i]
        via 2a0a:e5c0:13:0:224:81ff:fee0:db7a on eth0
                     unicast [place7-server2 23:45:21.589] (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:38cc on eth0
2a0a:e5c0:13:e1:e0d1:d390:343e:8480/122 unicast [place7-server1 23:45:21.589] * (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:3554 on eth0
                     unicast [place7-server3 2021-06-05] (100) [AS65534i]
        via 2a0a:e5c0:13:0:224:81ff:fee0:db7a on eth0
                     unicast [place7-server4 2021-06-05] (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:3564 on eth0
                     unicast [place7-server2 2021-06-05] (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:38cc on eth0
2a0a:e5c0:13::/48    unreachable [v6 2021-05-16] * (200)
2a0a:e5c0:13:e1:9b19:7142:bebb:4d80/122 unicast [place7-server1 23:45:21.589] * (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:3554 on eth0
                     unicast [place7-server3 2021-06-05] (100) [AS65534i]
        via 2a0a:e5c0:13:0:224:81ff:fee0:db7a on eth0
                     unicast [place7-server4 2021-06-05] (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:3564 on eth0
                     unicast [place7-server2 2021-06-05] (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:38cc on eth0
bird>
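The prefix lengths in this output can be checked with a few lines of Python. This is only a sketch using the standard ipaddress module. The labels are assumptions: the /108 is presumably the service range, since the rook ClusterIPs shown in the 2021-06-07 entry fall inside it, and the /122 prefixes look like per-node pod blocks.

```python
import ipaddress

# Prefixes copied from the bird output above; the variable names are illustrative.
SITE_NET = ipaddress.ip_network("2a0a:e5c0:13::/48")
SERVICE_NET = ipaddress.ip_network("2a0a:e5c0:13:e2::/108")
POD_BLOCKS = [
    ipaddress.ip_network(prefix)
    for prefix in (
        "2a0a:e5c0:13:e1:176b:eaa6:6d47:1c40/122",
        "2a0a:e5c0:13:e1:e0d1:d390:343e:8480/122",
        "2a0a:e5c0:13:e1:9b19:7142:bebb:4d80/122",
    )
]

# A /108 leaves 20 host bits, a /122 leaves 6 host bits.
print(SERVICE_NET, "holds", SERVICE_NET.num_addresses, "addresses")
for block in POD_BLOCKS:
    print(block, "holds", block.num_addresses, "addresses")

# The dashboard ClusterIP from the 2021-06-07 entry, checked against the /108.
dashboard = ipaddress.ip_address("2a0a:e5c0:13:e2::5a14")
print(dashboard, "in", SERVICE_NET, "->", dashboard in SERVICE_NET)

# All announced prefixes are more specific than the unreachable /48 route.
print(all(net.subnet_of(SITE_NET) for net in [SERVICE_NET, *POD_BLOCKS]))
```

Because the announced prefixes are more specific than the unreachable /48, longest-prefix matching sends traffic for them to the k8s nodes, while the rest of the /48 stays unreachable unless another more specific route exists.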

What doesn't work

  • Rook does not format/spin up all disks
  • Deleting all rook components fails (kubectl delete -f cluster.yaml hangs forever)
  • Spawning VMs fails with error: unable to recognize "vmi.yaml": no matches for kind "VirtualMachineInstance" in version "kubevirt.io/v1"
Posted Sun Jun 6 18:11:50 2021 Tags:

TL;DR

Do not make your software rely on systemd.

Introduction

There is some software out there that is leaning towards requiring systemd. This will render that software unusable on non-systemd Linux distributions. If you develop software, I urge you to not rely on systemd features, because there are many situations in which you cannot use systemd.

The Open Source community

While for many of you systemd might be something you use on a daily basis, there is a big part of the Open Source community that does not use systemd, for a variety of reasons. Without going into detail, systemd does not exist in a variety of Linux distributions like Alpine Linux, Devuan or OpenWrt nor on the BSDs.

However, even if it existed there, people might choose to opt out of the systemd ecosystem for compatibility, security, stability or any other reason.

Why are we building Open Source Software?

The Open Source / FOSS movement originated many years (decades!) ago with the goal of creating usable systems. Systems that are not locked in, systems that allow you to freely modify software and eventually: support a wider audience, be more inclusive.

Majority is not the right argument

If you assume that everyone has a systemd environment, I need to raise a flag here: you are not the majority. If you are using that line of argument, I will answer with: the majority of systems is running Microsoft Windows, so all software should be written only with Windows in mind. And that is problematic, because you are fully dependent on a single vendor with an ecosystem that one cannot change.

Now, you can argue that systemd is Open Source and it could be modified. While in theory this is true, the systemd authors do have strong opinions that conflict (details omitted here intentionally) with others. In this regard, systemd is similar to a closed ecosystem, because it does not make everyone benefit from it.

Problematic direction

Recently I have been seeing software that assumes the existence of systemd by default, either by using it as a cgroupdriver or by relying on systemctl. While some software can be patched, the wording in the documentation leans towards "systemd only support". And that is the reason why I am writing this blog post.
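As an illustration of what "not relying on systemctl" can look like in practice, here is a minimal Python sketch. It is an assumption about one possible approach, not code from any particular project: the function names are hypothetical, the check for /run/systemd/system mirrors the test that sd_booted() documents, and rc-service is the OpenRC tool found on Alpine Linux.

```python
import os
import shutil

# Directory that exists only when systemd is the running init system.
SYSTEMD_RUN_DIR = "/run/systemd/system"


def systemd_is_running(run_dir=SYSTEMD_RUN_DIR):
    """Return True if the system was booted with systemd."""
    return os.path.isdir(run_dir)


def restart_command(service, run_dir=SYSTEMD_RUN_DIR):
    """Pick a restart command for the init system that is actually present.

    Returns None when no known service manager is found, so the caller can
    fall back to documentation instead of failing with a systemctl error.
    """
    if systemd_is_running(run_dir) and shutil.which("systemctl"):
        return ["systemctl", "restart", service]
    if shutil.which("rc-service"):
        return ["rc-service", service, "restart"]
    return None


print(restart_command("sshd"))
```

The point of the sketch is the last branch: software that detects the environment can degrade gracefully instead of assuming one init system.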

systemd is not for everyone

You can argue for hours or days whether feature x of systemd is good or not. However it is a fact that systemd is not for everyone and it is not suitable for every situation that Open Source software usually operates in.

Forcing systemd on users does not work (and is not even realistic).

Even if you had the means to try forcing people into systemd, it simply does not work, because it is not suited for running on embedded systems for instance.

Call for action

I am aware that generations of hackers have changed, that Open Source has become much more accessible and that not everyone using Open Source is a hacker anymore. That is not a problem, but actually a significant achievement of the Open Source community. But it also means that we have more diversity and a broader audience.

However we shall not forget our roots and why Open Source Software actually works: it is because we work together and respect different approaches and we try to be inclusive. In terms of systems, as well as humans. That said, I really urge you:

Respect diversity, do not rely on systemd in your software.
Posted Sun May 23 11:36:38 2021 Tags:

For some years I have been seeing problems with nodejs based applications that do not work in IPv6 only networks. More recently, I again found a situation in which a nodejs based application does not even install if you try to install it in an IPv6 only network.

As the situation is not straightforward, I started to collect information about it on this website.

The starting point

I wanted to install etherpad-lite and it failed with the following error:

174 error request to https://registry.npmjs.org/express-session/-/express-session-1.17.1.tgz failed, reason: connect EHOSTUNREACH 104.16.25.35:443

The message connect EHOSTUNREACH 104.16.25.35:443 already clearly points to the problem: npm is trying to connect via IPv4 on an IPv6 only VM. This clearly cannot work.

A bug in NPM?

My first suspicion was that it must be a bug in npm. But on Twitter I was told that npm should work in IPv6 only networks. That's strange. However it turns out that somebody else had this problem before and it seems to be specific to using npm on Alpine Linux.

A bug in Alpine Linux?

Alpine Linux is currently the main distribution that I use. Not because of the small libc called musl, but because the whole system is straightforward. Correct. And easy to use. But what does that have to do with etherpad-lite failing to install in an IPv6 only network?

It turns out that there is a difference between musl and glibc in the default behaviour of getaddrinfo(), which is used to retrieve DNS results from the operating system.

A bug in musl libc?

I got in touch with the developers of musl and their statement is rather simple: musl behaves according to the spec, and the caller, in this context nodejs, cannot just use the first result, but potentially has to try all results.

A DNS or a design bug?

And at this stage the problem gets tricky. Let's review what I wanted to do and why we are so deep in the rabbit hole.

I wanted to install etherpad-lite, which uses resources from registry.npmjs.org. So npm wants to connect via HTTPS to registry.npmjs.org and download a file. To achieve this, npm has to find out which IP address registry.npmjs.org has. And for this it is doing a DNS lookup.

So far, so good. Now the trouble begins:

A DNS lookup can contain 0, 1 or many answers.

And in case of the libc call getaddrinfo, the result is a list of IPv6 and IPv4 addresses, potentially 0 to many of each.

So an application that "just wants to connect somewhere", cannot just take the first result.
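What "trying all results" means can be sketched in a few lines of Python. This is not how nodejs or npm do it; it only illustrates the pattern, and the function name is hypothetical. Python's own socket.create_connection follows the same idea.

```python
import socket


def connect_any(host, port, timeout=5.0):
    """Try every address returned by getaddrinfo() until one connects.

    Returns a connected socket, or raises the last connection error if
    none of the addresses could be reached.
    """
    last_error = None
    results = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in results:
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # For example EHOSTUNREACH for an IPv4 address on an IPv6 only
            # host: remember the error and move on to the next result.
            last_error = exc
            if sock is not None:
                sock.close()
    raise last_error or OSError(f"no addresses found for {host}")
```

With this pattern, a call like connect_any("registry.npmjs.org", 443) on an IPv6 only host would skip unreachable IPv4 results and succeed as long as at least one returned address is reachable, regardless of the order in which the libc returns them.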

A bug in nodejs?

The assumption at this point is that nodejs only takes the first result from DNS and tries to connect to it. However so far I have not been able to spot the exact source code location to support that claim.

Stay tuned...

Posted Sat Jan 23 09:48:31 2021 Tags:

It's time for the 36c3 and to verify that some things are in place where they should be.

As some of you might know, I am using IPv6 extensively to provide services anywhere on anything, so you will see quite some IPv6 related rules in my configuration.

This post should serve two purposes:

  • Inspire others to verify their network settings prior to the congress
  • Get feedback from anyone spotting a huge mistake in my config :-)

The firewall rules

I am using nftables on my notebook and the full ruleset is shown below.

table ip6 filter {
        chain input {
                type filter hook input priority 0;
                policy drop;

                iif lo accept
                ct state established,related accept

                icmpv6 type { destination-unreachable, packet-too-big, time-exceeded, parameter-problem, echo-request, nd-router-advert, nd-neighbor-solicit, nd-neighbor-advert } accept

                tcp dport { 22, 80, 443 } accept

        }

        chain forward {
                type filter hook forward priority 0;
                policy drop;

                ct state established,related accept

                ip6 daddr 2a0a:e5c1:137:b00::/64  jump container
                ip6 daddr 2a0a:e5c1:137:cafe::/64 jump container
        }

        chain container {
                icmpv6 type { destination-unreachable, packet-too-big, time-exceeded, parameter-problem, echo-request, nd-router-advert, nd-neighbor-solicit, nd-neighbor-advert } accept

                tcp dport { 22, 80, 443 } accept
                drop

        }
        chain output {
                type filter hook output priority 0;
                policy accept;
        }
}

table ip filter {
        chain input {
                type filter hook input priority 0;
                policy drop;

                iif lo accept
                ct state established,related accept
                tcp dport { 22 } accept
                tcp dport { 51820 } accept
        }
        chain forward {
                type filter hook forward priority 0;
                policy drop;
        }
        chain output {
                type filter hook output priority 0;
                policy accept;
        }
}

The firewall explained: IPv6

Let's have a look at the IPv6 part first. In nftables we can freely define chains; what is important is the hook that we use in them.

        chain input {
                type filter hook input priority 0;
                ...

The policy has the same meaning as in iptables and basically specifies what to do with unmatched packets.

IPv6 uses quite a few ICMPv6 messages to control and also to establish communication in the first place, so the list for accepting is quite long.

                icmpv6 type { destination-unreachable, packet-too-big, time-exceeded, parameter-problem, echo-request, nd-router-advert, nd-neighbor-solicit, nd-neighbor-advert } accept

As we are dealing with traffic that comes to my notebook ("hook input"), I want to allow any incoming packets that belong to one of the connections that I initiated:

                ct state established,related accept

And finally, I allow port 22, to be able to ssh into my notebook, port 80 to get letsencrypt certificates and port 443 for serving https. When I am online, my notebook is reachable at nico.plays.ipv6.games, so I need the web ports to be open.

As I run quite a few tests on my notebook with docker and lxc, I created a /64 IPv6 network for each of them. When matching on those specific networks, I jump into a chain that allows specific configurations for containers:

                ip6 daddr 2a0a:e5c1:137:b00::/64  jump container
                ip6 daddr 2a0a:e5c1:137:cafe::/64 jump container

The chain container consists at the moment of the same rule set as the input chain, however this changes occasionally when testing applications in containers.

And for the output chain, I trust that the traffic my notebook emits is what I wanted it to emit (but also allows malware to send out data, if I had some installed).
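To verify from another machine that the input chain behaves as intended, a small probe can help. The following Python sketch is an assumption about how one might do that and not part of my setup; the function names are made up. With policy drop, a port that is not allowed should time out ("filtered"), while an allowed port without a listening service should be refused ("closed").

```python
import socket


def port_state(host, port, timeout=2.0):
    """Classify a TCP port as open, closed, filtered or unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The packet passed the firewall, but nothing is listening.
        return "closed"
    except (socket.timeout, TimeoutError):
        # No answer at all: typical for a rule or policy that drops packets.
        return "filtered"
    except OSError:
        # Name resolution failure, no route to host and similar errors.
        return "unreachable"


def scan(host, ports, timeout=2.0):
    """Return a dict mapping each port to its state."""
    return {port: port_state(host, port, timeout) for port in ports}
```

Calling scan("nico.plays.ipv6.games", [22, 80, 443, 8080]) from a different host would then be expected to show 8080 as filtered, as it is not in the accept list above.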

The firewall explained: IPv4

In the IPv4 area ("table ip filter") things are quite similar with some small differences:

  • I don't provide services on IPv4 besides ssh and wireguard (port 22 and 51820)
  • There is nothing to be forwarded for IPv4, all containers use IPv6
  • Same logic for the output as in IPv6

Safe or not safe?

Whether this ruleset is safe or not depends a bit on your degree of paranoia. I allow access to port 443, on which an nginx runs that in turn proxies to a self-written flask application, which might or might not be safe.

Some people argue for limiting outgoing traffic. While this is certainly possible (whitelist ports?), it is often rendered useless, as any command and control server can be reached on port 80 and you probably don't want to block outgoing port 80 traffic.

If you have any comments about it, I'm interested in hearing your feedback on the ungleich chat, twitter or IRC (telmich).

Update 2019-12-24

I forgot to allow loopback traffic in the original version, which breaks some local networking.

Posted Mon Dec 23 18:23:43 2019 Tags:

Introduction

Dealing with a lot of hardware (in the sense of moving/maintaining) involves some support from vendors. Sometimes vendors do a particularly bad job. This blog page is dedicated to vendor screwups and documents real stories.

The support for a Dell XPS 13 2-in-1 (2019-12-19 - 2020-04-06)

10 days after the repair the space bar exhibits the same behaviour: it hangs / does not produce a character. I cannot open a service request on the website, as the previous request is still open. Additionally, the rubber from the screen is now falling off. The overall impression of the device is that of a cheap $50 notebook that you buy on a shady electronics market, significantly below the standard of regular notebooks.

Again a list of "mysterious" calls can be seen on the Dell website, but nobody ever responds on Dell's own website. Meanwhile the support on Twitter tells me to take pictures and a video of the space bar. After I sent all of it, I am asked to reboot the system and check whether the space bar still fails in the bootloader.

Updates from this particular Dell fiasco:

  • 2019-12-20: The exchange is confirmed, it should be done in 2-3 work days.
  • 2019-12-28: The necessary part (unclear which) is not available, the exchange is postponed to 2020-01-12.
  • 2019-12-31: I get an offer to get the system replaced by a refurbished system. That does not make any sense as the device is almost brand new and a refurbished one has been returned (probably for a good reason).
  • 2020-01-03: I request a statement on how Dell understands "Next business day" support, as the problem has been open for weeks. Again.
  • 2020-01-04: another key ("d") now also gets stuck. It seems to me that the keyboard was never tested under real-world use.
  • 2020-01-06: Dell informs me that the repair is delayed until 2020-01-28.
  • 2020-01-07: I inform Dell that their behaviour is breaking the contract and that I want to have a refund, a replacement or a repair by end of week. This is still significantly longer than "next business day".
  • 2020-01-08: Dell re-informs me that they can exchange it with a refurbished device, which I had declined before.
  • 2020-01-10: Dell re-informs me that the repair is delayed until 2020-01-28
  • 2020-01-16: Dell informs me that the repair is delayed until 2020-02-24. This makes it an at least 2 months repair time.
  • 2020-01-16: I re-inform Dell that they can refund and pickup the device and that I expect it to be done by 2020-01-24
  • 2020-01-21: Instead of replacing the notebook with a used notebook, Dell today suggests replacing it with a new one. I accept the proposal and now wait for a replacement device.
  • 2020-01-22: I am asked to take a picture of the notebook with the serial tag and to provide the following information: 1. Service request number, 2. Registered Owner's name, 3. Current date and time and 4. Current Location. However the device does not have a sticker and all information is already present at Dell.
  • 2020-01-28: While the replacement notebook should be on the way (according to the tracking it isn't, but then again nothing in the support system of Dell is up-to-date), the current notebook is slowly dying: the screen has become wobbly and makes funny noises when opening, closing or even moving the notebook while the display is open. If this notebook was about 7 years old, I'd say it's a typical wear-and-tear problem. However it is about 3 months old now. My hope: it's only this particular model, not an issue of the whole XPS 13 series. My fear: it actually looks to be designed rather fragile, compared to a thinkpad. Also note: more random keys get stuck half way and make it impossible to type text correctly, because a key may or may not function on the first hit.
  • 2020-01-31: The date for the replacement is set to be 2020-02-14.
  • 2020-01-31: The notebook begins to fall apart further: the keyboard/lower part slowly disconnects from the screen on the right side. This might also explain the wobbly behaviour. Furthermore the notebook now freezes with some disk I/O. The latter could be a software bug, the former could be a mis-repair (screw loose?). So clearly, if you need to rely on a computer, neither the XPS nor Dell is something to choose.
  • 2020-02-02: The audio jack is now loose and headphones only get partial connectivity. This is probably related to the right part of the screen falling off.
  • 2020-02-03: Can't believe it, but now the touchpad also gets stuck. It is about 0.3mm down on the left side, making it impossible to issue a left click.
  • 2020-02-11: The replacement notebook arrived.
  • 2020-02-16: The replacement notebook gets hand-burning hot at the bottom. Problem reported via Twitter.
  • 2020-02-17: Dell says the hot temperatures are normal, even though I advise Dell that it has potential to burn my skin.
  • 2020-02-19: On the replacement notebook the "c" key gets stuck from time to time.
  • 2020-02-25: The power supply of the replacement notebook is broken. It stops charging after some time, the led on the charger turns off. It works again, if it is disconnected from the power outlet for some hours.
  • 2020-02-26: Running diagnostics confirms that the charger is broken. The amount of time spent on debugging this notebook series is beyond ridiculous. Dell so far refuses a full refund even though they clearly ship unusable hardware.
  • 2020-02-26: The "h" and "u" keys are now also getting partially stuck.
  • 2020-02-26: The system gets very slow (mouse pointer lagging slow). I reboot. The system gets stuck in the Dell logo state. Turning it off hard. Turning it on again. It stays stuck with the Dell logo.
  • 2020-04-20: The system has been sent back and refunded.

Summary: Dell is fully incapable of repairing a device and upholding a contract. I assumed I bought a notebook with next business day service. What I got is a computer which has frequent hardware failures and no support within any sensible amount of time.

The support for a Dell XPS 13 2-in-1 (2019-11-16 - 2019-12-09)

I ordered this particular notebook on 2019-09-19 and it arrived around 2019-09-27. So far so good. However shortly after starting to use it, I managed to get a somewhat-stuck key (the "p key"), which more-or-less randomly hangs/does not produce a character. As some of my passwords contain a p, this led to very frustrating login failures.

Having a stuck key like this after less than 2 months of use does not speak for good quality, so I reported this issue to Dell on 2019-11-16. With the device I bought the so-called "Complete Care Service" and "Premium Support". In theory reachable 24x7.

In practice, after opening the support request on 2019-11-16, I did not receive a real reply on the following Monday. So I reached out again and got a reply on Tuesday, already being late if it was only next business day (NBD) support.

After reporting that issue, the rubber strip on the underside that keeps the notebook stable on the table additionally began to detach itself from the notebook. Only another minor problem, but clearly nothing to expect from a quality device.

After a long back-and-forth via Twitter DM about the device heat and whether the p key still occasionally is stuck (yes!) there was eventually a replacement scheduled for the 26th of November.

However - you can guess it - nobody showed up. The log at Dell says that somebody tried to reach me, however there were no missed calls on any of my numbers. And no email and no direct message. So even if somebody tried to call, they did not bother sending an email.

Until I reached out again, after I got a message that the phone number is forwarded. It continues "funny" like that: on 26th there was no further communication from Dell. No message, no call, no email.

However when logging in to the Dell portal, Dell rescheduled the appointment for Thursday, 28th, 0800.

Independently of how the story evolves from here, the amount of time spent on support, waiting, replanning locations, etc. already exceeds the worth of the product. So I clearly advise against buying this device/support combination if you want to work with it professionally.

And it continued on 2019-11-27 at around 2230 in the evening when the Dell technician called me by accident. "I just wanted to save your number". Then asking me on the phone where Glarus is in detail. I guess Dell doesn't have a navigation software... Then eventually telling me that he might or might not come tomorrow (the 28th), but he will certainly contact me in the morning.

2019-11-28, around 1400. No call, no message, no nothing. Reaching out via Twitter DM. Again. My phone number is confirmed, I get as an answer. So yet another day where Dell scheduled the support (not me), does not appear, does not reach out nor gives any suitable answer.

2019-11-29. The technician just wrote an email that he comes Monday 1200. That is yet another week after Dell originally announced the repair and yet another time that Dell unilaterally decides on a new repair date without even trying to confirm the date.

But it gets worse: later in the evening I received a twitter message that the case is closed. Without ever having seen a technician, without having gotten it repaired. And a bit later it gets confirmed on the "service request" page of Dell.

So in a summary:

  • Waited for nothing to happen for two weeks
  • Multiple support appointments scheduled without ever showing up
  • Claims of trying to reach me by phone without any missed call (and other calls that I received the same day)
  • Wasted many hours in communication
  • No support executed at all
  • Support case closed without doing anything

The story continues: 2019-12-04. I got a message saying that the technician is coming tomorrow. Again without confirmation from my side and with less than 24h to react.

2019-12-05: because so far nobody ever showed up, I send a message via Twitter to @DellHilft, asking about the technician. Answer is that I should wait. The third day. I also check the support center, which claims to have called me at 3 am GMT. 3 am. Seriously, which company does that?

GMT is actually one hour behind Swiss time in winter, so the actual call would have happened around 4 am local time. Besides all of that, I obviously did not receive a call.

BUT things can get worse with Dell. Since the 5th, my messages on the Dell support website don't show up anymore. It basically looks as follows:

Dell: we are calling you
Me: I don't see a call, this is my number:
Dell: we are calling you
Me: Hello? Did you see my message?
Dell: We will just silently drop your messages now

Since 2019-12-05 the "." key is also stuck from time to time. Basically the notebook is falling apart within 2 months of use and the only thing you get is false claims of a technician showing up.

2019-12-06: the technician is calling around 0830. He starts by asking where I live and then tells me it is far away and he doesn't have time for me. He has many other customers. He also sounds very drunk. He tells me he might come on Monday, but cannot tell a time yet.

Also on the same day: I get a note from Dell telling me the technician could not reach me. Not sure how many WTFs can be produced within one day, but Dell is really pushing it to the limits.

2019-12-09: the technician called at 0900, arrived by 1230 and fixed the notebook around 1500.

  • Roughly 4 weeks waiting time
  • Roughly 80+ messages exchanged with Dell
  • 4 working days invested to get it fixed
Posted Tue Nov 26 17:41:26 2019 Tags:

Overview

Due to RAM limitations in most notebooks (16G maximum) I have recently switched to the HP X360 1040 G5, more or less the 14" HP equivalent of the Lenovo X1 Carbon. Some tech specs for the geeks among us:

  • Resolution 3840x2160
  • 1 TB SSD / NVMe
  • 32GB RAM

This article is work in progress, currently more to be seen as a todo list for myself.

Alpine

My backup notebooks are currently running Arch Linux and Devuan. As I find Alpine an interesting project (it comes closest to how I think Linux should be), I thought about giving it a try.

Some things that are a bit special in Alpine Linux:

  • Does not come with shadow by default
  • Uses musl libc instead of glibc (yeah!)

Besides that, some things that are instant benefits of Alpine:

  • easy to use package manager
  • easy to write package format
  • VERY fast package installations (because they are fast)
  • The sound is GREAT (especially compared to the X1 Carbon that does not really have speakers)

What is working on alpine + X360 1040

Almost everything. C'mon, it's 2019 and as long as xorg + i3 is running, what more could you want? Some things worth emphasising about either:

  • The keyboard is quite nice (actually nicer than the Gen6 X1 Carbon)
  • You can run startx via ssh and there is no stupid config that stops you from it!
  • Suspend works even with playing sound, just using pm-suspend + acpid
  • beauty!

What is currently not working on alpine + X360 1040

There are a few minor hiccups that I still need to solve in the coming days:

  • create a package for mu4e 1.2 (currently installed in /usr/local)
    • needs fix for /usr/bin/sh reference
    • PR created by eu at https://github.com/alpinelinux/aports/pull/7881/files
    • local install: works!
  • create a package for magit (no longer needed)
    • installed from within Emacs instead: M-x package-install magit
  • create a package for vym
  • create a package for openconnect
  • create a package for kismet
  • check out why the shotwell package is broken
  • check out why the firefox package is broken
  • hotkeys don't send the right key events => might be a kernel issue
  • xrandr does not show screen connected via usb-c (have to test other outputs)
  • automate lid handling in cdist
    • Currently just created /etc/acpi/LID/0000080 with pm-suspend in it => works
  • The device emits a very high-frequency sound that varies over time
    • Seems to be unrelated to power plugged in or out
    • Seems to be related to the fan: fan on => no audible high frequency sound
    • The sound is louder than music played at "regular" volume
    • The sound is directly related to screen brightness: 100% => no sound
    • The lower the brightness, the stronger the sound

What has been fixed

  • xbacklight
    • Need to load / install the intel video driver (modesetting does not work atm)
Posted Tue May 14 19:10:49 2019 Tags:

Here's a short overview about the changes found in version 4.11.1:

* Core: Improve explorer error reporting (Darko Poljak)
* Type __directory: explorer stat: add support for Solaris (Ander Punnar)
* Type __file: explorer stat: add support for Solaris (Ander Punnar)
* Type __ssh_authorized_keys: Remove legacy code (Ander Punnar)
* Explorer disks: Bugfix: do not break config in case of unsupported OS
  which was introduced in 4.11.0, print message to stderr and empty disk list
  to stdout instead (Darko Poljak)

For more information visit the cdist homepage.

Posted Mon Apr 22 21:14:37 2019 Tags:

Here's a short overview about the changes found in version 4.11.0:

* Type __package: Add __package_apk support (Nico Schottelius)
* Type __directory: Add alpine support (Nico Schottelius)
* Type __file: Add alpine support (Nico Schottelius)
* Type __hostname: Add alpine support (Nico Schottelius)
* Type __locale: Add alpine support (Nico Schottelius)
* Type __start_on_boot: Add alpine support (Nico Schottelius)
* Type __timezone: Add alpine support (Nico Schottelius)
* Type __start_on_boot: gentoo: check all runlevels in explorer (Nico Schottelius)
* New type: __package_apk (Nico Schottelius)
* Type __acl: Add support for ACL mask (Dimitrios Apostolou)
* Core: Fix circular dependency for CDIST_ORDER_DEPENDENCY (Darko Poljak)
* Type __acl: Improve the type (Ander Punnar)
* Explorer interfaces: Simplify code, be more compatible (Ander Punnar)
* Explorer disks: Remove assumable default/fallback, for now explicitly support only Linux and BSDs (Ander Punnar, Darko Poljak)

For more information visit the cdist homepage.

Posted Sat Apr 20 17:16:55 2019 Tags:

Here's a short overview about the changes found in version 4.10.11:

* Core: Fix broken quiet mode (Darko Poljak)
* Build: Add version.py into generated raw source archive (Darko Poljak)
* Explorer disks: Fix detecting disks, fix/add support for BSDs (Ander Punnar)
* Type __file: Fix stat explorer for BSDs (Ander Punnar)
* Type __directory: Fix stat explorer for BSDs (Ander Punnar)

For more information visit the cdist homepage.

Posted Sat Apr 13 19:57:39 2019 Tags:

Here's a short overview about the changes found in version 4.10.10:

* New types: __ufw and __ufw_rule (Mark Polyakov)
* Type __link: Add messaging (Ander Punnar)
* Debugging: Rename debug-dump.sh to cdist-dump (Darko Poljak)
* Documentation: Add cdist-dump man page (Darko Poljak)

For more information visit the cdist homepage.

Posted Thu Apr 11 14:49:55 2019 Tags: