Creating a chroot for CentOS 7.3

Creating a chroot for CentOS 7.3

I have recently written some RPM spec files (and to be honest it feels nicer than creating debian packages) and could test installing the resulting packages on a cloud based CentOS 6.8 VM. After that worked I wanted to test the package on a CentOS 7 system as well. To my surprise my cloud platform didn’t have CentOS 7 images. There was RHEL7 with extra computing costs and several CentOS images with extra packages (or “hardening”) that also incurred extra cost.

Being a Debian user for many many years I thought of using something like debootstrap. I initially remembered something called yumbootstrap but the packages/Google hits I found didn’t provide much. I mostly followed another guide and will write down some differences.

$ mkdir -p chroot/var/lib/rpm
$ rpm –rebuilddb –root=$PWD/chroot
$ rpm -i –root=$PWD/chroot –nodeps centos-release-7-3.1611.el7.centos.x86_64.rpm
$ wget -O /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

# Create base7 repo
$ echo ”
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7″ > /etc/yum.repos.d/CentOs7.repo

$ yum –disablerepo=\* –enablerepo=base7  –installroot=$PWD/chroot –noplugins install -y rpm-build yum

At that point one can chroot into the new directory. These were enough. I am running this on a CentOS6.8 system so some binaries might fail with the older kernel but I didn’t run into such an issue yet.

Leaving Marnixstraat 311-b

Leaving Marnixstraat 311-b

Long story short. The plan that made me move to Amsterdam didn’t work out and I had to explore what to do next. This means my wonderful time in Amsterdam ends a lot earlier than I had hoped. After exploring different options the most promising leads came through my existing network…but require relocation and leaving my beautiful place at Marnixstraat 311b.

I will miss the atmosphere and way of life of Amsterdam, the beautiful view and the morning sun entering the apartment. What I will not miss and would have liked to avoid are the issues that come from an unresponsive and private owner. His lack of empathy showed early when we kindly asked him to allow us to park our bicycles safely in the storage and were told to park it in our flat. The trajectory was set when being asked to pay a couple of days from the current month regardless if we would move in that day or just at the beginning of the next month. Many issues followed but I will spare you the details.

The climax is his attempt to keep the deposit and try to deny my right to terminate the tenancy agreement. Let’s see how this plays out, my hope in humanity dies last though. The lesson to learn is don’t rush moving from one country to another, if possible pay the deposit to a trustee and finally..

If you rent from a private owner try to get to know him and his reasons to rent one or more apartments. Did he inherit it? Did he live there but had to move? Is it a long term investment to one day move into it once tenants paid his credit? Or is it just about maximizing his profit? It will tell you a lot about how he will deal with a request.

One year in Amsterdam

One year in Amsterdam

Last year I moved from Berlin to Amsterdam and maybe it is time for a quick recap and share my experience. Overall it is very positive. The life is very convenient. Amsterdam has less than a million citizen but it seems as busy as in any other big city. I am staying in the Jordaan with plenty of food options, most things are reachable by foot, some by bicycle. I have super markets around the corner that are open every day of the week and even on some public holidays.

For my workout/gym I found the Healthclub Jordaan which not too far away from where I live and I go there frequently. It is a reasonable sized gym and has all the machines to do the work out and I seldomly have to wait for a machine to be free. Staff is friendly and helpful and the facilities are clean. Employees respond to email quickly and it is quite personal.

One of my re-explored hobbies is to play Tennis and I found the Amsterdam Tennis Academy ran by Ingemar van der Waal. Before I mostly played on clay court but since last year I mostly play indoor at the Frans Otten Stadium. Ingemar is organizing different groups on different levels and I enjoy practicing there. The groups are mixed in gender and nationality and it is quite nice to practice hard and then chat with the other players. When there is an empty spot in another group you might be able to play more frequently than expected. My play and fitness greatly improved and I enjoy my bike ride through the Vondelpark to the courts.

Amsterdam has good public transport with Trams, Buses and Metro. Their contactless ticket system works in Amsterdam, in their national train and in many (all?) different cities. When Amsterdam is too small one can easily take the train to Rotterdam, Utrecht, Eindhoven or other places. Sometimes there is scheduled maintenance so be sure to check online if your train will operate. In general the trains seem to be on time most of the time and most of them have free wifi. And if The Netherlands are too small there is a Thalys to Paris, Lille, ICEs to Germany and with Schiphol an international airport that can be quickly reached by public transport.

I was a visitor to the Amsterdam Medical Centrum (AMC) and it is nicely organized. If you have an appointment at their policlinic and are lost there are people at the entry helping you to find your way. Blood tests are perfectly organized (reminding me of movies where vampires run the blood bank). Wait time is very little as well.

There is a tech scene in Amsterdam with plenty of meetups every week. It is a good way to leave my home office, learn, interact and get some free food. It is a great experience.

And in general most/all people/staff will speak english with you. This ranges from doctor, to pharmacy, to the post office, to the mail man, delivery persons, ordering food. And to conclude so far the experience here is just great!

Funding the Osmocom Cellular project

Funding the Osmocom Cellular project

My friend and business partner has recently blogged about funding of the Osmocom Cellular Infrastructure Projects and while I want to write about the history of sysmocom s.f.m.c. GmbH I will focus on getting contributions (or as a replacement monetary support) for the project.

First of all I think the existence of Osmocom and Osmocom Cellular made a significant difference. It is used to provide connectivity to those previously ignored (Thank you everyone involved with Rhizomatica!) and we enabled mobile communication security research. This ranges from breaking ciphering, hijacking calls, easily fuzzing phones, the whole set of GSM MAP/CAP hacks which lead to real improvement of security and privacy for end users. We took the black out of the mobile black box and want to continue to do it.

My big question is how do we sustain such development (beyond personal sacrifice)? How do we get significant contributions to remove more black boxes and extend to 4G and beyond? If getting contributions is difficult the second best thing seems to be money. This allows to pay and hire new developers that want to spend their work hours on improving Free Software. So where can these contributions come from?

The research/security community

While OsmocomBB and OpenBSC opened up the door for university and corporate researchers to explore networks, offer penetration tests, the project didn’t get much in return though. Part of the problem seems that for research a sloppy modification is enough and when the researcher has published his paper, he is too ashamed to release the hack and moves on.

Universities and Students

Universities used to buy full GSM BTS but recently seem more interested in SDR platforms. While a SDR is not a BTS the promise of running a GSM and LTE network with the same universal radio peripheral is tempting. Fewer BTS sold means less funding for OpenBSC/osmo-bts but this could be easily compensated by increased contributions to osmo-bts and osmo-trx by students and university staff. For some reason this is not happening and I think there are plenty things to improve!

Vendors using OpenBSC and osmo-bts

In general I would expect that BTS vendors that integrate our software with their hardware would have an interest in the longevity of the project and either buy software support or have their staff maintain and contribute fixes. Sadly it seems that with the current state of the industry not contributing is seen as a commercial advantage…

Research grants

The first time I heard of funding of a Free Software project receiving significant funding was when the PyPy project was initiated. Today there are various funds that support Free Software initiatives (NLnet, Mozilla Grants and more) and last year my proposal to NLnet was selected and sysmocom could begin work on 3G support in Osmocom. While this is great, the amount of funding is not enough to keep a company focused on removing blackboxes from mobile communication going for too long. So more and bigger funds are needed.

I tried to get funds from Opentech but they didn’t seem to be interested in projects like replacing proprietary Qualcomm components from modules like the EC20/EC25, or building tools for 2G/3G/4G to allow to educate users on privacy impacts of using cellular technology and to understand how a phone behaves. My first research question would be to explore what really happens when 2G is disabled in a phone and a network tries to force a downgrade. But the proposal would have enabled much more. The proposals were rejected, maybe my proposal was just bad, maybe there is no interest to finance work on cellular technology (besides most data usage seems to be from mobile devices these days). The rejection doesn’t contain feedback so it is hard to tell which of the above is more true.

How can you help?

Maybe there is not enough interest and we should focus our time and energy somewhere else but if you consider our work as important as we do, maybe you can help us? We are looking

  • contributions fix a bug, add a feature, improve existing work and make sure it gets integrated
  • Help us to write project proposals for funds like the Opentech fund…
  • Buy sysmocom hardware?
  • Buy a moral license if your company can/want to do that?
  • Sponsor me (or someone else) and send bitcoin (?)?
  • Propose your idea?
CAMEL and protocol design

CAMEL and protocol design

Today I want to share the pain of running a production 3GPP TCAP/MAP/CAP system and network protocol design in general. The excellent Free Software ASN1/TCAP/MAP/CAP stack (which is made possible by the Pharo live programming environment) I helped creating is in heavy production usage (powering standard off-the-shelf components like a SGSN, an AuC or non-standard components to enable new business cases) and sees roaming traffic from a lot of networks. From time to time something odd comes up.

In TCAP/MAP/CAP messages but also Request/Response and the possible Errors are defined using ASN1. Over the last decades ETSI and 3GPP have made various major versions and minor releases (e.g. adding new optional attributes to requests/responses/errors). The biggest new standard is CAMEL and it is so big and complicated that it was specified in four phases (each phase with their own versions of the ApplicationContext, think of it as an versioned and entry into the definition for all messages and RPC calls).

One issue in supporting a specific module version (application-context-name) is to find the right minor release of 3GPP (either the newest or oldest for that ACN). Then it is a matter to copy and paste the ASN1 definition from either a PDF or a WordDocument into individual files.. and after that is done one can fix the broken imports (or modify the ASN1 parser to make a global look-up) and typos for elements.

This artificial barrier creates two issue for people implementing MAP/CAP using components. Some use inferior ASN1 tools or can’t be bothered to create the input files and decide to hardcode the message content (after all BER/DER is more or less just nested TLV entries). The second issue is related to time/effort as well. When creating the CAMEL ASN1 files I didn’t want to do the work four times (once for each phase) and searched for shortcuts too.

The first issue materialized itself by equipment sending completely broken messages or not sending mandatory(!) elements. So what happens if a big telco sends you a message the stack can’t decode, you look up the oldest and youngest release defining this ACN and see the element that is attempted to be parsed was always mandatory? Right, one adds an OPTIONAL modifier to be able to move forward…

The second issue is on me though. I started with a set of CAMEL phase3 files and assumed that only the operations (and their arguments/response) would be different across different CAMEL phases but the support structs they use would stay the same. My assumption (and this brings us to protocol design) was that besides the versioning of the module they would be conservative and extend supporting types in a forward compatible way and integrated phase2 and phase1 into the same set of files.

And then reality sets in and the logs of the system showed a message that caused an exception during parsing (normally only happens for the first kind of issue). An extension to the Request structure was changed in a not forward compatible way. Let’s have a look:

InitialDPArgExtension ::= SEQUENCE {

-naCarrierInformation [0] NACarrierInformation OPTIONAL,
-gmscAddress [1] ISDN-AddressString OPTIONAL,
+ gmscAddress [0] ISDN-AddressString OPTIONAL,
*more new optional elements*
+ …,
+ enhancedDialledServicesAllowed [11] NULL OPTIONAL,
*more elements after the extension marker*

So one element (naCarrierInformation) got removed and then every following element was renumbered and the extension marker was moved further down. In theory the InitialDPArgExtension name binding exists once in the phase2 to definition and once in phase3 and 3GPP had all rights to define a new binding with different. An engineering question is if this was a good decision?

A change in application-context allows to remove some old cruft and make room for new. The tag space might be considered a scarce resource and making room is saving a resource. On the other hand in the history of GSM no other struct had ran out of tags and there are various other approaches to the problem. The above is already an extension to an extension and the step to an extension of an extension of an extension doesn’t seem so absurd anymore.

So please think of forward compatibility when designing protocols, think of the implementor and make the definition machine readable and please get the imports right so one doesn’t need to resort to a global symbol search. If you are having interesting core network issues related to TCAP, MAP and CAP consider contacting me.

MariaDB Galera and custom health probe for Azure LoadBalancer

MariaDB Galera and custom health probe for Azure LoadBalancer

My Galera set-up on Kubernetes and the Azure LoadBalancer in front of it seem to work nicely but one big TODO is to implement proper health checks. If a node is down, in maintenance or split from the network it should not be part of the LoadBalancer. The Azure LoadBalancer has support for custom HTTP probes and I wanted to write something very simple that handles the HTTP GET, opens a MySQL connection to the destination, check if it is connected to a primary. As this is about health checks the code should be small and reliable.

To improve my Go(-lang) skills I decided to write my healthcheck in Go. And it seemed like a good idea, Go has a powerful HTTP package, a SQL API package and two MySQL implementations. So the entire prototype is just about 72 lines (with comments and empty lines) and I think that qualifies as small. Prototyping the MySQL code took some iterations but in general it went quite quickly. But how reliable is it? Go introduced the nice concept of a context.Context. So any operation should be associated with a context and it should be passed as argument from one method to another. One can create a child context and associate it with a deadline (absolute time) or timeout (relative) and has a way to cancel it.

I grabbed the Context from the HTTP Request, added a timeout and called a function to do the MySQL check. Wow that was easy. Some polish to parse the parameters from the CLI and I am ready to deploy it! But let’s see how reliable it is?

I imagined the following error conditions:

  1. The destination IP is reachable but no one listening on the port. The TCP connection will fail quickly (SYN -> RST,ACK)
  2. The destination IP ends in a blackhole (no RST, ACK) received. One would have a large connect timeout
  3. The Galera node (or machine hosting it) is overloaded. While the connect succeeds the authentication or a query might stall
  4. The Galera node is split and not a master

The first and fourth error conditions are easy to test/simulate and trivial to implement properly. I then moved to the third one. My first choice was to implement an infinitely slow Galera node and did that by using nc -l 3006 to accept a TCP connection and then send nothing. I made a healthprobe and waited… and waited.. no timeout. Not after 2s as programmed in the context, not after 2min and not after.. (okay I gave up after 30 min). Pretty discouraging!

After some reading and browsing I saw an open PR to add context.Context support to the MySQL backend. I modified my import, ran go get to fetch it, go build and retested. Okay that didn’t work either. So let’s try the other MySQL implementation, again change the package imports, go get and go build and retest. I picked the wrong package name but even after picking the right package this driver failed to parse the Database URL. At that point I decided to go back to the first implementation and have a deeper look.

So while many of the SQL API methods take a Context as argument, the Open one does not. Open says it might or might not connect to the database and in case of MySQL it does connect to it. Let’s see if there is a workaround? I could spawn a Go routine and have a selective receive on the result or a timeout. While this would make it possible to respond to the HTTP request it does create two issues. First one can’t cancel Go routines and I would leak memory, but worse I might run into a connection limit of the Galera node. What about other workarounds? It seems I can play with a custom parameter for readTimeout and writeTimeout and at least limit the timeout per I/O operation. I guess it takes a bit of tuning to find good values for a busy system and let’s hope that context.Context will be used more in more places in the future.

Troubleshooting Kubernetes/Azure Storage

Troubleshooting Kubernetes/Azure Storage

In my previous posts I wrote about my set-up of MariaDB Galera on Kubernetes. Now I have some first experience with this set-up and can provide some guidance. I used an ill-fated TCP health-check that lead to MariaDB Galera blocking the originating IPv4 address from accessing the cluster due to never completing a MySQL handshake and it seems (logs are gone) that this lead to the sync between different systems breaking too.

When I woke up my entire cluster was down and didn’t recover. Some pods restarted and I run into a Azure Kubernetes bug where a Persistent Storage would be umounted but not detached. This means the storage can not be re-attached to the new pod. The Microsoft upstream project is a bit hostile but the issue is known. If you are seeing an error about the storage still being detached/attached. You can go to the portal, find the agent that has it attached and detach it by hand.

To bring the cluster back online there is a chicken/egg problem. The discovers the members of the cluster by using environment variables. If the cluster is entirely down and the first pod is starting, it will just exit as it can’t connect to the others. My first approach was to keep the other nodes down and use kubectl edit rc/galera-node-X and set replicas to 0. But then the service is still exporting the information. In the end I deleted the srv/galera-node-X and waited for the first pod to start. Once it was up I could re-create the services again.

My next steps are to add proper health checks, some monitoring and see if there is a more long term archive for the log data of a (deleted) pod.


Starting to use the Galera cluster

Starting to use the Galera cluster

In my previous post I wrote about getting a MariaDB Galera cluster  started on Kubernetes. One of my open issues was how to get my existing VM to connect to it. With Microsoft Azure the first thing is to add Network peering between the Kubernetes cluster and the normal VM network. As previously mentioned the internal IPv4 address of the Galera service is not reachable from outside and the three types of exposing a service are:

  • LoadBalancer
  • ClusterIP
  • NodePort

While the default Microsoft Azure setup already has two LoadBalancers, the kubectl expose –type=LoadBalancer command does not seem to allow me to chose which load balancer to use. So after trying this command my Galera cluster was reachable through a public IPv4 address on the standard MySQL port. While it is password protected it didn’t seem like a good idea. To change the config you can use something like kubectl edit srv/galera-cluster and change the type to another one. Then I tried the NodePort type and got the MySQL port exposed on all masters and thanks to the network peering was able to connect to them directly. Then I manually modified the already configured/created Microsoft Azure LoadBalancer for the three masters to export port 3306 and map it to the internal port. I am also doing a basic health check which checks if port 3306 can be connected to.

Now I can start using the Galera cluster from my container based deployment before migrating it fully to Kubernetes. My next step is probably to improve the health checks to only get primaries listed in the LoadBalancer and then add monitoring to it as well.

Galera on Kubernetes

Galera on Kubernetes

As part of my journey to “cloud” computing I built a service that is using MySQL and as preparation for the initial deployment I set myself the following constraints:

  • Deploy in containers
  • Be able to tolerate some failure of ” VM”s
  • Be able to grow/replace storage without downtime


There are pre-made mariadb:10.1 containers but to not rely on a public registry I have used the Microsoft Azure Container Service to upload my container. The integration into the standard docker tools to create and upload containers just worked. It allows me to give a place for modified containers as well.


With Azure it doesn’t seem possible to online resize (grow) a volume and if I ever want to switch from ext4 to xfs (or zfs?) I should run some form of fault tolerant MySQL to take a node and upgrade it. These days MariaDB 10.1 includes Galera support and besides some systematic issues (which I don’t seem to run in as I have little to no transactions) it seems quite easy to set-up.

Fault tolerance

Fault tolerance comes in a couple flavors. Galera is a multi-master database where the cluster will continue to allow writes as long as there is a majority of active nodes. If I start with three nodes, I can take one off the cluster to maintain.

Kubernetes will reschedule a pod/container to a different machine (“agent”) in case one becomes unhealthy and it will expose the Galera cluster through a LoadBalancer and a single IPv4 address for it. This means only active members of the cluster will be contacted.

The last part is provided by Microsoft Azures availability set. Distributing the Agents into different zones should prevent all of them to go down at the same time during maintenance.

So in theory this looks quite nice, only practice will tell how this will play out.


After having picked Microsoft Azure, Kubernetes and Galera, it is time to set it up. I have started with an example found here. I had to remove some labels to make it work with the current format, moved the container to mariadb:10.1 and modified the default config.

I had to look a bit on how to get persistent storage. I am directly mounting the disk for the pod an alternative is a persistent volume claim. This might be a better approach.

The biggest issue is starting the first service. It requires to pass special parameters to initialize the cluster and involved a round of kubectl edit/kubectl delete to get it up. Having the second and third member join was more easy.


Besides having to gain more experience with it, I do face a couple of problems with this setup and need to explore solutions (or wait for comments?).

I deployed my application before having a Kubernetes cluster and now need to migrate. The default networking of Kubernetes works by adding a lot of masquerading entries on agents and masters. In the cluster these addresses are routable by masquerading but from external they are not reachable. I need to find a way to access it, probably by sacrificing some redundancy first. The other option is to use kubectl expose but I don’t want my cluster to have a public IPv4 address. I need to see how to have an internal load balancer with a private/internal IPv4 address.

Galera cluster management is a bit troubling. The first time I start with a new disk it will not properly connect to the master but would register itself to the LoadBalancer/Service. I manually need to do a kubectl delete of the pod and wait for it to reschedule. That is probably easy to fix. The second part of the problem is that I should use health checks and only register the pod once it has connected and synced to the primaries.

Rolling upgrades seem to have a systematic issue too. The default way for the built-in replication controller looks like a new pod (N+1) will be launched and brought up and then the current galera node will be stopped (back to N). This falls apart with the way I mount the storage/disk. E.g. the new pod can not mount the disk as it is already mounted and the old pod will not be deleted.

Least problematic is auto-scaling. In the example set-up each node is a service by itself, using one persistent disk. It makes scaling the cluster a bit difficult. I can add new nodes and they will discover the master(s) but to have the masters remember the new nodes, I would need to have the pods recycle.


Kubernetes on Microsoft Azure

Kubernetes on Microsoft Azure

The recent Amazon S3 outage should make a strong argument that centralized services have severe issues, technically but from a business point of view as well(you don’t own the destiny of your own product!) and I whole heartily agree with “There is no cloud, it’s only someone else’s computer”. 

Still from time to time I like to see beyond my own nose (and I prefer the German version of that proverb!) and the current exploration involves ReactJS (which I like), Tensorflow (which I don’t have enough time for) and generally looking at Docker/Mesos/Kubernetes to manage services, zero downtime rolling updates. I have browsed and read the documentation over the last year, like the concepts (services, replication controller, pods, agents, masters), planned how to use it but because it doesn’t support SCTP never looked into actually using it.

Microsoft Azure has the Azure Container Services and since end of February it is possible to create Kubernetes clusters. This can be done using the v2 of the Azure CLI or through the portal. I finally decided to learn some new tricks.

Azure asks for a clientId and password and I entered garbage and hoped the necessary accounts would be created. It turns out that the portal is not creating it and also not doing a sanity check of these credentials and second when booting the master it will not properly start. The Microsoft support was very efficient and quick to point that out. I wish the portal would make a sanity check though. So make sure to create a principal first and use it correctly. I ended up creating it on the CLI.

I re-created the cluster and executed kubectl get nodes. It started to look better but one agent was missing from the list of nodes. After logging in I noticed that kubelet was not running. Trying to start it by hand shows that docker.service is missing. Now why it is missing is probably for Microsoft engineering to figure out but the Microsoft support gave me:

sudo rm -rf /var/lib/cloud/instances

sudo cloud-init -d init

sudo cloud-init -d modules -m config

sudo cloud-init -d modules -m final

sudo systemctl restart kubelet

After these commands my system would have a docker.service, kubelet would start and it will be listed as a node. Commands like kubectl expose are well integrated and use a public IPv4 address that is different from the one used for ssh/management. So all in all it was quite easy to get a cluster up and I am sure that some of the hick-ups will be fixed…