Static binaries (for Go with Docker)

Static binaries (for Go with Docker)

These days Go is quite popular for server based systems (read “cloud”) and one of the nice attributes is that compiling an application results in a single binary with no external dependencies (there is no “runtime” it has to link to). This makes deploying (read “copy to machine”) super easy and is a big contrast to something like Ruby on Rails and its thousands of dependencies. IIRC this feature was attractive to the developers of Qt’s coin (continuous integration agent) as well.

Amusingly in contrast to Rust, Swift or other modern languages the compiler/assembler/linker isn’t powered by LLVM but is based on the Plan9 C compiler which was converted to Go. By setting the GOOS and GOARCH environment variables one can easily cross-compile the binary. E.g. on MacOs build a binary that runs on Linux/ARM64.

When using other system libraries (through cgo) this single binary needs to link to other libraries but this complicates the deployment. The right version and ABI of the library need to be present, if not the application might not start or behaves weirdly. I was in this situation for my tcapflow monitoring utility. I would like to be able to deploy it on any version of RHEL, Ubuntu, Debian without anyone having to install the right libraries.

Here is where musl, Alpine and Docker came to rescue me. Let me briefly elaborate. The dominant C library on GNU/Linux is GNU Libc (glibc) doesn’t support static linking for some good (security, PIE) and some IMHO lazy reasons (PIE could still work, iconv/nss). On the other hand the musl library does support static linking and the C library is quite compatible to glibc and Alpine Linux is a Linux distribution that is using musl instead of glibc. By making use of Alpine I avoid having to build musl and then compiling libpcap and other libraries myself. The final item is Docker. It solves fetching a runnable set of binaries/libraries and setting-up/running a chroot for me. The command line below should result in the alpine container being fetched and an interactive shell prompt coming up. During development I use it to quickly fetch/try the latest version of postgres, mysql, etc.

docker run -it alpine:3.6 /bin/sh

I ended up creating a simple build script that will use the Alpine package manager to install the needed dependencies and then make a static build. The magic for the static build is to pass ldflags to go build which looks like:

go build --ldflags '-linkmode external -extldflags "-static"'

Instead of using a Dockerfile to build a container/image that I will never use (but would still consume disk space) I trigger my compilation through two commands. One to build for i386 and the other for AMD64.

docker run --rm=true -itv $PWD:/mnt alpine:3.6 /mnt/build_static.sh
docker run --rm=true -itv $PWD:/mnt i386/alpine:3.6 /mnt/build_static.sh

In the end I will have two binaries in the out/ directory of my sourcecode. I am using the Binutils objdump to look at the ELF headers of the binary to check which libraries it wants to link to. Shared library dependencies are indicated with NEEDED but in this case there is no such line which means the libpcap dependency was statically linked. For me musl+alpine+docker is the easiest way to build static binaries.

$ objdump -x out/tcapflow-client
out/tcapflow-client:     file format elf32-i386
out/tcapflow-client
architecture: i386, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x000c55b9

Program Header:
LOAD off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**12
filesz 0x004ecf5c memsz 0x004ecf5c flags r-x
LOAD off 0x004edc8c vaddr 0x004eec8c paddr 0x004eec8c align 2**12
filesz 0x0032ea17 memsz 0x0075df34 flags rw-
DYNAMIC off 0x007e2f1c vaddr 0x007e3f1c paddr 0x007e3f1c align 2**2
filesz 0x000000a8 memsz 0x000000a8 flags rw-
NOTE off 0x00000120 vaddr 0x00000120 paddr 0x00000120 align 2**5
filesz 0x00000038 memsz 0x00000038 flags r--
TLS off 0x004edc8c vaddr 0x004eec8c paddr 0x004eec8c align 2**2
filesz 0x00000000 memsz 0x00000004 flags r--
STACK off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**4
filesz 0x00000000 memsz 0x00000000 flags rw-
RELRO off 0x004edc8c vaddr 0x004eec8c paddr 0x004eec8c align 2**0
filesz 0x002f5374 memsz 0x002f5374 flags r--

Dynamic Section:
SYMBOLIC 0x00000000
INIT 0x000c54ac
FINI 0x0046eed5
GNU_HASH 0x00000158
STRTAB 0x000001d8
SYMTAB 0x00000188
STRSZ 0x00000021
SYMENT 0x00000010
DEBUG 0x00000000
PLTGOT 0x007e3fc4
REL 0x000001fc
RELSZ 0x000c52b0
RELENT 0x00000008
BIND_NOW 0x00000000
FLAGS_1 0x08000001
RELCOUNT 0x00018a56
Brain dump – what fascinates me

Brain dump – what fascinates me

A small brain dump of topics that currently fascinate me. These are mostly pointers and maybe it is interesting to follow it.

Books/Reading:

My kobo ebook reader has the Site Reliability Engineering book and I am now mostly done. It is kind of a revelation and explains my interest to write code but also to operate infrastructure (like struggling with ruby, rmagick, nginx…). I am interested in backends since… well ever. The first time I noticed  it when we talked about Kolab at LinuxTag and I was more interested in the backend than the KDE client. At sysmocom we built an IoT product and the backend was quite some fun, especially the scale of one instance and many devices/users, capacity planning and disk commissioning, lossless upgrades.

It can be seen in my non FOSS SS7 map work on traffic masquerading and transparent rewriting. It is also clear to see which part of engineering is needed for scale (instead of just installing and restarting servers).

Lang VM design

One technology that made Java fast (Hotspot) and has seen its way into JavaScript is dynamic optimization. Most Just in Time Compilers start with generating native code per method, either directly or after the first couple of calls when the methods size is significant enough. The VM records which call paths are hot, which types are used and then can generate optimized code (e.g. specialized for integers, remove type checks). A technique pioneered at Sun for the “Self” language (and then implemented for Strongtalk and then brought to Java) was “adaptive optimization and deoptimization” and was the Phd topic of Urs Hoelzle (Google’s VP of Engineering). One of the key aspects is inlining across method boundaries as this removes method look-up, call stack handling and opens the way for code optimization across method boundaries (at the cost of RAM usage).

In OpenJDK, V8 and JavaScriptCore this adaptive optimization is typically implemented in C++ and requires quite some code. The code is complicated as it needs to optimize but also need to return to a basic function (deoptimize, e.g. if a method changed or the types passed don’t match anymore), e.g. in the middle of a for loop with tons of inlined code (think of Array.map being inlined but then need to be de-inlined). A nice and long blog post of JSC can be found here describing the On Stack Replacement (OSR).

Long introduction and now to the new thing. In the OpensmalltalkVM a new approach called Sista has been picked and I find it is genius. Like with many problems the point of view and approach really matters. Instead of writing a lot of code in the VM the optimizer runs next to the application code. The key parts seem to be:

  • Using branches taken/not-taken as indicator how hot a path is. The overhead of counting these seem to be better than counting method calls/instructions/loops.
  • Using the Inline Caches for type information on call sites (is that mono-, poly- or megamorphic?)
  • Optimize from one set of Bytecode to another set of Bytecode.

The revelation is the last part. By just optimizing from bytecode to bytecode the VM remains in charge of creating and managing machine code. The next part is that tooling in the higher language is better or at least the roundtrip is more quick (edit code and just compile the new method instead of running make, c++, ld). The productivity thanks to the abstraction and tooling is likely higher.

As last part the OSR is easier as well. In Smalltalk thisContext (the current stack frame, activation record) is an object as well. At the right point (when the JIT has either written back variables from register to the stack or at least knows where the value is) one can just manipulate thisContext, create and link news ones and then resume execution without all the magic in other VMs.

Go, Go and escape analysis

Ken Thompson and Robert Pike are well known persons and their Go programming language is a very interesting system programming language. Like with all new languages I try to get real world experience with the language, the tooling and which kind of problems can be solved with it. I have debugged and patched some bigger projects and written two small applications with it.

There is plenty I like. The escape analysis of the compiler is fun (especially now that I know it was translated from the Plan9 C compiler from C to Go), the concurrency model is good (though allowing shared state), the module system makes sense (but makes forking harder than necessary), being able to cross compile to any target from any system.

Knowing a bit of Erlang (and continuing to read the Phd Thesis of Joe Armstrong) and being a heavy Smalltalk user there are plenty of things missing. It starts with vague runtime error messages (e.g. panicslice not having parameters) and goes to runtime and post-runtime inspection. In Smalltalk thanks to the abstraction a lot of hard things are easy and I would have wished for some of them to be in Go. Serialize all unrecovered panics? Debugging someone else’s code seems like pre 1980…

So for many developers Go is a big improvement but for some people with a wider view it might look like a lost opportunity. But that can only be felt by developers that have experienced higher abstraction and productivity.

 

Unsupervised machine learning

but that is for another dump…

Creating a chroot for CentOS 7.3

Creating a chroot for CentOS 7.3

I have recently written some RPM spec files (and to be honest it feels nicer than creating debian packages) and could test installing the resulting packages on a cloud based CentOS 6.8 VM. After that worked I wanted to test the package on a CentOS 7 system as well. To my surprise my cloud platform didn’t have CentOS 7 images. There was RHEL7 with extra computing costs and several CentOS images with extra packages (or “hardening”) that also incurred extra cost.

Being a Debian user for many many years I thought of using something like debootstrap. I initially remembered something called yumbootstrap but the packages/Google hits I found didn’t provide much. I mostly followed another guide and will write down some differences.

$ mkdir -p chroot/var/lib/rpm
$ rpm –rebuilddb –root=$PWD/chroot
$ rpm -i –root=$PWD/chroot –nodeps centos-release-7-3.1611.el7.centos.x86_64.rpm
$ wget -O /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7 http://mirror.centos.org/centos/7/os/x86_64/RPM-GPG-KEY-CentOS-Testing-7

# Create base7 repo
$ echo ”
[base7]
name=CentOS7
baseurl=http://mirror.centos.org/centos/7/os/x86_64/
gpgcheck=1
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7″ > /etc/yum.repos.d/CentOs7.repo

$ yum –disablerepo=\* –enablerepo=base7  –installroot=$PWD/chroot –noplugins install -y rpm-build yum

At that point one can chroot into the new directory. These were enough. I am running this on a CentOS6.8 system so some binaries might fail with the older kernel but I didn’t run into such an issue yet.

Leaving Marnixstraat 311-b

Leaving Marnixstraat 311-b

Long story short. The plan that made me move to Amsterdam didn’t work out and I had to explore what to do next. This means my wonderful time in Amsterdam ends a lot earlier than I had hoped. After exploring different options the most promising leads came through my existing network…but require relocation and leaving my beautiful place at Marnixstraat 311b.

I will miss the atmosphere and way of life of Amsterdam, the beautiful view and the morning sun entering the apartment. What I will not miss and would have liked to avoid are the issues that come from an unresponsive and private owner. His lack of empathy showed early when we kindly asked him to allow us to park our bicycles safely in the storage and were told to park it in our flat. The trajectory was set when being asked to pay a couple of days from the current month regardless if we would move in that day or just at the beginning of the next month. Many issues followed but I will spare you the details.

The climax is his attempt to keep the deposit and try to deny my right to terminate the tenancy agreement. Let’s see how this plays out, my hope in humanity dies last though. The lesson to learn is don’t rush moving from one country to another, if possible pay the deposit to a trustee and finally..

If you rent from a private owner try to get to know him and his reasons to rent one or more apartments. Did he inherit it? Did he live there but had to move? Is it a long term investment to one day move into it once tenants paid his credit? Or is it just about maximizing his profit? It will tell you a lot about how he will deal with a request.

One year in Amsterdam

One year in Amsterdam

Last year I moved from Berlin to Amsterdam and maybe it is time for a quick recap and share my experience. Overall it is very positive. The life is very convenient. Amsterdam has less than a million citizen but it seems as busy as in any other big city. I am staying in the Jordaan with plenty of food options, most things are reachable by foot, some by bicycle. I have super markets around the corner that are open every day of the week and even on some public holidays.

For my workout/gym I found the Healthclub Jordaan which not too far away from where I live and I go there frequently. It is a reasonable sized gym and has all the machines to do the work out and I seldomly have to wait for a machine to be free. Staff is friendly and helpful and the facilities are clean. Employees respond to email quickly and it is quite personal.

One of my re-explored hobbies is to play Tennis and I found the Amsterdam Tennis Academy ran by Ingemar van der Waal. Before I mostly played on clay court but since last year I mostly play indoor at the Frans Otten Stadium. Ingemar is organizing different groups on different levels and I enjoy practicing there. The groups are mixed in gender and nationality and it is quite nice to practice hard and then chat with the other players. When there is an empty spot in another group you might be able to play more frequently than expected. My play and fitness greatly improved and I enjoy my bike ride through the Vondelpark to the courts.

Amsterdam has good public transport with Trams, Buses and Metro. Their contactless ticket system works in Amsterdam, in their national train and in many (all?) different cities. When Amsterdam is too small one can easily take the train to Rotterdam, Utrecht, Eindhoven or other places. Sometimes there is scheduled maintenance so be sure to check online if your train will operate. In general the trains seem to be on time most of the time and most of them have free wifi. And if The Netherlands are too small there is a Thalys to Paris, Lille, ICEs to Germany and with Schiphol an international airport that can be quickly reached by public transport.

I was a visitor to the Amsterdam Medical Centrum (AMC) and it is nicely organized. If you have an appointment at their policlinic and are lost there are people at the entry helping you to find your way. Blood tests are perfectly organized (reminding me of movies where vampires run the blood bank). Wait time is very little as well.

There is a tech scene in Amsterdam with plenty of meetups every week. It is a good way to leave my home office, learn, interact and get some free food. It is a great experience.

And in general most/all people/staff will speak english with you. This ranges from doctor, to pharmacy, to the post office, to the mail man, delivery persons, ordering food. And to conclude so far the experience here is just great!

Funding the Osmocom Cellular project

Funding the Osmocom Cellular project

My friend and business partner has recently blogged about funding of the Osmocom Cellular Infrastructure Projects and while I want to write about the history of sysmocom s.f.m.c. GmbH I will focus on getting contributions (or as a replacement monetary support) for the project.

First of all I think the existence of Osmocom and Osmocom Cellular made a significant difference. It is used to provide connectivity to those previously ignored (Thank you everyone involved with Rhizomatica!) and we enabled mobile communication security research. This ranges from breaking ciphering, hijacking calls, easily fuzzing phones, the whole set of GSM MAP/CAP hacks which lead to real improvement of security and privacy for end users. We took the black out of the mobile black box and want to continue to do it.

My big question is how do we sustain such development (beyond personal sacrifice)? How do we get significant contributions to remove more black boxes and extend to 4G and beyond? If getting contributions is difficult the second best thing seems to be money. This allows to pay and hire new developers that want to spend their work hours on improving Free Software. So where can these contributions come from?

The research/security community

While OsmocomBB and OpenBSC opened up the door for university and corporate researchers to explore networks, offer penetration tests, the project didn’t get much in return though. Part of the problem seems that for research a sloppy modification is enough and when the researcher has published his paper, he is too ashamed to release the hack and moves on.

Universities and Students

Universities used to buy full GSM BTS but recently seem more interested in SDR platforms. While a SDR is not a BTS the promise of running a GSM and LTE network with the same universal radio peripheral is tempting. Fewer BTS sold means less funding for OpenBSC/osmo-bts but this could be easily compensated by increased contributions to osmo-bts and osmo-trx by students and university staff. For some reason this is not happening and I think there are plenty things to improve!

Vendors using OpenBSC and osmo-bts

In general I would expect that BTS vendors that integrate our software with their hardware would have an interest in the longevity of the project and either buy software support or have their staff maintain and contribute fixes. Sadly it seems that with the current state of the industry not contributing is seen as a commercial advantage…

Research grants

The first time I heard of funding of a Free Software project receiving significant funding was when the PyPy project was initiated. Today there are various funds that support Free Software initiatives (NLnet, Mozilla Grants and more) and last year my proposal to NLnet was selected and sysmocom could begin work on 3G support in Osmocom. While this is great, the amount of funding is not enough to keep a company focused on removing blackboxes from mobile communication going for too long. So more and bigger funds are needed.

I tried to get funds from Opentech but they didn’t seem to be interested in projects like replacing proprietary Qualcomm components from modules like the EC20/EC25, or building tools for 2G/3G/4G to allow to educate users on privacy impacts of using cellular technology and to understand how a phone behaves. My first research question would be to explore what really happens when 2G is disabled in a phone and a network tries to force a downgrade. But the proposal would have enabled much more. The proposals were rejected, maybe my proposal was just bad, maybe there is no interest to finance work on cellular technology (besides most data usage seems to be from mobile devices these days). The rejection doesn’t contain feedback so it is hard to tell which of the above is more true.

How can you help?

Maybe there is not enough interest and we should focus our time and energy somewhere else but if you consider our work as important as we do, maybe you can help us? We are looking

  • contributions fix a bug, add a feature, improve existing work and make sure it gets integrated
  • Help us to write project proposals for funds like the Opentech fund…
  • Buy sysmocom hardware?
  • Buy a moral license if your company can/want to do that?
  • Sponsor me (or someone else) and send bitcoin (?)?
  • Propose your idea?
CAMEL and protocol design

CAMEL and protocol design

Today I want to share the pain of running a production 3GPP TCAP/MAP/CAP system and network protocol design in general. The excellent Free Software ASN1/TCAP/MAP/CAP stack (which is made possible by the Pharo live programming environment) I helped creating is in heavy production usage (powering standard off-the-shelf components like a SGSN, an AuC or non-standard components to enable new business cases) and sees roaming traffic from a lot of networks. From time to time something odd comes up.

In TCAP/MAP/CAP messages but also Request/Response and the possible Errors are defined using ASN1. Over the last decades ETSI and 3GPP have made various major versions and minor releases (e.g. adding new optional attributes to requests/responses/errors). The biggest new standard is CAMEL and it is so big and complicated that it was specified in four phases (each phase with their own versions of the ApplicationContext, think of it as an versioned and entry into the definition for all messages and RPC calls).

One issue in supporting a specific module version (application-context-name) is to find the right minor release of 3GPP (either the newest or oldest for that ACN). Then it is a matter to copy and paste the ASN1 definition from either a PDF or a WordDocument into individual files.. and after that is done one can fix the broken imports (or modify the ASN1 parser to make a global look-up) and typos for elements.

This artificial barrier creates two issue for people implementing MAP/CAP using components. Some use inferior ASN1 tools or can’t be bothered to create the input files and decide to hardcode the message content (after all BER/DER is more or less just nested TLV entries). The second issue is related to time/effort as well. When creating the CAMEL ASN1 files I didn’t want to do the work four times (once for each phase) and searched for shortcuts too.

The first issue materialized itself by equipment sending completely broken messages or not sending mandatory(!) elements. So what happens if a big telco sends you a message the stack can’t decode, you look up the oldest and youngest release defining this ACN and see the element that is attempted to be parsed was always mandatory? Right, one adds an OPTIONAL modifier to be able to move forward…

The second issue is on me though. I started with a set of CAMEL phase3 files and assumed that only the operations (and their arguments/response) would be different across different CAMEL phases but the support structs they use would stay the same. My assumption (and this brings us to protocol design) was that besides the versioning of the module they would be conservative and extend supporting types in a forward compatible way and integrated phase2 and phase1 into the same set of files.

And then reality sets in and the logs of the system showed a message that caused an exception during parsing (normally only happens for the first kind of issue). An extension to the Request structure was changed in a not forward compatible way. Let’s have a look:

InitialDPArgExtension ::= SEQUENCE {

-naCarrierInformation [0] NACarrierInformation OPTIONAL,
-gmscAddress [1] ISDN-AddressString OPTIONAL,
-…
+ gmscAddress [0] ISDN-AddressString OPTIONAL,
*more new optional elements*
+ …,
+ enhancedDialledServicesAllowed [11] NULL OPTIONAL,
*more elements after the extension marker*
}

So one element (naCarrierInformation) got removed and then every following element was renumbered and the extension marker was moved further down. In theory the InitialDPArgExtension name binding exists once in the phase2 to definition and once in phase3 and 3GPP had all rights to define a new binding with different. An engineering question is if this was a good decision?

A change in application-context allows to remove some old cruft and make room for new. The tag space might be considered a scarce resource and making room is saving a resource. On the other hand in the history of GSM no other struct had ran out of tags and there are various other approaches to the problem. The above is already an extension to an extension and the step to an extension of an extension of an extension doesn’t seem so absurd anymore.

So please think of forward compatibility when designing protocols, think of the implementor and make the definition machine readable and please get the imports right so one doesn’t need to resort to a global symbol search. If you are having interesting core network issues related to TCAP, MAP and CAP consider contacting me.

MariaDB Galera and custom health probe for Azure LoadBalancer

MariaDB Galera and custom health probe for Azure LoadBalancer

My Galera set-up on Kubernetes and the Azure LoadBalancer in front of it seem to work nicely but one big TODO is to implement proper health checks. If a node is down, in maintenance or split from the network it should not be part of the LoadBalancer. The Azure LoadBalancer has support for custom HTTP probes and I wanted to write something very simple that handles the HTTP GET, opens a MySQL connection to the destination, check if it is connected to a primary. As this is about health checks the code should be small and reliable.

To improve my Go(-lang) skills I decided to write my healthcheck in Go. And it seemed like a good idea, Go has a powerful HTTP package, a SQL API package and two MySQL implementations. So the entire prototype is just about 72 lines (with comments and empty lines) and I think that qualifies as small. Prototyping the MySQL code took some iterations but in general it went quite quickly. But how reliable is it? Go introduced the nice concept of a context.Context. So any operation should be associated with a context and it should be passed as argument from one method to another. One can create a child context and associate it with a deadline (absolute time) or timeout (relative) and has a way to cancel it.

I grabbed the Context from the HTTP Request, added a timeout and called a function to do the MySQL check. Wow that was easy. Some polish to parse the parameters from the CLI and I am ready to deploy it! But let’s see how reliable it is?

I imagined the following error conditions:

  1. The destination IP is reachable but no one listening on the port. The TCP connection will fail quickly (SYN -> RST,ACK)
  2. The destination IP ends in a blackhole (no RST, ACK) received. One would have a large connect timeout
  3. The Galera node (or machine hosting it) is overloaded. While the connect succeeds the authentication or a query might stall
  4. The Galera node is split and not a master

The first and fourth error conditions are easy to test/simulate and trivial to implement properly. I then moved to the third one. My first choice was to implement an infinitely slow Galera node and did that by using nc -l 3006 to accept a TCP connection and then send nothing. I made a healthprobe and waited… and waited.. no timeout. Not after 2s as programmed in the context, not after 2min and not after.. (okay I gave up after 30 min). Pretty discouraging!

After some reading and browsing I saw an open PR to add context.Context support to the MySQL backend. I modified my import, ran go get to fetch it, go build and retested. Okay that didn’t work either. So let’s try the other MySQL implementation, again change the package imports, go get and go build and retest. I picked the wrong package name but even after picking the right package this driver failed to parse the Database URL. At that point I decided to go back to the first implementation and have a deeper look.

So while many of the SQL API methods take a Context as argument, the Open one does not. Open says it might or might not connect to the database and in case of MySQL it does connect to it. Let’s see if there is a workaround? I could spawn a Go routine and have a selective receive on the result or a timeout. While this would make it possible to respond to the HTTP request it does create two issues. First one can’t cancel Go routines and I would leak memory, but worse I might run into a connection limit of the Galera node. What about other workarounds? It seems I can play with a custom parameter for readTimeout and writeTimeout and at least limit the timeout per I/O operation. I guess it takes a bit of tuning to find good values for a busy system and let’s hope that context.Context will be used more in more places in the future.

Troubleshooting Kubernetes/Azure Storage

Troubleshooting Kubernetes/Azure Storage

In my previous posts I wrote about my set-up of MariaDB Galera on Kubernetes. Now I have some first experience with this set-up and can provide some guidance. I used an ill-fated TCP health-check that lead to MariaDB Galera blocking the originating IPv4 address from accessing the cluster due to never completing a MySQL handshake and it seems (logs are gone) that this lead to the sync between different systems breaking too.

When I woke up my entire cluster was down and didn’t recover. Some pods restarted and I run into a Azure Kubernetes bug where a Persistent Storage would be umounted but not detached. This means the storage can not be re-attached to the new pod. The Microsoft upstream project is a bit hostile but the issue is known. If you are seeing an error about the storage still being detached/attached. You can go to the portal, find the agent that has it attached and detach it by hand.

To bring the cluster back online there is a chicken/egg problem. The entrypoint.sh discovers the members of the cluster by using environment variables. If the cluster is entirely down and the first pod is starting, it will just exit as it can’t connect to the others. My first approach was to keep the other nodes down and use kubectl edit rc/galera-node-X and set replicas to 0. But then the service is still exporting the information. In the end I deleted the srv/galera-node-X and waited for the first pod to start. Once it was up I could re-create the services again.

My next steps are to add proper health checks, some monitoring and see if there is a more long term archive for the log data of a (deleted) pod.

 

Starting to use the Galera cluster

Starting to use the Galera cluster

In my previous post I wrote about getting a MariaDB Galera cluster  started on Kubernetes. One of my open issues was how to get my existing VM to connect to it. With Microsoft Azure the first thing is to add Network peering between the Kubernetes cluster and the normal VM network. As previously mentioned the internal IPv4 address of the Galera service is not reachable from outside and the three types of exposing a service are:

  • LoadBalancer
  • ClusterIP
  • NodePort

While the default Microsoft Azure setup already has two LoadBalancers, the kubectl expose –type=LoadBalancer command does not seem to allow me to chose which load balancer to use. So after trying this command my Galera cluster was reachable through a public IPv4 address on the standard MySQL port. While it is password protected it didn’t seem like a good idea. To change the config you can use something like kubectl edit srv/galera-cluster and change the type to another one. Then I tried the NodePort type and got the MySQL port exposed on all masters and thanks to the network peering was able to connect to them directly. Then I manually modified the already configured/created Microsoft Azure LoadBalancer for the three masters to export port 3306 and map it to the internal port. I am also doing a basic health check which checks if port 3306 can be connected to.

Now I can start using the Galera cluster from my container based deployment before migrating it fully to Kubernetes. My next step is probably to improve the health checks to only get primaries listed in the LoadBalancer and then add monitoring to it as well.