Static binaries (for Go with Docker)

Static binaries (for Go with Docker)

These days Go is quite popular for server based systems (read “cloud”) and one of the nice attributes is that compiling an application results in a single binary with no external dependencies (there is no “runtime” it has to link to). This makes deploying (read “copy to machine”) super easy and is a big contrast to something like Ruby on Rails and its thousands of dependencies. IIRC this feature was attractive to the developers of Qt’s coin (continuous integration agent) as well.

Amusingly in contrast to Rust, Swift or other modern languages the compiler/assembler/linker isn’t powered by LLVM but is based on the Plan9 C compiler which was converted to Go. By setting the GOOS and GOARCH environment variables one can easily cross-compile the binary. E.g. on MacOs build a binary that runs on Linux/ARM64.

When using other system libraries (through cgo) this single binary needs to link to other libraries but this complicates the deployment. The right version and ABI of the library need to be present, if not the application might not start or behaves weirdly. I was in this situation for my tcapflow monitoring utility. I would like to be able to deploy it on any version of RHEL, Ubuntu, Debian without anyone having to install the right libraries.

Here is where musl, Alpine and Docker came to rescue me. Let me briefly elaborate. The dominant C library on GNU/Linux is GNU Libc (glibc) doesn’t support static linking for some good (security, PIE) and some IMHO lazy reasons (PIE could still work, iconv/nss). On the other hand the musl library does support static linking and the C library is quite compatible to glibc and Alpine Linux is a Linux distribution that is using musl instead of glibc. By making use of Alpine I avoid having to build musl and then compiling libpcap and other libraries myself. The final item is Docker. It solves fetching a runnable set of binaries/libraries and setting-up/running a chroot for me. The command line below should result in the alpine container being fetched and an interactive shell prompt coming up. During development I use it to quickly fetch/try the latest version of postgres, mysql, etc.

docker run -it alpine:3.6 /bin/sh

I ended up creating a simple build script that will use the Alpine package manager to install the needed dependencies and then make a static build. The magic for the static build is to pass ldflags to go build which looks like:

go build --ldflags '-linkmode external -extldflags "-static"'

Instead of using a Dockerfile to build a container/image that I will never use (but would still consume disk space) I trigger my compilation through two commands. One to build for i386 and the other for AMD64.

docker run --rm=true -itv $PWD:/mnt alpine:3.6 /mnt/build_static.sh
docker run --rm=true -itv $PWD:/mnt i386/alpine:3.6 /mnt/build_static.sh

In the end I will have two binaries in the out/ directory of my sourcecode. I am using the Binutils objdump to look at the ELF headers of the binary to check which libraries it wants to link to. Shared library dependencies are indicated with NEEDED but in this case there is no such line which means the libpcap dependency was statically linked. For me musl+alpine+docker is the easiest way to build static binaries.

$ objdump -x out/tcapflow-client
out/tcapflow-client:     file format elf32-i386
out/tcapflow-client
architecture: i386, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x000c55b9

Program Header:
LOAD off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**12
filesz 0x004ecf5c memsz 0x004ecf5c flags r-x
LOAD off 0x004edc8c vaddr 0x004eec8c paddr 0x004eec8c align 2**12
filesz 0x0032ea17 memsz 0x0075df34 flags rw-
DYNAMIC off 0x007e2f1c vaddr 0x007e3f1c paddr 0x007e3f1c align 2**2
filesz 0x000000a8 memsz 0x000000a8 flags rw-
NOTE off 0x00000120 vaddr 0x00000120 paddr 0x00000120 align 2**5
filesz 0x00000038 memsz 0x00000038 flags r--
TLS off 0x004edc8c vaddr 0x004eec8c paddr 0x004eec8c align 2**2
filesz 0x00000000 memsz 0x00000004 flags r--
STACK off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**4
filesz 0x00000000 memsz 0x00000000 flags rw-
RELRO off 0x004edc8c vaddr 0x004eec8c paddr 0x004eec8c align 2**0
filesz 0x002f5374 memsz 0x002f5374 flags r--

Dynamic Section:
SYMBOLIC 0x00000000
INIT 0x000c54ac
FINI 0x0046eed5
GNU_HASH 0x00000158
STRTAB 0x000001d8
SYMTAB 0x00000188
STRSZ 0x00000021
SYMENT 0x00000010
DEBUG 0x00000000
PLTGOT 0x007e3fc4
REL 0x000001fc
RELSZ 0x000c52b0
RELENT 0x00000008
BIND_NOW 0x00000000
FLAGS_1 0x08000001
RELCOUNT 0x00018a56
MariaDB Galera and custom health probe for Azure LoadBalancer

MariaDB Galera and custom health probe for Azure LoadBalancer

My Galera set-up on Kubernetes and the Azure LoadBalancer in front of it seem to work nicely but one big TODO is to implement proper health checks. If a node is down, in maintenance or split from the network it should not be part of the LoadBalancer. The Azure LoadBalancer has support for custom HTTP probes and I wanted to write something very simple that handles the HTTP GET, opens a MySQL connection to the destination, check if it is connected to a primary. As this is about health checks the code should be small and reliable.

To improve my Go(-lang) skills I decided to write my healthcheck in Go. And it seemed like a good idea, Go has a powerful HTTP package, a SQL API package and two MySQL implementations. So the entire prototype is just about 72 lines (with comments and empty lines) and I think that qualifies as small. Prototyping the MySQL code took some iterations but in general it went quite quickly. But how reliable is it? Go introduced the nice concept of a context.Context. So any operation should be associated with a context and it should be passed as argument from one method to another. One can create a child context and associate it with a deadline (absolute time) or timeout (relative) and has a way to cancel it.

I grabbed the Context from the HTTP Request, added a timeout and called a function to do the MySQL check. Wow that was easy. Some polish to parse the parameters from the CLI and I am ready to deploy it! But let’s see how reliable it is?

I imagined the following error conditions:

  1. The destination IP is reachable but no one listening on the port. The TCP connection will fail quickly (SYN -> RST,ACK)
  2. The destination IP ends in a blackhole (no RST, ACK) received. One would have a large connect timeout
  3. The Galera node (or machine hosting it) is overloaded. While the connect succeeds the authentication or a query might stall
  4. The Galera node is split and not a master

The first and fourth error conditions are easy to test/simulate and trivial to implement properly. I then moved to the third one. My first choice was to implement an infinitely slow Galera node and did that by using nc -l 3006 to accept a TCP connection and then send nothing. I made a healthprobe and waited… and waited.. no timeout. Not after 2s as programmed in the context, not after 2min and not after.. (okay I gave up after 30 min). Pretty discouraging!

After some reading and browsing I saw an open PR to add context.Context support to the MySQL backend. I modified my import, ran go get to fetch it, go build and retested. Okay that didn’t work either. So let’s try the other MySQL implementation, again change the package imports, go get and go build and retest. I picked the wrong package name but even after picking the right package this driver failed to parse the Database URL. At that point I decided to go back to the first implementation and have a deeper look.

So while many of the SQL API methods take a Context as argument, the Open one does not. Open says it might or might not connect to the database and in case of MySQL it does connect to it. Let’s see if there is a workaround? I could spawn a Go routine and have a selective receive on the result or a timeout. While this would make it possible to respond to the HTTP request it does create two issues. First one can’t cancel Go routines and I would leak memory, but worse I might run into a connection limit of the Galera node. What about other workarounds? It seems I can play with a custom parameter for readTimeout and writeTimeout and at least limit the timeout per I/O operation. I guess it takes a bit of tuning to find good values for a busy system and let’s hope that context.Context will be used more in more places in the future.