Archive

Category Archives for "MTU Ninja | Vincent Bernat"

A Makefile for your Go project (2019)

My most loathed feature of Go was the mandatory use of GOPATH: I do not want to put my own code next to its dependencies. I was not alone and people devised tools or crafted their own Makefile to avoid organizing their code around GOPATH.

Hopefully, since Go 1.11, it is possible to use Go’s modules to manage dependencies without relying on GOPATH. First, you need to convert your project to a module:1

$ go mod init hellogopher
go: creating new go.mod: module hellogopher
$ cat go.mod
module hellogopher

Then, you can invoke the usual commands, like go build or go test. The go command resolves imports by using versions listed in go.mod. When it runs into an import of a package not present in go.mod, it automatically looks up the module containing that package using the latest version and adds it.

$ go test ./...
go: finding github.com/spf13/cobra v0.0.5
go: downloading github.com/spf13/cobra v0.0.5
?       hellogopher     [no test files]
?       hellogopher/cmd [no test files]
ok      hellogopher/hello       0.001s
$ cat go.mod
module hellogopher

require github.com/spf13/cobra v0.0.5

If you want a specific version, you can Continue reading

Sustainable Python scripts

Python is a great language to write a standalone script. Getting to the result can be a matter of a dozen to a few hundred lines of code and, moments later, you can forget about it and focus on your next task.

Six months later, a co-worker asks you why the script fails and you don’t have a clue: no documentation, hard-coded parameters, nothing logged during the execution and no sensible tests to figure out what may go wrong.

Turning a “quick-and-dirty” Python script into a sustainable version, which will be easy to use, understand and support by your co-workers and your future self, only takes some moderate effort. As an illustration, let’s start from the following script solving the classic Fizz-Buzz test:

import sys
for n in range(int(sys.argv[1]), int(sys.argv[2])):
    if n % 3 == 0 and n % 5 == 0:
        print("fizzbuzz")
    elif n % 3 == 0:
        print("fizz")
    elif n % 5 == 0:
        print("buzz")
    else:
        print(n)

Documentation

I find useful to write documentation before coding: Continue reading

Pragmatic Debian packaging (2019)

Notice

This guide is an updated version of a previous edition. If you need to target distributions older than Debian Stretch and Ubuntu Bionic, please have a look at the older version instead.

While the creation of Debian packages is abundantly documented, most tutorials are targeted to packages implementing the Debian policy. Moreover, Debian packaging has a reputation of being unnecessarily difficult1 and many people prefer to use less constrained tools2 like fpm or CheckInstall.

However, building Debian packages with the official tools can become straightforward if you bend some rules:

  1. No source package will be generated. Packages will be built directly from a checkout of a VCS repository.

  2. Additional dependencies can be downloaded during build. Packaging individually each dependency is a painstaking work, notably when you have to deal with some fast-paced ecosystems like Java, Javascript and Go.

  3. The produced packages may bundle dependencies. This is likely to raise some concerns about security and long-term maintenance, but this is a common trade-off in many ecosystems, notably Java, Javascript and Go.

HiDPI on dual 4K monitors with Linux

I am using a Lenovo Thinkpad X1 Carbon laptop (210 DPI) since four years and a Nokia 8 phone (550 DPI) since a year. I enjoy their HiDPI screens: text is crisp and easy to read. To get a similar experience for my workstation, I bought a pair of Dell P2415Q monitors:

Two Dell P2415Q
Dual screen setup with two Dell P2415Q monitors

Monitors

The Dell P2415Q is a 24” display featuring an IPS panel with a 3840×2160 resolution (185 DPI) and a complete coverage of the sRGB color space. It was released in 2015 and its price is now below $400. It received positive reviews.

One of my units arrived with a dead pixel. I thought it was a problem from the past but Dell policy on dead pixels says:

During LCD manufacturing process, it is not uncommon for one or more sub-pixels to become fixed in an unchanging state. A display with a 1 to 5 fixed sub-pixel is considered normal and within industry standards.

Another issue is the presence of faint horizontal grey lines, (barely) visible on white background. The issue seems to not be uncommon but Dell is dismissive about it. If I sit correctly, the Continue reading

BGP LLGR: robust and reactive BGP sessions

On a BGP-routed network with multiple redundant paths, we seek to achieve two goals concerning reliability:

  1. A failure on a path should quickly bring down the related BGP sessions. A common expectation is to recover in less than a second by diverting the traffic to the remaining paths.

  2. As long as a path is operational, the related BGP sessions should stay up, even under duress.

Detecting failures fast: BFD⚓︎

To quickly detect a failure, BGP can be associated with BFD, a protocol to detect faults in bidirectional paths,1 defined in RFC 5880 and RFC 5882. BFD can use very low timers, like 100 ms.

However, when BFD runs in a process on top of a generic kernel,2 notably when running BGP on the host, it is not unexpected to loose a few BFD packets on adverse conditions: the daemon handling the BFD sessions may not get enough CPU to answer in a timely manner. In this scenario, it is not unlikely for all the BGP sessions to go down at the same time, creating an outage, as depicted in the last case in the diagram below.

BGP and failed sessions
Examples of failures on a network using BGP Continue reading

Multi-tier load-balancing with Linux

A common solution to provide a highly-available and scalable service is to insert a load-balancing layer to spread requests from users to backend servers.1 We usually have several expectations for such a layer:

scalability
It allows a service to scale by pushing traffic to newly provisioned backend servers. It should also be able to scale itself when it becomes the bottleneck.
availability
It provides high availability to the service. If one server becomes unavailable, the traffic should be quickly steered to another server. The load-balancing layer itself should also be highly available.
flexibility
It handles both short and long connections. It is flexible enough to offer all the features backends generally expect from a load-balancer like TLS or HTTP routing.
operability
With some cooperation, any expected change should be seamless: rolling out a new software on the backends, adding or removing backends, or scaling up or down the load-balancing layer itself.

The problem and its solutions are well known. From recently published articles on the topic, “Introduction to modern network load-balancing and proxying” provides an overview of the state of the art. Google released “Maglev: A Fast and Reliable Software Network Load Balancer” describing their Continue reading

A more privacy-friendy blog

When I started this blog, I embraced some free services, like Disqus or Google Analytics. These services are quite invasive for users’ privacy. Over the years, I have tried to correct this to reach a point where I do not rely on any “privacy-hostile” services.

Analytics?

Google Analytics is an ubiquitous solution to get a powerful analytics solution for free. It’s also a great way to provide data about your visitors to Google—also for free. There are self-hosted solutions like Matomo—previously Piwik.

I opted for a simpler solution: no analytics. It also enables me to think that my blog attracts thousands of visitors every day.

Fonts?

Google Fonts is a very popular font library and hosting service, which relies on the generic Google Privacy Policy. The google-webfonts-helper service makes it easy to self-host any font from Google Fonts. Moreover, with help from pyftsubset, I include only the characters used in this blog. The font files are lighter and more complete: no problem spelling “Antonín Dvořák”.

Videos?

OPL2 Audio Board: an AdLib sound card for Arduino

In a previous article, I presented the OPL2LPT, a sound card for the parallel port featuring a Yamaha YM3812 chip, also known as OPL2—the chip of the AdLib sound card. The OPL2 Audio Board for Arduino is another indie sound card using this chip. However, instead of relying on a parallel port, it uses a serial interface, which can be drived from an Arduino board or a Raspberry Pi. While the OPL2LPT targets retrogamers with real hardware, the OPL2 Audio Board cannot be used in the same way. Nonetheless, it can also be operated from ScummVM and DOSBox!

OPL2 Audio Board for Arduino
The OPL2 Audio Board over a “Grim Fandango” box.

Unboxing?

The OPL2 Audio Board can be purchased on Tindie, either as a kit or fully assembled. I have paired it with a cheap clone of the Arduino Nano. A library to drive the board is available on GitHub, along with some examples.

One of them is DemoTune.ino. It plays a short tune on three channels. It can be compiled and uploaded to the Arduino with PlatformIO—installable with pip install platformio—using the following command:1

$ platformio ci \
    --board nanoatmega328 \
    --lib ../.. Continue reading

Self-hosted videos with HLS

Note

This article was first published on Exoscale blog with some minor modifications.

Hosting videos on YouTube is convenient for several reasons: pretty good player, free bandwidth, mobile-friendly, network effect and, at your discretion, no ads.1 On the other hand, this is one of the less privacy-friendly solution. Most other providers share the same characteristics—except the ability to disable ads for free.

With the <video> tag, self-hosting a video is simple:2

<video controls>
  <source src="../videos/big_buck_bunny.webm" type="video/webm">
  <source src="../videos/big_buck_bunny.mp4" type="video/mp4">
</video>

However, while it is possible to provide a different videos depending on the screen width, adapting the video to the available bandwidth is trickier. There are two solutions:

They are both adaptive bitrate streaming protocols: the video is sliced in small segments and made available at a variety of different bitrates. Depending on current network conditions, the player automatically selects the appropriate bitrate to download the next segment.

HLS was initially implemented by Apple but is now also supported Continue reading

Integration of a Go service with systemd: socket activation

In a previous post, I highlighted some useful features of systemd when writing a service in Go, notably to signal readiness and prove liveness. Another interesting bit is socket activation: systemd listens on behalf of the application and, on incoming traffic, starts the service with a copy of the listening socket. Lennart Poettering details in a blog post:

If a service dies, its listening socket stays around, not losing a single message. After a restart of the crashed service it can continue right where it left off. If a service is upgraded we can restart the service while keeping around its sockets, thus ensuring the service is continously responsive. Not a single connection is lost during the upgrade.

This is one solution to get zero-downtime deployment for your application. Another upside is you can run your daemon with less privileges—loosing rights is a difficult task in Go.1

The basics?

Let’s take back our nifty 404-only web server:

package main

import (
    "log"
    "net"
    "net/http"
 Continue reading

Route-based VPN on Linux with WireGuard

In a previous article, I described an implementation of redundant site-to-site VPNs using IPsec (with strongSwan as an IKE daemon) and BGP (with BIRD) to achieve this: ?

Redundant VPNs between 3 sites

The two strengths of such a setup are:

  1. Routing daemons distribute routes to be protected by the VPNs. They provide high availability and decrease the administrative burden when many subnets are present on each side.
  2. Encapsulation and decapsulation are executed in a different network namespace. This enables a clean separation between a private routing instance (where VPN users are) and a public routing instance (where VPN endpoints are).

As an alternative to IPsec, WireGuard is an extremely simple (less than 5,000 lines of code) yet fast and modern VPN that utilizes state-of-the-art and opinionated cryptography (Curve25519, ChaCha20, Poly1305) and whose protocol, based on Noise, has been formally verified. It is currently available as an out-of-tree module for Linux but is likely to be merged when the protocol is not subject to change anymore. Compared to IPsec, its major weakness is its lack of interoperability.

It can easily replace strongSwan in our site-to-site setup. On Linux, it already acts as a route-based VPN. As a first Continue reading

Packaging an out-of-tree module for Debian with DKMS

DKMS is a framework designed to allow individual kernel modules to be upgraded without changing the whole kernel. It is also very easy to rebuild modules as you upgrade kernels.

On Debian-like systems,1 DKMS enables the installation of various drivers, from ZFS on Linux to VirtualBox kernel modules or NVIDIA drivers. These out-of-tree modules are not distributed as binaries: once installed, they need to be compiled for your current kernel. Everything is done automatically:

# apt install zfs-dkms
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  binutils cpp cpp-6 dkms fakeroot gcc gcc-6 gcc-6-base libasan3 libatomic1 libc-dev-bin libc6-dev
  libcc1-0 libcilkrts5 libfakeroot libgcc-6-dev libgcc1 libgomp1 libisl15 libitm1 liblsan0 libmpc3
  libmpfr4 libmpx2 libnvpair1linux libquadmath0 libstdc++6 libtsan0 libubsan0 libuutil1linux libzfs2linux
  libzpool2linux linux-compiler-gcc-6-x86 linux-headers-4.9.0-6-amd64 linux-headers-4.9.0-6-common
  linux-headers-amd64 linux-kbuild-4.9 linux-libc-dev make manpages manpages-dev patch spl spl-dkms
  zfs-zed zfsutils-linux
[…]
3 upgraded, 44 newly installed, 0 to remove and 3 not upgraded.
Need to get 42.1 MB of archives.
After this operation, 187 MB of additional disk space will be used.
Do you want to continue? [Y/n]
[…]
# dkms status
spl, 0.6.5.9, 4.9.0-6-amd64, x86_64:  Continue reading

OPL2LPT: an AdLib sound card for the parallel port

The AdLib sound card was the first popular sound card for IBM PC—prior to that, we were pampered by the sound of the PC speaker. Connected to an 8-bit ISA slot, it is powered by a Yamaha YM3812 chip, also known as OPL2. This chip can drive 9 sound channels whose characteristics can be fine tuned through 244 write-only registers.

AdLib sound card

I had one but I am unable to locate it anymore. Models on eBay are quite rare and expensive. It is possible to build one yourself (either Sergey’s one or this faithful reproduction). However, you still need an ISA port. The limitless imagination of some hackers can still help here. For example, you can combine Sergey’s Xi 8088 processor board, with his ISA 8-bit backplane, his Super VGA card and his XT-CF-Lite card to get your very own modernish IBM PC compatible. Alternatively, you can look at the AdLib sound card on a parallel port from Raphaël Assénat.

The OPL2LPT sound card?

Recently, the 8-Bit Guy released a video about an AdLib sound card for the parallel port, the OPL2LPT. While current motherboards don’t have a parallel port anymore, it’s easy to Continue reading

L3 routing to the hypervisor with BGP

On layer 2 networks, high availability can be achieved by:

Layer 2 networks need very little configuration but come with a major drawback in highly available scenarios: an incident is likely to bring the whole network down.2 Therefore, it is safer to limit the scope of a single layer 2 network by, for example, using one distinct network in each rack and connecting them together with layer 3 routing. Incidents are unlikely to impact a whole IP network.

In the illustration below, top of the rack switches provide a default gateway for hosts. To provide redundancy, they use an MC-LAG implementation. Layer 2 fault domains are scoped to a rack. Each IP subnet is bound to a specific rack and routing information is shared between top of the rack switches and core routers using a routing protocol like OSPF.

Legacy L2 design

There are two main issues with this design:

  1. The Continue reading

(Micro)benchmarking Linux kernel functions

Usually, the performance of a Linux subsystem is measured through an external (local or remote) process stressing it. Depending on the input point used, a large portion of code may be involved. To benchmark a single function, one solution is to write a kernel module.

Minimal kernel module

Let’s suppose we want to benchmark the IPv4 route lookup function, fib_lookup(). The following kernel function executes 1,000 lookups for 8.8.8.8 and returns the average value.1 It uses the get_cycles() function to compute the execution “time.”

/* Execute a benchmark on fib_lookup() and put
   result into the provided buffer `buf`. */
static int do_bench(char *buf)
{
    unsigned long long t1, t2;
    unsigned long long total = 0;
    unsigned long i;
    unsigned count = 1000;
    int err = 0;
    struct fib_result res;
    struct flowi4 fl4;

    memset(&fl4, 0, sizeof(fl4));
    fl4.daddr = in_aton("8.8.8.8");

    for (i = 0; i < count; i++) {
        t1 = get_cycles();
        err |=  Continue reading

Route-based IPsec VPN on Linux with strongSwan

A common way to establish an IPsec tunnel on Linux is to use an IKE daemon, like the one from the strongSwan project, with a minimal configuration1:

conn V2-1
  left        = 2001:db8:1::1
  leftsubnet  = 2001:db8:a1::/64
  right       = 2001:db8:2::1
  rightsubnet = 2001:db8:a2::/64
  authby      = psk
  auto        = start

The same configuration can be used on both sides. Each side will figure out if it is “left” or “right”. The IPsec site-to-site tunnel endpoints are 2001:db8:­1::1 and 2001:db8:­2::1. The protected subnets are 2001:db8:­a1::/64 and 2001:db8:­a2::/64. As a result, strongSwan configures the following policies in the kernel:

$ ip xfrm policy
src 2001:db8:a1::/64 dst 2001:db8:a2::/64
        dir out priority 399999 ptype main
        tmpl src 2001:db8:1::1 dst 2001:db8:2::1
                proto esp reqid 4 mode tunnel
src 2001:db8:a2::/64 dst 2001:db8:a1::/64
        dir fwd priority 399999 ptype main
        tmpl src 2001:db8:2::1 dst 2001:db8:1::1
                proto esp reqid 4 mode tunnel
src 2001:db8:a2::/64 dst 2001:db8:a1::/64
        dir in priority 399999 ptype main
        tmpl src 2001:db8:2::1 dst 2001:db8:1::1
                proto esp reqid 4 mode tunnel
[…]

This kind of IPsec tunnel is a policy-based VPN: encapsulation and decapsulation are governed by these policies. Each of them contains the following elements:

Performance progression of IPv6 route lookup on Linux

In a previous article, I explained how Linux implements an IPv6 routing table. The following graph shows the performance progression of route lookups through Linux history:

IPv6 route lookup performance progression

All kernels are compiled with GCC 4.9 (from Debian Jessie). This version is able to compile older kernels as well as current ones. The kernel configuration is the default one with CONFIG_SMP, CONFIG_IPV6, CONFIG_IPV6_MULTIPLE_TABLES and CONFIG_IPV6_SUBTREES options enabled. Some other unrelated options are enabled to be able to boot them in a virtual machine and run the benchmark.

There are three notable performance changes:

  • In Linux 3.1, Eric Dumazet delays a bit the copy of route metrics to fix the undesirable sharing of route-specific metrics by all cache entries (commit 21efcfa0ff27). Each cache entry now gets its own metrics, which explains the performance hit for the non-/128 scenarios.
  • In Linux 3.9, Yoshifuji Hideaki removes the reference to the neighbor entry in struct rt6_info (commit 887c95cc1da5). This should have lead to a performance increase. The small regression may be due to cache-related issues.
  • In Linux 4.2, Martin KaFai Lau prevents the creation of cache entries for most route lookups. The Continue reading

IPv6 route lookup on Linux

TL;DR: With its implementation of IPv6 routing tables using radix trees, Linux offers subpar performance (450 ns for a full view — 40,000 routes) compared to IPv4 (50 ns for a full view — 500,000 routes) but fair memory usage (20 MiB for a full view).


In a previous article, we had a look at IPv4 route lookup on Linux. Let’s see how different IPv6 is.

Lookup trie implementation

Looking up a prefix in a routing table comes down to find the most specific entry matching the requested destination. A common structure for this task is the trie, a tree structure where each node has its parent as prefix.

With IPv4, Linux uses a level-compressed trie (or LPC-trie), providing good performances with low memory usage. For IPv6, Linux uses a more classic radix tree (or Patricia trie). There are three reasons for not sharing:

  • The IPv6 implementation (introduced in Linux 2.1.8, 1996) predates the IPv4 implementation based on LPC-tries (in Linux 2.6.13, commit 19baf839ff4a).
  • The feature set is different. Notably, IPv6 supports source-specific routing1 (since Linux 2. Continue reading

Performance progression of IPv4 route lookup on Linux

TL;DR: Each of Linux 2.6.39, 3.6 and 4.0 brings notable performance improvements for the IPv4 route lookup process.


In a previous article, I explained how Linux implements an IPv4 routing table with compressed tries to offer excellent lookup times. The following graph shows the performance progression of Linux through history:

IPv4 route lookup performance

Two scenarios are tested:

  • 500,000 routes extracted from an Internet router (half of them are /24), and
  • 500,000 host routes (/32) tightly packed in 4 distinct subnets.

All kernels are compiled with GCC 4.9 (from Debian Jessie). This version is able to compile older kernels1 as well as current ones. The kernel configuration used is the default one with CONFIG_SMP and CONFIG_IP_MULTIPLE_TABLES options enabled (however, no IP rules are used). Some other unrelated options are enabled to be able to boot them in a virtual machine and run the benchmark.

The measurements are done in a virtual machine with one vCPU2. The host is an Intel Core i5-4670K and the CPU governor was set to “performance”. The benchmark is single-threaded. Implemented as a kernel module, it calls fib_lookup() with various destinations in 100,000 timed iterations and keeps the Continue reading

IPv4 route lookup on Linux

TL;DR: With its implementation of IPv4 routing tables using LPC-tries, Linux offers good lookup performance (50 ns for a full view) and low memory usage (64 MiB for a full view).


During the lifetime of an IPv4 datagram inside the Linux kernel, one important step is the route lookup for the destination address through the fib_lookup() function. From essential information about the datagram (source and destination IP addresses, interfaces, firewall mark, …), this function should quickly provide a decision. Some possible options are:

  • local delivery (RTN_LOCAL),
  • forwarding to a supplied next hop (RTN_UNICAST),
  • silent discard (RTN_BLACKHOLE).

Since 2.6.39, Linux stores routes into a compressed prefix tree (commit 3630b7c050d9). In the past, a route cache was maintained but it has been removed1 in Linux 3.6.

Route lookup in a trie

Looking up a route in a routing table is to find the most specific prefix matching the requested destination. Let’s assume the following routing table: