Author Archives: Vincent Bernat
In a previous article, I explained how Linux implements an IPv6 routing table. The following graph shows the performance progression of route lookups through Linux history:
All kernels are compiled with GCC 4.9 (from Debian Jessie). This version is able to compile older kernels as well as current ones. The kernel configuration is the default one with the CONFIG_SMP, CONFIG_IPV6, CONFIG_IPV6_MULTIPLE_TABLES and CONFIG_IPV6_SUBTREES options enabled. Some other unrelated options are enabled to be able to boot them in a virtual machine and run the benchmark.
There are three notable performance changes:

- struct rt6_info (commit 887c95cc1da5): this should have led to a performance increase; the small regression may be due to cache-related issues.

TL;DR: With its implementation of IPv6 routing tables using radix trees, Linux offers subpar performance (450 ns for a full view — 40,000 routes) compared to IPv4 (50 ns for a full view — 500,000 routes) but fair memory usage (20 MiB for a full view).
In a previous article, we had a look at IPv4 route lookup on Linux. Let’s see how different IPv6 is.
Looking up a prefix in a routing table comes down to finding the most specific entry matching the requested destination. A common structure for this task is the trie, a tree structure where the key of each node has the key of its parent as a prefix.
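To make this lookup concrete, here is a minimal sketch of a longest-prefix match over a plain (uncompressed) binary trie; the node layout and function names are invented for illustration and are not the kernel’s fib6_node structures.

    #include <stdint.h>
    #include <stddef.h>

    /* Toy binary trie node: one child per bit value. A node carries a
     * route only if an installed prefix ends exactly at this depth. */
    struct trie_node {
        struct trie_node *child[2];
        int has_route;            /* 1 if a prefix ends at this node */
        void *route;              /* hypothetical routing information */
    };

    /* Extract bit `pos` (0 = most significant) of a 128-bit address. */
    static int addr_bit(const uint8_t addr[16], int pos)
    {
        return (addr[pos / 8] >> (7 - pos % 8)) & 1;
    }

    /* Walk the trie following the destination bits and remember the
     * deepest node holding a route: the most specific matching prefix. */
    static const struct trie_node *
    lpm_lookup(const struct trie_node *root, const uint8_t dst[16])
    {
        const struct trie_node *best = NULL;
        const struct trie_node *node = root;

        for (int depth = 0; node != NULL; depth++) {
            if (node->has_route)
                best = node;
            if (depth == 128)
                break;            /* the whole address has been consumed */
            node = node->child[addr_bit(dst, depth)];
        }
        return best;              /* NULL if nothing matches (no default route) */
    }

A radix (Patricia) trie applies the same idea but collapses chains of single-child nodes, so the walk only branches where installed prefixes actually differ.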
With IPv4, Linux uses a level-compressed trie (or LPC-trie), providing good performance with low memory usage. For IPv6, Linux uses a more classic radix tree (or Patricia trie). There are three reasons for not sharing:
TL;DR: Each of Linux 2.6.39, 3.6 and 4.0 brings notable performance improvements for the IPv4 route lookup process.
In a previous article, I explained how Linux implements an IPv4 routing table with compressed tries to offer excellent lookup times. The following graph shows the performance progression of Linux through history:
Two scenarios are tested:
All kernels are compiled with GCC 4.9 (from Debian Jessie). This version is able to compile older kernels1 as well as current ones. The kernel configuration used is the default one with the CONFIG_SMP and CONFIG_IP_MULTIPLE_TABLES options enabled (however, no IP rules are used). Some other unrelated options are enabled to be able to boot them in a virtual machine and run the benchmark.
The measurements are done in a virtual machine with one vCPU2. The host is an Intel Core i5-4670K and the CPU governor was set to “performance”. The benchmark is single-threaded. Implemented as a kernel module, it calls fib_lookup() with various destinations in 100,000 timed iterations and keeps the … Continue reading
TL;DR: With its implementation of IPv4 routing tables using LPC-tries, Linux offers good lookup performance (50 ns for a full view) and low memory usage (64 MiB for a full view).
During the lifetime of an IPv4 datagram inside the Linux kernel, one important step is the route lookup for the destination address through the fib_lookup() function. From essential information about the datagram (source and destination IP addresses, interfaces, firewall mark, …), this function should quickly provide a decision. Some possible options are:

- deliver the datagram locally (RTN_LOCAL),
- forward it to a supplied next hop (RTN_UNICAST),
- silently discard it (RTN_BLACKHOLE).

Since 2.6.39, Linux stores routes in a compressed prefix tree (commit 3630b7c050d9). In the past, a route cache was maintained but it has been removed1 in Linux 3.6.
Looking up a route in a routing table means finding the most specific prefix matching the requested destination. Let’s assume the following routing table:
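Purely as an illustration of what “most specific” means, here is a naive user-space longest-prefix match over a small made-up table (the addresses and next hops are invented, and this linear scan is nothing like the kernel’s LPC-trie):

    #include <stdio.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <arpa/inet.h>

    /* A made-up routing table: prefix, prefix length, next hop. */
    struct route {
        const char *prefix;
        int plen;
        const char *nexthop;
    };

    static const struct route table[] = {
        { "0.0.0.0",      0, "203.0.113.1" },   /* default route */
        { "192.0.2.0",   24, "203.0.113.2" },
        { "192.0.2.128", 25, "203.0.113.3" },
    };

    /* Return the entry with the longest prefix matching dst, or NULL. */
    static const struct route *lpm(uint32_t dst)
    {
        const struct route *best = NULL;

        for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
            uint32_t prefix, mask;
            inet_pton(AF_INET, table[i].prefix, &prefix);
            /* Build the netmask from the prefix length. */
            mask = table[i].plen ? htonl(~0u << (32 - table[i].plen)) : 0;
            if ((dst & mask) == (prefix & mask) &&
                (best == NULL || table[i].plen > best->plen))
                best = &table[i];
        }
        return best;
    }

    int main(void)
    {
        uint32_t dst;
        inet_pton(AF_INET, "192.0.2.200", &dst);
        const struct route *r = lpm(dst);
        printf("next hop: %s\n", r ? r->nexthop : "none");
        return 0;
    }

Here 192.0.2.200 matches both 192.0.2.0/24 and 192.0.2.128/25, and the /25 wins because it is the more specific of the two.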
VXLAN is an overlay network to encapsulate Ethernet traffic over an existing (highly available and scalable, possibly the Internet) IP network while accommodating a very large number of tenants. It is defined in RFC 7348. For an uncut introduction on its use with Linux, have a look at my “VXLAN & Linux” post.
In the above example, we have hypervisors hosting virtual machines from different tenants. Each virtual machine is given access to a tenant-specific virtual Ethernet segment. Users expect classic Ethernet segments: no MAC restrictions1, total control over the IP addressing scheme they use, and availability of multicast.
In a large VXLAN deployment, two aspects need attention:
A typical solution for the first point is to use multicast. For the second point, the usual solution is source-address learning.
BGP EVPN (RFC 7432 and draft-ietf-bess-evpn-overlay for its application with VXLAN Continue reading
VXLAN is an overlay network to carry Ethernet traffic over an existing (highly available and scalable) IP network while accommodating a very large number of tenants. It is defined in RFC 7348.
Starting from Linux 3.12, the VXLAN implementation is quite complete as both multicast and unicast are supported as well as IPv6 and IPv4. Let’s explore the various methods to configure it.
To illustrate our examples, we use the following setup:
A VXLAN tunnel extends the individual Ethernet segments across the three bridges, providing a unique (virtual) Ethernet segment. From one host (e.g. H1), we can directly reach all the other hosts in the virtual segment:
    $ ping -c10 -w1 -t1 ff02::1%eth0
    PING ff02::1%eth0(ff02::1%eth0) 56 data bytes
    64 bytes from fe80::5254:33ff:fe00:8%eth0: icmp_seq=1 ttl=64 time=0.016 ms
    64 bytes from fe80::5254:33ff:fe00:b%eth0: icmp_seq=1 ttl=64 time=4.98 ms (DUP!)
    64 bytes from fe80::5254:33ff:fe00:9%eth0: icmp_seq=1 ttl=64 time=4.99 ms (DUP!)
    64 bytes from fe80::5254:33ff:fe00:a%eth0: icmp_seq=1 ttl=64 time=4.99 ms (DUP!)

    --- ff02::1%eth0 ping statistics ---
    1 packets transmitted, 1 received, +3 duplicates, … Continue reading
TL;DR: when configuring a Linux bridge, use the following commands to enforce isolation:
    # bridge vlan del dev br0 vid 1 self
    # echo 1 > /sys/class/net/br0/bridge/vlan_filtering
A network bridge (also commonly called a “switch”) brings several Ethernet segments together. It is a common element in most infrastructures. Linux provides its own implementation.
A typical use of a Linux bridge is shown below. The hypervisor is running three virtual hosts. Each virtual host is attached to the br0 bridge (represented by the horizontal segment). The hypervisor has two physical network interfaces:

- eth0 is attached to a public network providing various services for the virtual hosts (DHCP, DNS, NTP, routers to Internet, …). It is also part of the br0 bridge.
- eth1 is attached to an infrastructure network providing various services to the hypervisor (DNS, NTP, configuration management, routers to Internet, …). It is not part of the br0 bridge.

The main expectation of such a setup is that while the virtual hosts should be able to use resources from the public network, they should not be able to access resources from the infrastructure network (including resources hosted on the hypervisor itself, like a … Continue reading
Org mode is a package for Emacs to “keep notes, maintain todo lists, plan projects and author documents”. It can execute embedded snippets of code and capture the output (through Babel). It’s an invaluable tool for documenting your infrastructure and your operations.
Here are three (relatively) short videos exhibiting Org mode use in the context of network operations. In all of them, I am using my own junos-mode which features the following perks:
Since some Junos devices can be quite slow, commits and remote executions are done asynchronously1 with the help of a Python helper.
In the first video, I take some notes about configuring BGP add-path feature (RFC 7911). It demonstrates all the available features of junos-mode.
In the second video, I execute a planned operation to enable this feature in production. The document is a modus operandi and contains the configuration to apply and the commands to check if it works as expected. At the end, the document becomes a detailed report of the operation.
In the third video, a cookbook has been prepared to execute Continue reading
Unlike other programming languages, Go’s runtime doesn’t provide a way to reliably daemonize a service. A system daemon has to supply this functionality. Most distributions ship systemd which would fit the bill. A correct integration with systemd is quite straightforward. There are two interesting aspects: readiness & liveness.
As an example, we will daemonize this service whose goal is to answer requests with nifty 404 errors:
    package main

    import (
        "log"
        "net"
        "net/http"
    )

    func main() {
        l, err := net.Listen("tcp", ":8081")
        if err != nil {
            log.Panicf("cannot listen: %s", err)
        }
        http.Serve(l, nil)
    }
You can build it with go build 404.go.
Here is the service file, 404.service1:
    [Unit]
    Description=404 micro-service

    [Service]
    Type=notify
    ExecStart=/usr/bin/404
    WatchdogSec=30s
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target
The classic way for a Unix daemon to signal its readiness is to daemonize. Technically, this is done by calling fork(2) twice (which also serves other purposes). This is a very common task and the BSD systems, as well as some other C libraries, supply a daemon(3) … Continue reading
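For comparison with the Go version discussed in the article, the same readiness and watchdog notifications can be sketched in C with libsystemd; the actual service logic is elided and only the notification calls are shown:

    /* Build with: cc 404-notify.c $(pkg-config --cflags --libs libsystemd) */
    #include <systemd/sd-daemon.h>
    #include <stdint.h>
    #include <unistd.h>

    int main(void)
    {
        uint64_t watchdog_usec = 0;

        /* ... bind sockets, load configuration, etc. ... */

        /* With Type=notify, systemd waits for this message before
         * considering the unit started. */
        sd_notify(0, "READY=1");

        /* With WatchdogSec=, systemd expects periodic keep-alives;
         * sd_watchdog_enabled() returns > 0 when the watchdog is armed. */
        int watchdog = sd_watchdog_enabled(0, &watchdog_usec) > 0;

        for (;;) {
            /* ... serve requests ... */
            if (watchdog)
                sd_notify(0, "WATCHDOG=1");
            sleep(1);             /* keep-alives well within WatchdogSec=30s */
        }
    }

Under the hood, both variants amount to writing these strings to the Unix datagram socket named by the NOTIFY_SOCKET environment variable.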
I was a happy user of rxvt-unicode until I got a laptop with a HiDPI display. Switching from a LoDPI to a HiDPI screen and back was a pain: I had to manually adjust the font size on all terminals or restart them.
VTE is a library to build a terminal emulator using the GTK+ toolkit, which handles DPI changes. It is used by many terminal emulators, like GNOME Terminal, evilvte, sakura, termit and ROXTerm. The library is quite straightforward and writing a terminal doesn’t take much time if you don’t need many features.
Let’s see how to write a simple one.
Let’s start small with a terminal with the default settings. We’ll write that in C. Another supported option is Vala.
    #include <vte/vte.h>

    int main(int argc, char *argv[])
    {
        GtkWidget *window, *terminal;

        /* Initialise GTK, the window and the terminal */
        gtk_init(&argc, &argv);
        terminal = vte_terminal_new();
        window = gtk_window_new(GTK_WINDOW_TOPLEVEL);
        gtk_window_set_title(GTK_WINDOW(window), "myterm");

        /* Start a new shell */
        gchar **envp = g_get_environ();
        gchar **command = (gchar * … Continue reading