Perfect locality and three epic SystemTap scripts
In a recent blog post we discussed epoll behavior causing uneven load among NGINX worker processes. We suggested a workaround: the REUSEPORT socket option. It changes the queuing from the "combined queue model", aka Waitrose (formally: M/M/s), to a dedicated accept queue per worker, aka "the Tesco superstore model" (formally: M/M/1). With this setup the load is spread more evenly, but under certain conditions the latency distribution might suffer.
After reading that piece, a colleague of mine, John, said: "Hey Marek, don't forget that REUSEPORT has an additional advantage: it can improve packet locality! Packets can avoid being passed around CPUs!"
John had a point. Let's dig into this step by step.
In this blog post we'll explain the REUSEPORT socket option, how it can help with packet locality and its performance implications. We'll show three advanced SystemTap scripts which we used to help us understand and measure the packet locality.
A shared queue
The standard BSD socket API model is rather simple. In order to receive new TCP connections a program calls bind() and then listen() on a fresh socket. This will create a single accept queue. Programs can share the file descriptor - pointing …
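The bind() and listen() sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the loopback address, the kernel-chosen port and the backlog of 16 are arbitrary, and nothing here comes from the original post. It shows that connections completed before any accept() call wait in the listener's single queue.

```python
import socket

# One listening socket means one accept queue, maintained by the kernel.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
listener.listen(16)              # backlog of 16 is an arbitrary choice
addr = listener.getsockname()

# Two clients connect before anyone calls accept(). The kernel completes
# both handshakes and parks the connections in the accept queue.
clients = [socket.create_connection(addr) for _ in range(2)]

# Each accept() call takes one established connection off that queue.
accepted = [listener.accept() for _ in range(2)]

# Every queued connection corresponds to one of the clients.
peer_ports = sorted(peer[1] for _conn, peer in accepted)
client_ports = sorted(c.getsockname()[1] for c in clients)
print(peer_ports == client_ports)

for conn, _peer in accepted:
    conn.close()
for c in clients:
    c.close()
listener.close()
```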

