When TCP sockets refuse to die

While working on our Spectrum server, we noticed something weird: the TCP sockets which we thought should have been closed were lingering around. We realized we don't really understand when TCP sockets are supposed to time out!

In our code, we wanted to make sure we don't hold connections to dead hosts. In our early code we naively thought enabling TCP keepalives would be enough... but it isn't. It turns out a fairly modern TCP_USER_TIMEOUT socket option is equally as important. Furthermore it interacts with TCP keepalives in subtle ways. Many people are confused by this.
In this blog post, we'll try to show how these options work. We'll show how a TCP socket can timeout during various stages of its lifetime, and how TCP keepalives and user timeout influence that. To better illustrate the internals of TCP connections, we'll mix the outputs of the tcpdump and the ss -o commands. This nicely shows the transmitted packets and the changing parameters of the TCP connections.
SYN-SENT
Let's start from the simplest case - what happens when one attempts to establish a connection to a server which discards inbound SYN packets?
$ Continue reading




