Keepalives considered harmful

This may sound like a weird title, but hear me out. You’d think keepalives would always be helpful, but it turns out reality isn’t always what you expect it to be. It really helps if you read “Why does one NGINX worker take all the load?” first. This post is an adaptation of a rather old post on Cloudflare’s internal blog, so not all details match what is in production today, but the lessons are still valid.
This is a story about how we saw complaints about sporadic latency spikes, made some unconventional changes, and were able to cut 99.9th percentile latency by 4x!
Request flow on Cloudflare edge
I'm going to focus only on two parts of our edge stack: FL and SSL.
- FL accepts plain HTTP connections and does the main request logic, including our WAF
- SSL terminates SSL and passes connections to FL over a local Unix socket
Here’s a diagram:

These days we route all traffic through SSL for simplicity, but in the grand scheme of things it’s not going to matter much.
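To make the SSL-to-FL handoff concrete, here is a minimal Python sketch of a front process relaying a client request to a backend over a local Unix socket. It is an illustration only, not how our edge is implemented: the function names, the `fl.sock` path, and the reply format are all made up for this example, and there is no actual TLS termination in it.

```python
import os
import socket
import tempfile
import threading


def read_until_eof(sock: socket.socket) -> bytes:
    """Collect everything the peer sends until it closes the connection."""
    chunks = []
    while chunk := sock.recv(4096):
        chunks.append(chunk)
    return b"".join(chunks)


def serve_backend(listener: socket.socket) -> None:
    """Hypothetical stand-in for FL: answer one request on the Unix socket."""
    conn, _ = listener.accept()
    with conn:
        request = conn.recv(4096)
        conn.sendall(b"backend saw: " + request)


def relay_once(front: socket.socket, backend_path: str) -> None:
    """Hypothetical stand-in for SSL: pass one client's bytes to the backend.

    No TLS here; a real terminator would decrypt before forwarding.
    """
    client, _ = front.accept()
    with client, socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as upstream:
        upstream.connect(backend_path)       # new local connection per client
        upstream.sendall(client.recv(4096))  # forward the request
        client.sendall(read_until_eof(upstream))  # forward the reply


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fl.sock")  # assumed socket path
        backend = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        backend.bind(path)
        backend.listen(1)
        front = socket.create_server(("127.0.0.1", 0))  # port 0: any free port
        port = front.getsockname()[1]
        threads = [
            threading.Thread(target=serve_backend, args=(backend,)),
            threading.Thread(target=relay_once, args=(front, path)),
        ]
        for t in threads:
            t.start()
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(b"GET / HTTP/1.1\r\n\r\n")
            reply = read_until_eof(client)
        for t in threads:
            t.join()
        backend.close()
        front.close()
        return reply


if __name__ == "__main__":
    print(demo())
```

Note that this sketch opens a fresh Unix-socket connection for every client and closes it afterwards. Whether to keep such connections open and reuse them instead is exactly the keepalive question this post is about.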
Each of these processes is not itself a single process, but rather a master process and a collection of …