Unbounded memory usage by TCP for receive buffers, and how we fixed it
At Cloudflare, we are constantly monitoring and optimizing the performance and resource utilization of our systems. Recently, we noticed that some of our TCP sessions were allocating more memory than expected.
The Linux kernel allows TCP sessions that match certain characteristics to ignore the memory allocation limits set by autotuning and to allocate excessive amounts of memory, all the way up to the net.ipv4.tcp_rmem max (the per-session limit). On Cloudflare’s production network, a single server often carries many such TCP sessions, which pushes the total amount of allocated TCP memory up to the net.ipv4.tcp_mem thresholds (the server-wide limit). When that happens, the kernel imposes memory use constraints on all TCP sessions, not just the ones causing the problem, and those constraints hurt throughput and latency for the user. Within the kernel, the problematic sessions trigger TCP collapse processing, “OFO” pruning (dropping of packets already received and sitting in the out-of-order queue), and the dropping of newly arriving packets.
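To make the two limits concrete, here is a minimal Python sketch that reads the per-session limit (net.ipv4.tcp_rmem), the server-wide thresholds (net.ipv4.tcp_mem), and the current TCP memory usage from /proc/net/sockstat. It is not the tooling used at Cloudflare; the function names and the three-way "ok / pressure / over high" classification are illustrative assumptions, and the classification simplifies the kernel's real behavior. Note that tcp_rmem is expressed in bytes, while tcp_mem and the sockstat "mem" field are expressed in pages.

```python
import os
from pathlib import Path


def parse_tcp_rmem(text):
    """Parse net.ipv4.tcp_rmem: 'min default max', all in bytes."""
    lo, default, hi = (int(x) for x in text.split())
    return {"min": lo, "default": default, "max": hi}


def parse_tcp_mem(text):
    """Parse net.ipv4.tcp_mem: 'low pressure high', all in pages."""
    low, pressure, high = (int(x) for x in text.split())
    return {"low": low, "pressure": pressure, "high": high}


def parse_sockstat_tcp_pages(text):
    """Return the 'mem' value (pages) from the TCP line of /proc/net/sockstat."""
    for line in text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()[1:]
            stats = dict(zip(fields[0::2], (int(v) for v in fields[1::2])))
            return stats["mem"]
    raise ValueError("no TCP line found in sockstat text")


def pressure_state(used_pages, thresholds):
    """Hypothetical, simplified classification against the tcp_mem thresholds."""
    if used_pages >= thresholds["high"]:
        return "over high"
    if used_pages >= thresholds["pressure"]:
        return "pressure"
    return "ok"


if __name__ == "__main__":
    rmem_path = Path("/proc/sys/net/ipv4/tcp_rmem")
    mem_path = Path("/proc/sys/net/ipv4/tcp_mem")
    sockstat_path = Path("/proc/net/sockstat")
    # Only meaningful on Linux; skip quietly elsewhere.
    if rmem_path.exists() and mem_path.exists() and sockstat_path.exists():
        page_size = os.sysconf("SC_PAGE_SIZE")
        rmem = parse_tcp_rmem(rmem_path.read_text())
        thresholds = parse_tcp_mem(mem_path.read_text())
        used = parse_sockstat_tcp_pages(sockstat_path.read_text())
        print(f"per-session receive buffer max: {rmem['max']} bytes")
        print(f"server-wide TCP memory in use: {used * page_size} bytes")
        print(f"state vs tcp_mem thresholds: {pressure_state(used, thresholds)}")
```

Running this periodically on a busy server would show how close aggregate TCP memory is to the server-wide thresholds, which is the condition under which every session on the machine starts paying the price.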
This blog post describes in detail the root cause of the problem and shows the test results of a solution.
TCP receive buffers are excessively big for some sessions
Our journey began when we started noticing a lot ...