High Availability Load Balancers with Maglev

Background

We run many backend services that power our customer dashboard, APIs, and features available at our edge. We own and operate physical infrastructure for our backend services. We need an effective way to route arbitrary TCP and UDP traffic between services and also from outside these data centers.
Previously, all traffic for these backend services would pass through several layers of stateful TCP proxies and NATs before reaching an available instance. This solution worked for several years, but as we grew it caused our service and operations teams many issues. Our service teams needed to deal with drops of availability, and our operations teams had much toil when needing to do maintenance on load balancer servers.
Goals
With the experience with our stateful TCP proxy and NAT solutions in mind, we had several goals for a replacement load balancing service, while remaining on our own infrastructure:
- Preserve source IPs through routing decisions to destination servers. This allows us to support servers that require client IP addresses as part of their operation, without workarounds such as X-Forwarded-For headers or the PROXY TCP extension.
- Support an architecture where backends are located across many racks and subnets. This prevents solutions that cannot Continue reading