Taiji: managing global user traffic for large-scale Internet services at the edge
Taiji: managing global user traffic for large-scale internet services at the edge, Xu et al., SOSP’19
It’s another networking paper to close out the week (and our coverage of SOSP’19), but whereas [Snap][Snap] looked at traffic routing within the datacenter, Taiji is concerned with routing traffic from the edge to a datacenter. It’s been in production deployment at Facebook for the past four years.
The problem: mapping user requests to datacenters
When a user makes a request to http://www.facebook.com, DNS will route the request to one of dozens of globally deployed edge nodes. Within the edge node, a load balancer (the Edge LB) is responsible for routing requests through to frontend machines in datacenters. The question Taiji addresses is a simple one on the surface: which datacenter should a given request be routed to?
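To make the question concrete, here is a minimal sketch of the kind of decision an Edge LB has to make for each request. Nothing here is taken from the paper: the routing table, the edge and datacenter names, the weights, and the `pick_datacenter` helper are all assumptions invented for illustration. The sketch assumes each edge node is given a set of per-datacenter traffic fractions, and that requests are assigned by hashing a stable key.

```python
import hashlib

# Hypothetical routing table (an assumption, not from the paper): for each
# edge node, the fraction of its traffic that should go to each datacenter.
ROUTING_TABLE = {
    "edge-lhr": {"dc-east": 0.7, "dc-west": 0.3},
    "edge-sin": {"dc-west": 1.0},
}


def pick_datacenter(edge: str, request_key: str) -> str:
    """Map a request arriving at `edge` to a datacenter.

    The key is hashed into one of 10,000 buckets, and the weights are walked
    cumulatively, so the same key always maps to the same datacenter for as
    long as the table does not change.
    """
    weights = ROUTING_TABLE[edge]
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    point = (int.from_bytes(digest[:8], "big") % 10_000) / 10_000
    cumulative = 0.0
    for datacenter, weight in sorted(weights.items()):
        cumulative += weight
        if point < cumulative:
            return datacenter
    # Fallback in case the weights sum to slightly less than 1.0.
    return max(weights, key=weights.get)


if __name__ == "__main__":
    for user in ("user-1", "user-2", "user-3"):
        print(user, "->", pick_datacenter("edge-lhr", user))
```

Hashing a stable key rather than picking at random keeps repeated requests with the same key on the same datacenter, which is one simple way to get repeatable assignments. Whether and how Taiji does anything similar is not something this sketch claims.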

There’s one thing that Taiji doesn’t have to worry about: backbone capacity between the edge nodes and datacenters. This is provisioned in abundance, so it is not a consideration in balancing decisions. However, there are plenty of other things going on that make the decision challenging:
- Some user requests are sticky (i.e., they have associated session state) and always …