The Overlay Problem: Getting In and Out
I've been researching overlay network strategies recently. There are plenty of competing implementations available, employing various encapsulations and control plane designs. But every design I've encountered seems ultimately hampered by the same issue: scalability at the edge.
Why Build an Overlay?
Imagine a scenario where we've got 2,000 physical servers split across 50 racks. Each server functions as a hypervisor housing on average 100 virtual machines, resulting in a total of approximately 200,000 virtual hosts (~4,000 per rack).
In an ideal world, we could allocate a /20 of IPv4 space to each rack. The top-of-rack (ToR) L3 switches in each rack would advertise this /20 northbound toward the network core, resulting in a clean, efficient routing table in the core. This is, of course, how IP was intended to function.
Unfortunately, this approach isn't usually viable in the real world because we need to preserve the ability to move a virtual machine from one hypervisor to another (often residing in a different rack) without changing its assigned IP address. Establishing the L3 boundary at the ToR switch prevents us from doing this efficiently.