Yan Zhai

Author Archives: Yan Zhai

Soar: Simulation for Observability, reliAbility, and secuRity

Soar: Simulation for Observability, reliAbility, and secuRity
Soar: Simulation for Observability, reliAbility, and secuRity

Serving more than approximately 25 million Internet properties is not an easy thing, and neither is serving 20 million requests per second on average. At Cloudflare, we achieve this by running a homogeneous edge environment: almost every Cloudflare server runs all Cloudflare products.

Soar: Simulation for Observability, reliAbility, and secuRity
Figure 1. Typical Cloudflare service model: when an end-user (a browser/mobile/etc) visits an origin (a Cloudflare customer), traffic is routed via the Internet to the Cloudflare edge network, and Cloudflare communicates with the origin servers from that point.

As we offer more and more products and enjoy the benefit of horizontal scalability, our edge stack continues to grow in complexity. Originally, we only operated at the application layer with our CDN service and DoS protection. Then we launched transport layer products, such as Spectrum and Argo. Now we have further expanded our footprint into the IP layer and physical link with Magic Transit. They all run on every machine we have. The work of our engineers enables our products to evolve at a fast pace, and to serve our customers better.

However, such software complexity presents a sheer challenge to operation: the more changes you make, the more likely it is that something is going to break. Continue reading