Author Archives: Kevin Deems
Cloudflare has data centers in over 330 cities globally, so you might think we could take a few offline for planned maintenance at any time without users noticing. In reality, disruptive maintenance requires careful planning, and as Cloudflare grew, managing that complexity through manual coordination between our infrastructure and network operations specialists became nearly impossible.
It is no longer feasible for a human to track every overlapping maintenance request or account for every customer-specific routing rule in real time. We reached a point where manual oversight alone could no longer guarantee that a routine hardware update in one part of the world would not inadvertently conflict with a critical path in another.
We realized we needed a centralized, automated "brain" to act as a safeguard: a system that could see the entire state of our network at once. By building this scheduler on Cloudflare Workers, we created a way to programmatically enforce safety constraints, so that no matter how fast we move, we never sacrifice the reliability of the services our customers depend on.
In this blog post, we’ll explain how we built it, and share the results we’re seeing now.