The Internet is a cooperative system: CNAME to Dyn DNS outage of 6 July 2015
Today, shortly after 21:00 UTC, a scary message from one of our senior support staff appeared on our internal operations chat: "getting DNS resolution errors on support.cloudflare.com". At the same time, automated monitoring indicated a problem. Shortly thereafter, we saw alarms and feedback from a variety of customers (but not all of them) reporting "1001 errors", which indicate a DNS resolution error on the CloudFlare backend. Needless to say, this got an immediate and overwhelming response from our operations and engineering teams, as we hadn't changed anything and had no other indication of an anomaly.
In the course of debugging, we were able to identify a common characteristic of the affected sites: they were CNAME-based users of CloudFlare rather than complete domains hosted entirely on CloudFlare. Ironically, this included our own support site, support.cloudflare.com. When users point (via CNAME) to a domain instead of providing us with an IP address, our network resolves that name, and is obviously unable to connect if the DNS provider has issues. (Our status page, https://www.cloudflarestatus.com/, is off-network and was unaffected.) We then investigated why only certain domains were having issues: was the issue with the upstream DNS? Testing whether their domains were resolvable…
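To make the dependency concrete, here is a minimal sketch of the logic described above: an origin given as an IP address needs no lookup, while an origin given as a hostname must be resolved first, so a failure at the upstream DNS provider surfaces as a resolution error. This is an illustration only, not CloudFlare's actual implementation; the names `origin_addresses`, `system_resolver`, and `OriginResolutionError` are hypothetical, and the resolver is injectable so the failure case can be simulated without a real outage.

```python
import ipaddress
import socket
from typing import Callable, List


class OriginResolutionError(Exception):
    """Hypothetical stand-in for a '1001'-style DNS resolution error."""


def system_resolver(hostname: str) -> List[str]:
    """Resolve a hostname to its addresses using the operating system's resolver."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def origin_addresses(
    origin: str,
    resolve: Callable[[str], List[str]] = system_resolver,
) -> List[str]:
    """Return the addresses to connect to for a configured origin.

    An IP literal is returned as-is. A hostname depends on DNS, so any
    resolver failure is reported as an OriginResolutionError.
    """
    try:
        return [str(ipaddress.ip_address(origin))]
    except ValueError:
        pass  # Not an IP literal, so treat it as a hostname.

    try:
        addresses = resolve(origin)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        raise OriginResolutionError(f"cannot resolve origin {origin!r}") from exc
    if not addresses:
        raise OriginResolutionError(f"no addresses for origin {origin!r}")
    return addresses
```

Calling `origin_addresses("192.0.2.10")` never touches DNS, whereas `origin_addresses("origin.example.com")` succeeds only while the DNS provider for that name is answering queries. That matches the pattern seen during the incident, in which only CNAME-based sites were affected.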