John Graham-Cumming

Author Archives: John Graham-Cumming

Helping To Build Cloudflare, Part 2: The Most Difficult Two Weeks

This is part 2 of a six part series based on a talk I gave in Trento, Italy. Part 1 is here.

It’s always best to speak plainly and honestly about the situation you are in. Or as Matthew Prince likes to put it “Panic Early”. Long ago I started a company in Silicon Valley which had the most beautiful code. We could have taught a computer science course from the code base. But we had hardly any customers and we failed to “Panic Early” and not face up to the fact that our market was too small.

Ironically, the CEO of that company used to tell people “Get bad news out fast”. This is a good maxim to live by, if you have bad news then deliver it quickly and clearly. If you don’t the bad news won’t go away, and the situation will likely get worse.

Cloudbleed

Cloudflare had a very, very serious security problem back in 2017. This problem became known as Cloudbleed. We had, without knowing it, been leaking memory from inside our machines into responses returned to web browsers. And because our machines are shared across millions of web sites, that meant that HTTP requests Continue reading

Helping To Build Cloudflare, Part 3: Audacity, Diversity and Change

This is part 3 of a six part series based on a talk I gave in Trento, Italy. To start from the beginning go here.

After Cloudbleed, lots of things changed. We started to move away from memory-unsafe languages like C and C++ (there’s a lot more Go and Rust now). And every SIGABRT or crash on any machine results in an email to me and a message to the team responsible. And I don’t let the team leave those problems to fester.

Making 1.1.1.1

So Cloudbleed was a terrible time. Let’s talk about a great time. The launch of our public DNS resolver 1.1.1.1. That launch is a story of an important Cloudflare quality: audacity. Google had launched 8.8.8.8 years ago and had taken the market for a public DNS resolver by storm. Their address is easy to remember, their service is very fast.‌‌

But we thought we could do better. We thought we could be faster, and we thought we could be more memorable. Matthew asked us to get the address 1.1.1.1 and launch a secure, privacy-preserving, public DNS resolver in a couple of months. Continue reading

Helping To Build Cloudflare, Part 1: How I came to work here

This is the text I prepared for a talk at Speck&Tech in Trento, Italy. I thought it might make a good blog post. Because it is 6,000 words I've split it into six separate posts.

Here's part 1:

I’ve worked at Cloudflare for more than seven years. Cloudflare itself is more than eight years old. So, I’ve been there since it was a very small company. About twenty people in fact. All of those people (except one, me) worked from an office in San Francisco. I was the lone member of the London office.

Today there are 900 people working at Cloudflare spread across offices in San Francisco, Austin, Champaign IL, New York, London, Munich, Singapore and Beijing. In London, my “one-person office” (which was my spare bedroom) is now almost 200 people and in a month, we’ll move into new space opposite Big Ben.

The original Cloudflare London "office"

The numbers tell a story about enormous growth. But it’s growth that’s been very carefully managed. We could have grown much faster (in terms of people); we’ve certainly raised enough money to do so.

I ended up at Cloudflare because I gave a really good talk at a conference. Well, Continue reading

Cloudflare’s network boosts security and performance for IBM Cloud customers

Cloudflare’s network boosts security and performance for IBM Cloud customers

Today our partner IBM® announced the general availability of Cloud Internet Services (CIS) Enterprise. It marks a significant step forward in the partnership that we announced at the IBM THINK event in March.

CIS delivers security and performance to IBM Cloud® customers’ internet applications. It brings together Cloudflare’s 150+ points of presence with IBM Cloud’s 60 data centers, stopping attacks before they can even reach the IBM Cloud. CIS Enterprise is integrated into IBM Cloud, allowing IBM Cloud customers to set up and manage Cloudflare’s DDoS mitigation, web application firewall, smart routing and highly customizable load balancer, all from within the IBM Cloud user interface.  

Cloudflare’s network boosts security and performance for IBM Cloud customers

Our Network Map (as of 10/18/18). Click here for the latest version

We thought it timely to give a refresher on how Cloudflare’s network supports IBM Cloud customers. The network is designed to meet requirements of the most demanding enterprise customers. It is based on an architecture that differentiates it from legacy CDN, DNS and DDoS-mitigation services to ensure that internet applications stay online, even in the face of extremely high volume attacks or legitimate traffic spikes.

Cloudflare’s network of data centers, distributed across 74 countries (including 22 in China), has a network Continue reading

Free to code

This week at the Cloudflare Internet Summit I have the honour of sitting down and talking with Sophie Wilson. She designed the very first ARM processor instruction set in the mid-1980s and was part of the small team that built the foundations for the mobile world we live in: if you are reading this on a mobile device, like a phone or tablet, it almost certainly has an ARM processor in it.

But, despite the amazing success of ARM, it’s not the processor that I think of when I think of Sophie Wilson. It’s the BBC Micro, the first computer I ever owned. And it’s the computer on which Wilson and others created ARM despite it having just an 8-bit 6502 processor and 32k of RAM.

Luckily, I still own that machine and recently plugged it into a TV set and turned it on to make sure it was still working 36 years on (you can read about that one time blue smoke came out of it and my repair). I wanted to experience once more the machine Sophie Wilson helped to design. One vital component of that machine was BBC BASIC, stored in a ROM chip on Continue reading

Statement concerning events at Glowbeam Technologies

All of Cloudflare's staff were shocked at the events depicted in NCIS Season 16 Episode 1 where incorrect use of random numbers for encryption resulted in the insertion of multiple trojan horses that brought a nuclear reactor within seconds of a meltdown.

Although Cloudflare has long been a competitor of the company responsible, Glowbeam Technologies, and uses similar random number generation technology, we would like to emphasize that there are significant differences between the two companies.

Firstly, Cloudflare's Lava Lamps are not an "encryption engine" and thus they are not susceptible to tampering by the janitor.

Secondly, all Cloudflare staff undergo extensive background checks.

Thirdly, we were shocked that Glowbeam Technologies' wall of Lava Lamps was a single point of failure. In contrast, Cloudflare uses multiple sources of randomness.

Lastly, Glowbeam Technologies' CEO confirmed that the company did not use "AES" or "key block ciphers", but instead relied solely on their Lava Lamp "encryption engine". Cloudflare strongly advocates for never writing or inventing encryption algorithms and works closely with groups like the IETF to use standard, well understood encryption.

As a result of these events Cloudflare has acquired the assets of Glowbeam Technologies, please visit glowbeamtechnologies.com for more information.

John Graham-Cumming
Chief Technology Officer
Cloudflare, Inc.

Introducing Workers KV

Introducing Workers KV

In 1864 British computer pioneer Charles Babbage described the first key-value store. It was meant to be part of his Analytical Engine. Sadly, the Analytical Engine, which would have been the first programmable computer, was never built. But Babbage lays out clearly the design for his key-value store in his autobiography. He imagined a read-only store implemented as punched cards. He referred to these as Tables:

I explained that the Tables to be used must, of course, be computed and punched on cards by the machine, in which case they would undoubtedly be correct. I then added that when the machine wanted a tabular number, say the logarithm of a given number, that it would ring a bell and then stop itself. On this, the attendant would look at a certain part of the machine, and find that it wanted the logarithm of a given number, say of 2303. The attendant would then go to the drawer containing the pasteboard cards representing its table of logarithms. From amongst these he would take the required logarithmic card, and place it in the machine.

Introducing Workers KV

Punched card illustration from Babbage’s autobiography showing an integer key (2303) and value representing the decimal part of Continue reading

The QUICening

The QUICening

Six o’clock already, I was just in the middle of a dream, now I’m up, awake, looking at my Twitter stream. As I do that the Twitter app is making multiple API calls over HTTPS to Twitter’s servers somewhere on the Internet.

Those HTTPS connections are running over TCP via my home WiFi and broadband connection. All’s well inside the house, the WiFi connection is interference free thanks to my eero system, the broadband connection is stable and so there’s no packet loss, and my broadband provider’s connection to Twitter’s servers is also loss free.

The QUICening

Those are the perfect conditions for HTTPS running over TCP. Not a packet dropped, not a bit of jitter, no congestion. It’s even the perfect conditions for HTTP/2 where multiple streams of requests and responses are being sent from my phone to websites and APIs as I boot my morning. Unlike HTTP/1.1, HTTP/2 is able to use a single TCP connection for multiple, simultaneously in flight requests. That has a significant speed advantage over the old way (one request after another per TCP connection) when conditions are good.

But I have to catch an early train, got to be to work by nine, so Continue reading

Using Cloudflare Workers to identify pwned passwords

Using Cloudflare Workers to identify pwned passwords

Last week Troy Hunt launched his Pwned Password v2 service which has an API handled and cached by Cloudflare using a clever anonymity scheme.

The following simple code can check if a password exists in Troy's database without sending the password to Troy. The details of how it works are found in the blog post above.

use strict;
use warnings;

use LWP::Simple qw/$ua get/;
$ua->agent('Cloudflare Test/0.1');
use Digest::SHA1 qw/sha1_hex/;

uc(sha1_hex($ARGV[0]))=~/^(.{5})(.+)/;
print get("https://api.pwnedpasswords.com/range/$1")=~/$2/?'Pwned':'Ok', "\n";

It's just as easy to implement the same check in other languages, such as JavaScript, which made me realize that I could incorporate the check into a Cloudflare Worker. With a little help from people who know JavaScript far better than me, I wrote the following Worker:

addEventListener('fetch', event => {
  event.respondWith(fetchAndCheckPassword(event.request))
})

async function fetchAndCheckPassword(req) {
  if (req.method == "POST") {
    try {
      const post = await req.formData()
      const pwd = post.get('password')
      const enc = new TextEncoder("utf-8").encode(pwd)

      let hash = await crypto.subtle.digest("SHA-1", enc)
      let hashStr = hex(hash).toUpperCase()
  
      const prefix = hashStr.substring(0, 5)
      const suffix = hashStr.substring(5)

      const pwndpwds = await fetch('https://api.pwnedpasswords.com/range/' + prefix)
      const t =  Continue reading

Large drop in traffic from the Democratic Republic of Congo

It is not uncommon for countries around the world to interrupt Internet access for political reasons or because of social unrest. We've seen this many times in the past (e.g. Gabon, Syria, Togo).

Today, it appears that Internet access in the Democratic Republic of Congo has been greatly curtailed. The BBC reports that Internet access in the capital, Kinshasa was cut on Saturday and iAfrikan reports that the cut is because of anti-Kabila protests.

Our monitoring of traffic from the Democratic Republic of Congo shows a distinct drop off starting around midnight UTC on January 21, 2018. Traffic is down to about 1/3 of its usual level.

Screen-Shot-2018-01-22-at-10.33.58-1

We'll update this blog once we have more information about traffic levels.

An Explanation of the Meltdown/Spectre Bugs for a Non-Technical Audience

Last week the news of two significant computer bugs was announced. They've been dubbed Meltdown and Spectre. These bugs take advantage of very technical systems that modern CPUs have implemented to make computers extremely fast. Even highly technical people can find it difficult to wrap their heads around how these bugs work. But, using some analogies, it's possible to understand exactly what's going on with these bugs. If you've found yourself puzzled by exactly what's going on with these bugs, read on — this blog is for you.

When you come to a fork in the road, take it.” — Yogi Berra

Late one afternoon walking through a forest near your home and navigating with the GPS you come to a fork in the path which you’ve taken many times before. Unfortunately, for some mysterious reason your GPS is not working and being a methodical person you like to follow it very carefully.

Cooling your heels waiting for GPS to start working again is annoying because you are losing time when you could be getting home. Instead of waiting, you decide to make an intelligent guess about which path is most likely based on past experience and set Continue reading

An Explanation of the Meltdown/Spectre Bugs for a Non-Technical Audience

Last week the news of two significant computer bugs was announced. They've been dubbed Meltdown and Spectre. These bugs take advantage of very technical systems that modern CPUs have implemented to make computers extremely fast. Even highly technical people can find it difficult to wrap their heads around how these bugs work. But, using some analogies, it's possible to understand exactly what's going on with these bugs. If you've found yourself puzzled by exactly what's going on with these bugs, read on — this blog is for you.

When you come to a fork in the road, take it.” — Yogi Berra

Late one afternoon walking through a forest near your home and navigating with the GPS you come to a fork in the path which you’ve taken many times before. Unfortunately, for some mysterious reason your GPS is not working and being a methodical person you like to follow it very carefully.

Cooling your heels waiting for GPS to start working again is annoying because you are losing time when you could be getting home. Instead of waiting, you decide to make an intelligent guess about which path is most likely based on past experience and set Continue reading

Technical reading from the Cloudflare blog for the holidays

During 2017 Cloudflare published 172 blog posts (including this one). If you need a distraction from the holiday festivities at this time of year here are some highlights from the year.

CC BY 2.0 image by perzon seo

The WireX Botnet: How Industry Collaboration Disrupted a DDoS Attack

We worked closely with companies across the industry to track and take down the Android WireX Botnet. This blog post goes into detail about how that botnet operated, how it was distributed and how it was taken down.

Randomness 101: LavaRand in Production

The wall of Lava Lamps in the San Francisco office is used to feed entropy into random number generators across our network. This blog post explains how.

ARM Takes Wing: Qualcomm vs. Intel CPU comparison

Our network of data centers around the world all contain Intel-based servers, but we're interested in ARM-based servers because of the potential cost/power savings. This blog post took a look at the relative performance of Intel processors and Qualcomm's latest server offering.

How to Monkey Patch the Linux Kernel

One engineer wanted to combine the Dvorak and QWERTY keyboard layouts and did so by patching the Linux kernel using SystemTap. This blog explains Continue reading

Technical reading from the Cloudflare blog for the holidays

During 2017 Cloudflare published 172 blog posts (including this one). If you need a distraction from the holiday festivities at this time of year here are some highlights from the year.


CC BY 2.0 image by perzon seo

The WireX Botnet: How Industry Collaboration Disrupted a DDoS Attack

We worked closely with companies across the industry to track and take down the Android WireX Botnet. This blog post goes into detail about how that botnet operated, how it was distributed and how it was taken down.

Randomness 101: LavaRand in Production

The wall of Lava Lamps in the San Francisco office is used to feed entropy into random number generators across our network. This blog post explains how.

ARM Takes Wing: Qualcomm vs. Intel CPU comparison

Our network of data centers around the world all contain Intel-based servers, but we're interested in ARM-based servers because of the potential cost/power savings. This blog post took a look at the relative performance of Intel processors and Qualcomm's latest server offering.

How to Monkey Patch the Linux Kernel

One engineer wanted to combine the Dvorak and QWERTY keyboard layouts and did so by patching the Linux kernel using SystemTap. This blog explains Continue reading

2018 and the Internet: our predictions

At the end of 2016, I wrote a blog post with seven predictions for 2017. Let’s start by reviewing how I did.

Didn’t he do well Public Domain image by Michael Sharpe

I’ll score myself with two points for being correct, one point for mostly right and zero for wrong. That’ll give me a maximum possible score of fourteen. Here goes...

2017-1: 1Tbps DDoS attacks will become the baseline for ‘massive attacks’

This turned out to be true but mostly because massive attacks went away as Layer 3 and Layer 4 DDoS mitigation services got good at filtering out high bandwidth and high packet rates. Over the year we saw many DDoS attacks in the 100s of Gbps (up to 0.5Tbps) and then in September announced Unmetered Mitigation. Almost immediately we saw attackers stop bothering to attack Cloudflare-protected sites with large DDoS.

So, I’ll be generous and give myself one point.

2017-2: The Internet will get faster yet again as protocols like QUIC become more prevalent

Well, yes and no. QUIC has become more prevalent as Google has widely deployed it in the Chrome browser and it accounts for about 7% of Internet traffic. At the same time the protocol is working its Continue reading

2018 and the Internet: our predictions

At the end of 2016, I wrote a blog post with seven predictions for 2017. Let’s start by reviewing how I did.

Didn’t he do well
Public Domain image by Michael Sharpe

I’ll score myself with two points for being correct, one point for mostly right and zero for wrong. That’ll give me a maximum possible score of fourteen. Here goes...

2017-1: 1Tbps DDoS attacks will become the baseline for ‘massive attacks’

This turned out to be true but mostly because massive attacks went away as Layer 3 and Layer 4 DDoS mitigation services got good at filtering out high bandwidth and high packet rates. Over the year we saw many DDoS attacks in the 100s of Gbps (up to 0.5Tbps) and then in September announced Unmetered Mitigation. Almost immediately we saw attackers stop bothering to attack Cloudflare-protected sites with large DDoS.

So, I’ll be generous and give myself one point.

2017-2: The Internet will get faster yet again as protocols like QUIC become more prevalent

Well, yes and no. QUIC has become more prevalent as Google has widely deployed it in the Chrome browser and it accounts for about 7% of Internet traffic. At the same time the protocol is working its Continue reading

The end of the road for Server: cloudflare-nginx

The end of the road for Server: cloudflare-nginx

Six years ago when I joined Cloudflare the company had a capital F, about 20 employees, and a software stack that was mostly NGINX, PHP and PowerDNS (there was even a little Apache). Today, things are quite different.

The end of the road for Server: cloudflare-nginx CC BY-SA 2.0 image by Randy Merrill

The F got lowercased, there are now more than 500 people and the software stack has changed radically. PowerDNS is gone and has been replaced with our own DNS server, RRDNS, written in Go. The PHP code that used to handle the business logic of dealing with our customers’ HTTP requests is now Lua code, Apache is long gone and new technologies like Railgun, Warp, Argo and Tiered Cache have been added to our ‘edge’ stack.

And yet our servers still identify themselves in HTTP responses with

Server: cloudflare-nginx

Of course, NGINX is still a part of our stack, but the code that handles HTTP requests goes well beyond the capabilities of NGINX alone. It’s also not hard to imagine a time where the role of NGINX diminishes further. We currently run four instances of NGINX on each edge machine (one for SSL, one for non-SSL, one for caching and one Continue reading

Introducing the Cloudflare Warp Ingress Controller for Kubernetes

Introducing the Cloudflare Warp Ingress Controller for Kubernetes

It’s ironic that the one thing most programmers would really rather not have to spend time dealing with is... a computer. When you write code it’s written in your head, transferred to a screen with your fingers and then it has to be run. On. A. Computer. Ugh.

Of course, code has to be run and typed on a computer so programmers spend hours configuring and optimizing shells, window managers, editors, build systems, IDEs, compilation times and more so they can minimize the friction all those things introduce. Optimizing your editor’s macros, fonts or colors is a battle to find the most efficient path to go from idea to running code.

Introducing the Cloudflare Warp Ingress Controller for Kubernetes CC BY 2.0 image by Yutaka Tsutano

Once the developer is master of their own universe they can write code at the speed of their mind. But when it comes to putting their code into production (which necessarily requires running their programs on machines that they don’t control) things inevitably go wrong. Production machines are never the same as developer machines.

If you’re not a developer, here’s an analogy. Imagine carefully writing an essay on a subject dear to your heart and then publishing it only to be Continue reading

The Super Secret Cloudflare Master Plan, or why we acquired Neumob

The Super Secret Cloudflare Master Plan, or why we acquired Neumob

We announced today that Cloudflare has acquired Neumob. Neumob’s team built exceptional technology to speed up mobile apps, reduce errors on challenging mobile networks, and increase conversions. Cloudflare will integrate the Neumob technology with our global network to give Neumob truly global reach.

It’s tempting to think of the Neumob acquisition as a point product added to the Cloudflare portfolio. But it actually represents a key part of a long term “Super Secret Cloudflare Master Plan”.

The Super Secret Cloudflare Master Plan, or why we acquired Neumob CC BY 2.0 image by Neil Rickards

Over the last few years Cloudflare has been building a large network of data centers across the world to help fulfill our mission of helping to build a better Internet. These data centers all run an identical software stack that implements Cloudflare’s cache, DNS, DDoS, WAF, load balancing, rate limiting, etc.

We’re now at 118 data centers in 58 countries and are continuing to expand with a goal of being as close to end users as possible worldwide.

The data centers are tied together by secure connections which are optimized using our Argo smart routing capability. Our Quicksilver technology enables us to update and modify the settings and software running across this vast network in seconds.

Continue reading

Code Everywhere: Why We Built Cloudflare Workers

It all comes down to the speed of light. It always does. The speed of light limits the latency possible between someone using the Internet and the application they are accessing. It doesn’t matter if they are walking down the street hailing a car using a ride-sharing app, sitting in an office accessing a SaaS application on the web, or if their wearable device is reporting health information over WiFi. The speed of light is everywhere.

When you can’t fight the speed of light you only have one possible solution: move closer to where the end users are. In simplistic terms, that’s what Cloudflare has done by building its network of 117 data centers around the world. We’ve cut the latency between users and servers by moving closer.

But to date all we’ve moved closer are things like SSL handshakes, WAF processing of requests and caching of content. All those things help make Internet applications faster and safer, but there’s a huge missing component... code.

The code that makes Internet applications work is still sequestered in servers and cloud services around the world. And there are only a limited number of such locations even for large cloud Continue reading