Author Archives: John Graham-Cumming

Code Everywhere: Why We Built Cloudflare Workers

It all comes down to the speed of light. It always does. The speed of light limits the latency possible between someone using the Internet and the application they are accessing. It doesn’t matter if they are walking down the street hailing a car using a ride-sharing app, sitting in an office accessing a SaaS application on the web, or if their wearable device is reporting health information over WiFi. The speed of light is everywhere.

When you can’t fight the speed of light you only have one possible solution: move closer to where the end users are. In simplistic terms, that’s what Cloudflare has done by building its network of 117 data centers around the world. We’ve cut the latency between users and servers by moving closer.

But to date all we’ve moved closer are things like SSL handshakes, WAF processing of requests and caching of content. All those things help make Internet applications faster and safer, but there’s a huge missing component... code.

The code that makes Internet applications work is still sequestered in servers and cloud services around the world. And there are only a limited number of such locations even for large cloud Continue reading

No Scrubs: The Architecture That Made Unmetered Mitigation Possible

When building a DDoS mitigation service it’s incredibly tempting to think that the solution is scrubbing centers or scrubbing servers. I, too, thought that was a good idea in the beginning, but experience has shown that there are serious pitfalls to this approach.

A scrubbing server is a dedicated machine that receives all network traffic destined for an IP address and attempts to filter good traffic from bad. Ideally, the scrubbing server will only forward non-DDoS packets to the Internet application being attacked. A scrubbing center is a dedicated location filled with scrubbing servers.

Three Problems With Scrubbers

The three most pressing problems with scrubbing are bandwidth, cost, and knowledge.

The bandwidth problem is easy to see. As DDoS attacks have scaled to >1Tbps, having that much network capacity available is problematic. Provisioning and maintaining multiple Tbps of bandwidth for DDoS mitigation is expensive and complicated. And it needs to be located in the right place on the Internet to receive and absorb an attack. If it’s not, attack traffic will need to be received at one location, scrubbed, and the clean traffic then forwarded to the real server: with only a limited number of locations, that can introduce enormous delays.

Continue reading

Three little tools: mmsum, mmwatch, mmhistogram

In a recent blog post, my colleague Marek talked about some SSDP-based DDoS activity we'd been seeing recently. In that blog post he used a tool called mmhistogram to output an ASCII histogram.

That tool is part of a small suite of command-line tools that can be handy when messing with data. Since a reader asked for them to be open sourced... here they are.

mmhistogram

Suppose you have the following CSV of the ages of major Star Wars characters at the time of Episode IV:

Anakin Skywalker (Darth Vader),42
Boba Fett,32
C-3PO,32
Chewbacca,200
Count Dooku,102
Darth Maul,54
Han Solo,29
Jabba the Hutt,600
Jango Fett,66
Jar Jar Binks,52
Lando Calrissian,31
Leia Organa (Princess Leia),19
Luke Skywalker,19
Mace Windu,72
Obi-Wan Kenobi,57
Palpatine,82
Qui-Gon Jinn,92
R2-D2,32
Shmi Skywalker,72
Wedge Antilles,21
Yoda,896

You can get an ASCII histogram of the ages as follows using the mmhistogram tool.

$ cut -d, -f2 epiv | mmhistogram -t "Age"
Age min:19.00 avg:123.90 med=54.00 max:896.00 dev:211.28 count:21
Age:
 value |-------------------------------------------------- count
     0 |                                                   0
     1 |                                                   0
     2 |                                                   0
     4 |                                                   0
     8 |                                                   0
    16 |************************************************** 8
    32 |                         ************************* 4
    64 |             ************************************* 6
   128 |                                            ****** 1
   256  Continue reading
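
The tools themselves are open source; purely to illustrate the idea behind the output above, here is a minimal Go sketch of a log2-bucketed ASCII histogram over the same ages. The exact bucketing and layout of the real mmhistogram may differ.

package main

import (
	"fmt"
	"math"
	"strings"
)

func main() {
	ages := []float64{42, 32, 32, 200, 102, 54, 29, 600, 66, 52, 31,
		19, 19, 72, 57, 82, 92, 32, 72, 21, 896}

	// Put each value v into bucket floor(log2(v)), labeled 2^bucket.
	buckets := map[int]int{}
	maxCount := 0
	for _, v := range ages {
		b := int(math.Floor(math.Log2(v)))
		buckets[b]++
		if buckets[b] > maxCount {
			maxCount = buckets[b]
		}
	}

	// Print buckets 2^0 .. 2^9, scaling the longest bar to 50 characters.
	for b := 0; b <= 9; b++ {
		n := buckets[b]
		bar := strings.Repeat("*", n*50/maxCount)
		fmt.Printf("%6d |%-50s %d\n", 1<<uint(b), bar, n)
	}
}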

Incident report on memory leak caused by Cloudflare parser bug

Last Friday, Tavis Ormandy from Google’s Project Zero contacted Cloudflare to report a security problem with our edge servers. He was seeing corrupted web pages being returned by some HTTP requests run through Cloudflare.

It turned out that in some unusual circumstances, which I’ll detail below, our edge servers were running past the end of a buffer and returning memory that contained private information such as HTTP cookies, authentication tokens, HTTP POST bodies, and other sensitive data. And some of that data had been cached by search engines.

For the avoidance of doubt, Cloudflare customer SSL private keys were not leaked. Cloudflare has always terminated SSL connections through an isolated instance of NGINX that was not affected by this bug.

We quickly identified the problem and turned off three minor Cloudflare features (email obfuscation, Server-side Excludes and Automatic HTTPS Rewrites) that were all using the same HTML parser chain that was causing the leakage. At that point it was no longer possible for memory to be returned in an HTTP response.

Because of the seriousness of such a bug, a cross-functional team from software engineering, infosec and operations formed in San Francisco and London to fully understand Continue reading

How and why the leap second affected Cloudflare DNS

At midnight UTC on New Year’s Day, deep inside Cloudflare’s custom RRDNS software, a number went negative when it should always have been, at worst, zero. A little later this negative value caused RRDNS to panic. This panic was caught using the recover feature of the Go language. The net effect was that some DNS resolutions to some Cloudflare managed web properties failed.
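
The full post walks through the real RRDNS code; as an illustration of the general failure mode only, here is a minimal Go sketch in which rrdnsStep is a hypothetical stand-in for the affected code path:

package main

import (
	"fmt"
	"time"
)

// rrdnsStep is a hypothetical stand-in for code that assumes a
// duration can never be negative and panics when that assumption
// breaks, mirroring the shape of the incident described above.
func rrdnsStep(d time.Duration) {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
		}
	}()
	if d < 0 {
		panic(fmt.Sprintf("negative duration: %v", d))
	}
	fmt.Println("duration ok:", d)
}

func main() {
	before := time.Now()
	// Simulate the wall clock being stepped backwards, as can happen
	// around a leap second on systems without a monotonic clock source.
	after := before.Add(-500 * time.Millisecond)
	rrdnsStep(after.Sub(before)) // negative duration: panics, then recovers
}

Go 1.9 later added a monotonic clock reading to time.Time largely to eliminate this class of bug.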

The problem only affected customers who use CNAME DNS records with Cloudflare, and only affected a small number of machines across Cloudflare's 102 PoPs. At peak approximately 0.2% of DNS queries to Cloudflare were affected and less than 1% of all HTTP requests to Cloudflare encountered an error.

This problem was quickly identified. The most affected machines were patched in 90 minutes and the fix was rolled out worldwide by 0645 UTC. We are sorry that our customers were affected, but we thought it was worth writing up the root cause for others to understand.

A little bit about Cloudflare DNS

Cloudflare customers use our DNS service to serve the authoritative answers for DNS queries for their domains. They need to tell us the IP address of their origin web servers so we can contact the Continue reading

2017 and the Internet: our predictions

An abbreviated version of this post originally appeared on TechCrunch

Looking back over 2016, we saw the good and bad that comes with widespread use and abuse of the Internet.

In both Gabon and Gambia, Internet connectivity was disrupted during elections. The contested election in Gambia started with an Internet blackout that lasted a short time. In Gabon, the Internet shutdown lasted for days. Even as we write this, countries like DR Congo are discussing blocking specific Internet services, clearly forgetting the lessons learned in these other countries.

CC BY 2.0 image by Aniket Thakur

DDoS attacks continued throughout the year, hitting websites big and small. Back in March, we wrote about 400 Gbps attacks that were happening over the weekend, and then in December, it looked like attackers were treating attacks as a job to be performed from 9 to 5.

In addition to real DDoS attacks, there were also empty threats from a group calling itself the Armada Collective, demanding Bitcoin for sites and APIs to stay online. Another group popped up to copycat the same modus operandi.

The Internet of Things became what many had warned it would become: an army of devices used for attacks. A botnet Continue reading

The Daily DDoS: Ten Days of Massive Attacks

Back in March my colleague Marek wrote about a Winter of Whopping Weekend DDoS Attacks where we were seeing 400Gbps attacks occurring mostly at the weekends. We speculated that attackers were busy with something else during the week.

This winter we've seen a new pattern, and attackers aren't taking the week off, but they do seem to be working regular hours.

CC BY 2.0 image by Carol VanHook

On November 23, the day before US Thanksgiving, our systems detected and mitigated an attack that peaked at 172Mpps and 400Gbps. The attack started at 1830 UTC and lasted non-stop for almost exactly 8.5 hours, stopping at 0300 UTC. It felt as if an attacker 'worked' a day and then went home.

The very next day the same thing happened again (although the attack started 30 minutes earlier at 1800 UTC).

On the third day the attacker started promptly at 1800 UTC but went home a little early at around 0130 UTC. Even so, they managed to push the attack to peaks of over 200Mpps and 480Gbps.

And the attacker just kept this up day after day. Right through Thanksgiving, Black Friday, Cyber Monday and into this week. Night after night attacks were peaking Continue reading

How the Dyn outage affected Cloudflare

Last Friday the popular DNS service Dyn suffered three waves of DDoS attacks that affected users first on the East Coast of the US, and later users worldwide. Popular websites, some of which are also Cloudflare customers, were inaccessible. Although Cloudflare was not attacked, joint Dyn/Cloudflare customers were affected.

Almost as soon as Dyn came under attack we noticed a sudden jump in DNS errors on our edge machines and alerted our SRE and support teams that Dyn was in trouble. Support was ready to help joint customers and we began looking in detail at the effect the Dyn outage was having on our systems.

An immediate concern internally was that since our DNS servers were unable to reach Dyn, they would be consuming resources waiting on timeouts and retrying. The first question I asked the DNS team was: “Are we seeing increased DNS response latency?”, rapidly followed by “If this gets worse, are we likely to?”. Happily, the response to both those questions (after the team analyzed the situation) was no.

CC BY-SA 2.0 image by tracyshaun

However, that didn’t mean we had nothing to do. Operating a large scale system like Cloudflare that Continue reading

Fixing the mixed content problem with Automatic HTTPS Rewrites

CloudFlare aims to put an end to the unencrypted Internet. But the web has a chicken and egg problem moving to HTTPS.

Long ago it was difficult, expensive, and slow to set up an HTTPS capable web site. Then along came services like CloudFlare’s Universal SSL that made switching from http:// to https:// as easy as clicking a button. With one click a site was served over HTTPS with a freshly minted, free SSL certificate.

Boom.

Suddenly, the website is available over HTTPS, and, even better, the website gets faster because it can take advantage of the latest web protocol HTTP/2.

Unfortunately, the story doesn’t end there. Many otherwise secure sites suffer from the problem of mixed content. And mixed content means the green padlock icon will not be displayed for an https:// site because, in fact, it’s not truly secure.

Here’s the problem: if an https:// website includes any content from a site (even its own) served over http://, the green padlock can’t be displayed. That’s because resources like images, JavaScript, audio, and video included over http:// open up a security hole into the secure web site. A backdoor to trouble.
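
For example (the URLs here are hypothetical), a page served from https://example.com that includes

<img src="http://example.com/logo.png">

is mixed content: the image travels unencrypted even though the page is HTTPS. Rewriting that reference to https://example.com/logo.png removes the insecure load, and automating exactly that rewrite for hosts known to support HTTPS is the idea behind Automatic HTTPS Rewrites.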

Web browsers have known this was a problem Continue reading

Welcoming Sir Tim Berners-Lee to the CloudFlare Internet Summit

This Thursday, September 15, we are holding our second Internet Summit at our offices in San Francisco. We have a fascinating lineup of speakers covering policy, technology, privacy, and business.

We are very pleased to announce that Sir Tim Berners-Lee will be our special guest in a fireside chat session.

Sir Tim Berners-Lee

Twenty-five years ago, Sir Tim laid the foundations of our modern web-connected society; first, in 1989, with his proposal outlining his idea for the Web and then by developing HTML, the first web pages, browser, and server.

He has continued this work through the World Wide Web Consortium (W3C) and the World Wide Web Foundation, and we are delighted that he will be on stage with us to talk about the web's history, expanding the web to truly reach everyone on Earth, and privacy and freedom of expression online.

If you would like to attend the Summit and hear Sir Tim and the other great speakers, sign up here.

After 4 days, Gabon is getting back on the Internet

On September 1, we reported that we had seen a complete shutdown of Internet access to CloudFlare sites from Gabon.

This morning, Internet connectivity in Gabon appears to have been at least partially restored starting at around 0500 UTC. Some news reports indicate that Internet access has been restored in the capital but that access to social media sites is still restricted.

We will continue to monitor the situation to see if traffic from Gabon returns to its normal levels, and we will update this blog post.

Unrest in Gabon leads to Internet shutdown

A second day of rioting in Gabon after the recent election is accompanied by an Internet blackout. Residents of the capital, Libreville, reported that Internet access had been cut and we can confirm that we saw a sudden shutdown of Internet access from Gabon to sites that use CloudFlare.

These three graphs show the major networks inside Gabon shutting off suddenly with a minuscule amount of traffic making it through.

The charts show that Internet access shut down at different times for different networks. At the time of writing, the Internet appears to be almost completely cut off in Gabon.

Internet outage in Gabon following the election

A second day of rioting in Gabon following the recent election has been accompanied by an Internet outage. Residents of the capital, Libreville, reported that Internet access had been cut, and CloudFlare can confirm that we saw a sudden shutdown of Internet access from Gabon to our sites.

These three graphs show the major networks inside Gabon being cut off suddenly.

The graphs show that Internet access shut down at different times for different networks. At the time of writing, the Internet appears to be almost completely cut off in Gabon.

Evenly Distributed Future

Traveling back and forth between the UK and US, I often find myself answering the question “What does CloudFlare do?”. That question gets posed by USCIS on arrival and I’ve honed a short and accurate answer: “CloudFlare protects web sites from hackers, makes web sites faster and ensures they work on your computer, phone or tablet.”

CC BY 2.0 image by d26b73

If anyone, border agents or others, wants more detail I usually say: “If you run a web site or an API for an app, and you are Amazon.com, Google, Yahoo or one of a handful of major Internet sites, you have the expertise to stay on top of the latest technologies and attacks; you have the staff to accelerate your web site and keep it fully patched. Anyone else, and that’s almost every web site on the Internet, simply will not have the money, people, or knowledge to ‘be a Google’. That’s where CloudFlare comes in: we make sure to stay on top of the latest trends in the Internet so that every web site can ‘be Google’.”

The author William Gibson has said many times: “The future is already here Continue reading

The Cuban CDN

On a recent trip to Cuba I brought with me a smartphone and hoped to get Internet access either via WiFi or 3G. I managed that (at a price) but also saw for myself how Cubans get access to an alternate Internet delivered by sneakernet.

Cuba is currently poorly served by the Internet, with just 175 public WiFi hotspots in the country, many of them in public parks. In addition, many large hotels also have public WiFi. Since this is the primary way Cubans get Internet access, it’s not uncommon to see situations like this:

Getting on the WiFi means buying a card that gives you access for 2 CUC ($2) per hour. These cards have a login number and a password (hidden behind a scratch-off panel). The hour can be used in chunks by logging off and on.

There’s also mobile phone access to the Internet (I saw 3G, EDGE and GPRS as I traveled across Cuba), but at 1 CUC ($1) per MB it’s very expensive. The phone company does provide email access (to their own email service) and so some Cubans I met used their phones to get Continue reading

HTTP/2 Server Push with multiple assets per Link header

In April we announced that we had added experimental support for HTTP/2 Server Push to all CloudFlare web sites. We did this so that our customers could iterate on this new functionality.

CC BY 2.0 image by https://www.flickr.com/photos/mryipyop/

Our implementation of Server Push made use of the HTTP Link header, as detailed in the W3C Preload Working Draft.

We also showed how to make Server Push work from within PHP code and many people started testing and using this feature.

However, there was a serious restriction in our initial version: it was not possible to specify more than one asset per Link header for Server Push, and many CMS and web development platforms would not allow multiple Link headers.

We have now addressed that problem and it is possible to request that multiple assets be pushed in a single Link header. This change is live and was used to push assets in this blog post to your browser if your browser supports HTTP/2.
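
For example (the asset paths here are hypothetical), the standard comma-separated form of the Link header can now request several pushes at once:

Link: </css/styles.css>; rel=preload; as=style, </js/app.js>; rel=preload; as=script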

When CloudFlare reads a Link header sent by an origin web server it will remove assets that it pushes from the Link header passed on to the web browser. That made it a little difficult Continue reading

Optimizing TLS over TCP to reduce latency

The layered nature of the Internet (HTTP on top of some reliable transport (e.g. TCP), TCP on top of some datagram layer (e.g. IP), IP on top of some link (e.g. Ethernet)) has been very important in its development. Different link layers have come and gone over time (any readers still using 802.5?) and this flexibility also means that a connection from your web browser might traverse your home network over WiFi, then down a DSL line, across fiber and finally be delivered over Ethernet to the web server. Each layer is blissfully unaware of the implementation of the layer below it.

But there are some disadvantages to this model. In the case of TLS (the most common standard used for sending encrypted data across the Internet, and the protocol your browser uses when visiting an https:// web site) the layering of TLS on top of TCP can cause delays to the delivery of a web page.

That’s because TLS divides the data being transmitted into records of a fixed (maximum) size and then hands those records to TCP for transmission. TCP promptly divides those records up into segments which are then transmitted. Ultimately, Continue reading
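
The excerpt cuts off here, but the core arithmetic is easy to sketch. Assuming a full 16KB TLS record and a typical TCP MSS of 1460 bytes (both are assumptions; real values vary by connection), a minimal Go calculation shows how many segments must all arrive before one record can be decrypted:

package main

import "fmt"

func main() {
	const recordSize = 16 * 1024 // maximum TLS record payload (16KB)
	const mss = 1460             // typical TCP maximum segment size

	// A record can only be decrypted once every byte of it has arrived.
	segments := (recordSize + mss - 1) / mss
	fmt.Printf("a full %d-byte TLS record spans %d TCP segments\n",
		recordSize, segments)
	// Losing any one of those segments stalls decryption of the entire
	// record until TCP retransmits the missing piece.
}

That interaction between record size and segment loss is the motivation for tuning TLS record sizes, which the full post explores.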

The Sleepy User Agent

From time to time a customer writes in and asks about certain requests that have been blocked by the CloudFlare WAF. Recently, a customer couldn’t understand why it appeared that some simple GET requests for their homepage were listed as blocked in WAF analytics.

A sample request looked like this:

GET / HTTP/1.1
Host: www.example.com
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (compatible; MSIE 11.0; Windows NT 6.1; Win64; x64; Trident/5.0)'+(select*from(select(sleep(20)))a)+' 
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,fr;q=0.6

As I said, a simple request for the homepage of the web site, which at first glance doesn’t look suspicious at all. Unless you take a look at the User-Agent header (its value is the string that identifies the browser being used):

Mozilla/5.0 (compatible; MSIE 11.0; Windows NT 6.1; Win64; x64; Trident/5.0)'+(select*from(select(sleep(20)))a)+

The start looks reasonable (it’s apparently Microsoft Internet Explorer 11) but the agent string ends with '+(select*from(select(sleep(20)))a)+. The attacker is attempting a SQL injection inside the User-Agent value.

It’s common to see SQL injection in URIs and form parameters, but here the attacker has hidden the SQL query select * from (select(sleep(20))) inside the User-Agent Continue reading
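
Cloudflare's actual WAF rules are not public and are far more sophisticated than a single pattern; purely as a toy illustration of spotting this kind of time-based injection in a User-Agent string, here is a Go sketch (the pattern and sample agents are illustrative assumptions only):

package main

import (
	"fmt"
	"regexp"
)

// A naive signature for the time-based SQL injection shown above:
// a SELECT from a subquery that calls sleep() with a numeric delay.
var sleepInjection = regexp.MustCompile(`(?i)select\s*\*?\s*from\s*\(\s*select\s*\(?\s*sleep\s*\(\s*\d+\s*\)`)

func main() {
	agents := []string{
		"Mozilla/5.0 (compatible; MSIE 11.0; Windows NT 6.1; Win64; x64; Trident/5.0)",
		"Mozilla/5.0 (compatible; MSIE 11.0)'+(select*from(select(sleep(20)))a)+'",
	}
	for _, ua := range agents {
		fmt.Printf("suspicious=%-5v %s\n", sleepInjection.MatchString(ua), ua)
	}
}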

Using HTTP/2 Server Push with PHP

Two weeks ago CloudFlare announced that it was supporting HTTP/2 Server Push for all our customers. By simply adding a Link header specifying preload to an HTTP response, CloudFlare will automatically push items to web browsers that support Server Push.

To illustrate how easy this is, I created a small PHP page that uses the PHP header function to insert the appropriate Link headers to push images to the web browser via CloudFlare. The web page looks like this when loaded:

There are two images loaded from the same server, both of which are pushed if the web browser supports Server Push. This is achieved by inserting two Link headers in the HTTP response. The response looks like:

HTTP/1.1 200 OK
Server: nginx/1.9.15
Date: Fri, 13 May 2016 10:52:13 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: keep-alive
Link: </images/drucken.jpg>; rel=preload; as=image
Link: </images/empire.jpg>; rel=preload; as=image

At the bottom are the two Link headers corresponding to the two images on the page, with the rel=preload directive as specified in the W3C preload draft.

The complete code can be found in this gist but the core of the code looks like this:

    <?php
    function pushImage($uri) {
        header("Link: <{$uri}>; rel=preload;  Continue reading

Open sourcing our NGINX HTTP/2 + SPDY code

In December, we released HTTP/2 support for all customers and on April 28 we released HTTP/2 Server Push support as well.

The release of HTTP/2 by CloudFlare had a huge impact on the number of sites supporting and using the protocol. Today, 50% of sites that use HTTP/2 are served via CloudFlare.

CC BY 2.0 image by JD Hancock

When we released HTTP/2 support we decided not to deprecate SPDY immediately because it was still in widespread use. We also promised to open source our modifications to NGINX, since it was not possible to support both SPDY and HTTP/2 together with the standard release of NGINX.

We've extracted our changes and they are available as a patch here. This patch should build cleanly against NGINX 1.9.7.

The patch means that NGINX can be built with both --with-http_v2_module and --with-http_spdy_module, and it will accept both the spdy and http2 keywords in the listen directive.

To configure both HTTP/2 and SPDY in NGINX you'll need to run:

./configure --with-http_spdy_module --with-http_v2_module --with-http_ssl_module

Note that you need SSL support for both SPDY and HTTP/2.

Then it will be possible to configure an NGINX server to support both HTTP/2 and SPDY on Continue reading
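
With the patch applied, a server that speaks both protocols can then be configured along these lines (the server name and certificate paths are placeholders):

server {
    listen 443 ssl spdy http2;
    server_name example.com;
    ssl_certificate     /etc/nginx/ssl/example.com.crt;
    ssl_certificate_key /etc/nginx/ssl/example.com.key;
}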

Inside ImageTragick: The Real Payloads Being Used to Hack Websites

Last week multiple vulnerabilities were made public in the popular image manipulation software ImageMagick. These were quickly named ImageTragick. Although a vulnerability in image manipulation software might not seem like a problem for web site owners, it is in fact a genuine security concern.

CloudFlare quickly rolled out a WAF rule to protect our customers from this vulnerability. It was automatically deployed for all customers with the WAF enabled. We know that it takes time for customers to upgrade their web server software and so the WAF protects them in the interim.

Many websites allow users to upload images and the websites themselves often manipulate these images using software like ImageMagick. For example, if you upload a picture of yourself to use as an avatar, it will very likely be resized by the website. ImageMagick is very popular and there are plugins that make it easy to use with PHP, Ruby, Node.js and other languages so it is common for websites to use it for image resizing or cropping.

Unfortunately, researchers discovered that it was possible to execute arbitrary code (CVE-2016-3714) by hiding it inside image files that a user uploads. That means an attacker can make Continue reading
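
Alongside virtual patching at the WAF, the ImageTragick disclosure recommended disabling the abusable coders in ImageMagick's policy.xml until a fixed version could be installed (the file's location varies by install). A policy along those lines looked like this:

<policymap>
  <!-- Disable the coders abused by ImageTragick (CVE-2016-3714) -->
  <policy domain="coder" rights="none" pattern="EPHEMERAL" />
  <policy domain="coder" rights="none" pattern="URL" />
  <policy domain="coder" rights="none" pattern="HTTPS" />
  <policy domain="coder" rights="none" pattern="MVG" />
  <policy domain="coder" rights="none" pattern="MSL" />
</policymap>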