This is part 5 of a six-part series based on a talk I gave in Trento, Italy. To start from the beginning, go here.
So, let me talk a bit about people. Software is made by people: sometimes by individuals, but more likely by teams. I’ve talked earlier about some aspects of our architecture and our frequent rewrites, but it’s people that make all that work.
And, honestly, people can be an utter joy and a total pain. Finding, keeping, and nurturing people and teams is the single most important thing you can do in a company. No doubt.
Finding people is really hard. Firstly, the technology industry is booming, and so engineers have a lot of choices. Countries create special visas just for them. Politicians line up to create mini-Silicon Valleys in their countries. Life is good!
But the really hard thing is interviewing. How do you find good people from an interview? I don’t know the answer to that. We put people through on average 8 interviews and a pair programming exercise. We look at open source contributions. Sometimes we look at people’s degrees.
We tend to look for potential. An old boss used to say, “Don’t …
When I come to work at Cloudflare, I understand and believe in the main reason we exist: Helping to Build a Better Internet.
The reason we feel we can help build a better internet is simply that we believe in values that instill freedom, privacy, and empowerment in the tool that helps individuals broaden their intellectual and cultural perspectives every day.
Knowing all of this, our own great company needs to be able to make itself a better company every day. And that starts with having the conversations that are always uncomfortable. Let me be clear: being uncomfortable is a good thing, because it makes one grow rather than stagnate. Having said all that, here we go...
The Afrocultural community at Cloudflare should take pride in being diverse and inclusive for all, just as we all work together to help build a better internet for all.
One of the many ways we can build on this effort is to do more than just belong in a workplace, and then to build on that belonging until, over time, it feels normal. When I say belong, I mean more than the "Impostor …
This is part 4 of a six-part series based on a talk I gave in Trento, Italy. To start from the beginning, go here.
We don’t believe that any of our software, not a single line of code, provides us with a long-term advantage. We could, today, open source every single line of code at Cloudflare and we don’t believe we’d be hurt by it.
Why don’t we? We actually do open source a lot of code, but we try to be thoughtful about it. Firstly, a lot of our code is so Cloudflare-specific, full of logic about how our service works, that it’s not generic enough for someone else to pick up and use for their service. So, for example, open sourcing the code that runs our web front end would be largely useless.
But other bits of software are generic. There’s currently a debate going on internally about a piece of software called Quicksilver. I mentioned before that Cloudflare used a distributed key-value store to send configuration to machines across the world. We used to use an open source project called Kyoto Tycoon. It was pretty cool.
But …
This is part 2 of a six-part series based on a talk I gave in Trento, Italy. Part 1 is here.
It’s always best to speak plainly and honestly about the situation you are in. Or, as Matthew Prince likes to put it, “Panic Early”. Long ago I started a company in Silicon Valley which had the most beautiful code. We could have taught a computer science course from the code base. But we had hardly any customers, and we failed to “Panic Early” and face up to the fact that our market was too small.
Ironically, the CEO of that company used to tell people “Get bad news out fast”. This is a good maxim to live by: if you have bad news, deliver it quickly and clearly. If you don’t, the bad news won’t go away, and the situation will likely get worse.
Cloudflare had a very, very serious security problem back in 2017. This problem became known as Cloudbleed. We had, without knowing it, been leaking memory from inside our machines into responses returned to web browsers. And because our machines are shared across millions of web sites, that meant that HTTP requests …
This is part 3 of a six-part series based on a talk I gave in Trento, Italy. To start from the beginning, go here.
After Cloudbleed, lots of things changed. We started to move away from memory-unsafe languages like C and C++ (there’s a lot more Go and Rust now). And every SIGABRT or crash on any machine results in an email to me and a message to the team responsible. And I don’t let the team leave those problems to fester.
So Cloudbleed was a terrible time. Let’s talk about a great time. The launch of our public DNS resolver is a story of an important Cloudflare quality: audacity. Google had launched its resolver years ago and had taken the market for a public DNS resolver by storm. Their address is easy to remember, and their service is very fast.
But we thought we could do better. We thought we could be faster, and we thought we could be more memorable. Matthew asked us to get the address and launch a secure, privacy-preserving, public DNS resolver in a couple of months.
This is the text I prepared for a talk at Speck&Tech in Trento, Italy. I thought it might make a good blog post. Because it is 6,000 words, I've split it into six separate posts.
Here's part 1:
I’ve worked at Cloudflare for more than seven years. Cloudflare itself is more than eight years old. So, I’ve been there since it was a very small company. About twenty people in fact. All of those people (except one, me) worked from an office in San Francisco. I was the lone member of the London office.
Today there are 900 people working at Cloudflare spread across offices in San Francisco, Austin, Champaign IL, New York, London, Munich, Singapore and Beijing. In London, my “one-person office” (which was my spare bedroom) is now almost 200 people and in a month, we’ll move into new space opposite Big Ben.
The numbers tell a story about enormous growth. But it’s growth that’s been very carefully managed. We could have grown much faster (in terms of people); we’ve certainly raised enough money to do so.
I ended up at Cloudflare because I gave a really good talk at a conference. Well, …
At Cloudflare, we aim to make the Internet faster and safer for everyone. One way we do this is through caching: we keep a copy of our customers’ content in our 165 data centers around the world. This brings content closer to users and reduces traffic back to origin servers.
Today, we’re excited to announce a huge change in how our cache works. Cloudflare Workers now integrates the Cache API, giving you programmatic control over our caches around the world.
Figuring out what to cache and how can get complicated. Consider an e-commerce site with a shopping cart, a Content Management System (CMS) with many templates and hundreds of articles, or a GraphQL API. Each contains a mix of elements that are dynamic for some users, but might stay unchanged for the vast majority of requests.
Over the last 8 years, we’ve added more features to give our customers flexibility and control over what goes in the cache. However, we’ve learned that we need to offer more than just adding settings in our dashboard. Our customers have told us clearly that they want to be able to express their ideas in code, to build …
HTTP is the application protocol that powers the Web. It began life as the so-called HTTP/0.9 protocol in 1991, and by 1999 had evolved to HTTP/1.1, which was standardised within the IETF (Internet Engineering Task Force). HTTP/1.1 was good enough for a long time, but the ever-changing needs of the Web called for a better suited protocol, and HTTP/2 emerged in 2015. More recently it was announced that the IETF intends to deliver a new version: HTTP/3. To some people this is a surprise and has caused a bit of confusion. If you don't track IETF work closely, it might seem that HTTP/3 has come out of the blue. However, we can trace its origins through a lineage of experiments and evolution of Web protocols; specifically the QUIC transport protocol.
If you're not familiar with QUIC, my colleagues have done a great job of tackling different angles. John's blog describes some of the real-world annoyances of today's HTTP, Alessandro's blog tackles the nitty-gritty transport layer details, and Nick's blog covers how to get hands on with some testing. We've collected these and more at And if that tickles your fancy, be sure …
This is a guest post by Tejas Dinkar, who is the Head of Engineering at Quintype, a platform for digital publishing. He’s continually looking for ways to make applications run faster and cheaper. You can find him on GitHub and Twitter.
TL;DR: Check out create-cloudflare-worker.
At Quintype, we are continually looking for new and innovative ways to use our CDN. Quintype moved to Cloudflare last year, partly because of the power of Cloudflare Workers. Workers have been a very important tool in our belt, and in this blog post we will talk a little bit about our worker development lifecycle.
Cloudflare Workers have drastically changed the way we architect and deploy things at Quintype. Quintype is a platform that powers many publishers, including many high volume ones like The Quint, BloombergQuint, Swarajya, and Fortune India. An average month sees hundreds of millions of page views come through our network.
Maintaining a healthy cache hit ratio is the key to scaling a content-heavy app. Serving requests from Cloudflare is faster and cheaper, because those requests do not have to travel to an origin. We actively architect our apps to ensure that we …
As of December 22, 2018, parts of the US Government have “shut down” because of a lapse in appropriation. The shutdown has caused the furlough of employees across the government and has affected federal contracts. An unexpected side-effect of this shutdown has been the expiration of TLS certificates on some .gov websites. This side-effect has emphasized a common issue on the Internet: the usage of expired certificates and their erosion of trust.
For an entity to provide a secure website, it needs a valid TLS certificate installed on the web server. These TLS certificates have both start dates and expiry dates. Normally certificates are renewed before they expire. However, if there’s no one to carry out the renewal, websites end up serving expired certificates, which is a poor security practice.
This means that people looking for government information or resources may encounter alarming error messages when visiting important .gov websites:
The content of the website hasn’t changed; it’s just the cryptographic exchange that’s invalid (an expired certificate can’t be validated). These expired certificates present a trust problem. Certificate errors often dissuade people from accessing a website, and imply that the site is not to be trusted. Browsers purposefully make it difficult to continue to …
During last year’s Birthday Week we announced early support for QUIC, the next generation encrypted-by-default network transport protocol designed to secure and accelerate web traffic on the Internet.
We are not quite ready to make this feature available to every Cloudflare customer yet, but while you wait we thought you might enjoy a slice of quiche, our own open-source implementation of the QUIC protocol written in Rust.
Quiche will allow us to keep on top of changes to the QUIC protocol as the standardization process progresses and experiment with new features more easily. Let’s have a quick look at it together.
The main design principle that guided quiche’s initial development was exposing most of the QUIC complexity to applications through a minimal and intuitive API, but without making too many assumptions about the application itself, in order to allow us to reuse the same library in different contexts.
For example, while we think Rust is great, most of the stack that deals with HTTP requests on Cloudflare’s edge network is still written in good ol’ C, which means that our QUIC implementation would need to be integrated into that.
The quiche API can process …
Cloudflare is proud to partner with Mesosphere on their new Argo Tunnel offering available within their DC/OS (Data Center / Operating System) catalogue! Before diving deeper into the offering itself, we’ll first do a quick overview of the Mesosphere platform, DC/OS.
Mesosphere DC/OS provides application developers and operators an easy way to consistently deploy and run applications and data services on cloud providers and on-premise infrastructure. The unified developer and operator experience across clouds makes it easy to realize use cases like global reach, resource expansion, and business continuity.
In this multi-cloud world, Cloudflare and Mesosphere DC/OS are great complements. Mesosphere DC/OS provides the same common services experience for developers and operators, and Cloudflare provides the same common service access experience across cloud providers. DC/OS helps tremendously in avoiding vendor lock-in to a single provider, while Cloudflare can intelligently load balance traffic between providers at the edge (in addition to many other services). This new offering will allow you to load balance through the use of Argo Tunnel.
Cloudflare Argo Tunnel is a private connection between your services and Cloudflare. Tunnel ensures that only traffic that routes through the …
This is a guest post by Jamie Mason, who is the Head of Test Servers at SamKnows. This post originally appeared on the SamKnows Megablog.
We leveraged Cloudflare Workers to expand the SamKnows measurement infrastructure.
At SamKnows, we run lots of tests to measure internet performance. Actually, that’s an understatement. Our software is embedded on tens of millions of devices, and that number grows daily.
We measure performance between the user’s home and the internet, across dozens of metrics. Some of these metrics measure the performance of major video-streaming services, popular games, or large websites. Others focus on the more traditional ‘quality of service’ metrics: speed, latency, and packet loss.
In order to measure speed, latency, and packet loss, SamKnows needs test servers to carry out the measurements against. These servers should be relatively near to the user’s home; this ensures that we’re measuring solely the user’s internet connection (i.e. what their Internet Service Provider sells them) and not some external factor.
As a result, we manage high-capacity test servers all over the world. Some are donated by research groups, some we host ourselves in major data centers, and still others are run inside ISPs’ own networks.
Customers …
When you launch your domain to the world, you rely on the Domain Name System (DNS) to direct your users to the address for your site. However, DNS cannot guarantee that your visitors reach your content because DNS, in its basic form, lacks authentication. If someone were able to poison the DNS responses for your site, they could hijack your visitors' requests.
The Domain Name System Security Extensions (DNSSEC) can help prevent that type of attack by adding a chain of trust to DNS queries. When you enable DNSSEC for your site, you can ensure that the DNS response your users receive is the authentic address of your site.
We launched support for DNSSEC in 2014. We made it free for all users, but we couldn’t make it easy to set up. Turning on DNSSEC for a domain was still a multistep, manual process. With the launch of Cloudflare Registrar, we can finish the work to make it simple to enable for your domain.
You can now enable DNSSEC with a single click if your domain is registered with Cloudflare Registrar. Visit the DNS tab in the Cloudflare dashboard, click "Enable DNSSEC", and we'll handle the rest. If you are …
Serverless technology is still in its infancy, and some people are unsure about where it’s headed. Join Zack Bloom, Director of Product for Product Strategy at Cloudflare, on a journey to explore the serverless future where developers “just write code,” pay for exactly what they use, and completely forget about where code runs; then see why current platforms won't be able to get developers all the way there.
The talk below was originally presented and recorded at Serverless Computing London in November 2018. If you’d like to join us in person to talk about serverless, we’ll be announcing 2019 event locations throughout the year on the docs page.
Many of the technical challenges of serverless (cold-start time, memory overhead, and CPU context switching) are solved by a new architecture which translates technology developed for web browsers onto the server. Learn about how serverless platforms built using isolates are helping to expand the kinds of applications built using serverless.
Zack Bloom helps build the future of the Internet as the Director of Product for Product Strategy at Cloudflare. He was a co-founder of Eager, an …
This is a guest post by Ben Chartrand, who is a Development Manager at Timely. You can check out some of Ben's other Workers projects on his GitHub and his blog.
At Timely we started a project to migrate our web applications from legacy Azure services to a modern PaaS offering. In theory it meant no code changes.
We decided to start with our webhooks. All our endpoints can be grouped into four categories:
Despite their limited number, these are vitally important. We did a lot of testing but it was clear we’d only really know if everything was working once we had production traffic. How could we migrate traffic?
Change the CNAME to point to the new hosting infrastructure. This is high risk. DNS takes time to propagate so, if we needed to roll back, it would take time. We would also be shifting over everything at once.
Use a traffic manager to shift a percentage of traffic using Cloudflare Load Balancing. We could start at, say, 5% of traffic to the new infrastructure …
My curiosity was piqued by an LWN article about IOCB_CMD_POLL - A new kernel polling interface. It discusses the addition of a new polling mechanism to the Linux AIO API, which was merged in the 4.18 kernel. The whole idea is rather intriguing. The author of the patch is proposing to use the Linux AIO API with things like network sockets.
Hold on. Linux AIO is designed for, well, asynchronous disk IO! Disk files are not the same thing as network sockets! Is it even possible to use the Linux AIO API with network sockets in the first place?
The answer turns out to be a strong YES! In this article I'll explain how to use the strengths of Linux AIO API to write better and faster network servers.
But before we start, what is Linux AIO anyway?
Linux AIO exposes asynchronous disk IO to userspace software.
Historically on Linux, all disk operations were blocking. Whether you did open(), read(), write() or fsync(), you could be sure your thread would stall if the needed data and metadata were not ready in the disk cache. This usually isn't …
The calendar page has barely turned and we are already seeing more trouble on the Internet.
Today, Cloudflare can confirm, with data to back it up, that the Internet has been cut off in the Democratic Republic of the Congo, as previously reported by multiple news outlets. The shutdown took place during the presidential election on 30 December, and continues as the results are being published.
Sadly, this situation is far from new. We have reported on similar events in the past, including another Internet shutdown in the DRC less than a year ago. A sadly familiar curve is now visible on our network management platform, showing that traffic in the country barely reaches a quarter of its usual level.
Note that the graph's time axis is in UTC, and that the DRC's capital, Kinshasa, is in the GMT+1 time zone.
The drop in traffic began around midday on 31 December 2018 (at about 10:30 UTC, i.e. 11:30 local time in Kinshasa). This is all the more striking when all the daily curves are overlaid:
Above, the red curve represents traffic on 31 December, and the gray curves those of the 8 …
The calendar has barely flipped to 2019 and already we’re seeing Internet disruptions.
Today, Cloudflare can quantitatively confirm that Internet access has been shut down in the Democratic Republic of the Congo, information already reported by many press organisations. This shutdown occurred as the presidential election was taking place on December the 30th, and continues as the results are published.
Sadly, this act is far from unprecedented. We have published many posts about events like this in the past, including a different post about roughly three days of Internet disruption in the Democratic Republic of the Congo less than a year ago. A painfully familiar shape can be seen on our network monitoring platform, showing that the traffic in the country is barely reaching a quarter of its typical level:
Note that the graph is based on UTC, and the Democratic Republic of the Congo’s capital, Kinshasa, is in the GMT+1 time zone.
The drop in bandwidth started just before midday on 31 December 2018 (around 10:30 UTC, 11:30 local time in Kinshasa). This can be clearly seen if we overlay each 24-hour day on top of the others:
The red line is 31 December, the gray lines the previous eight days. Looking …
At Cloudflare, we are constantly looking into ways to improve the development experience for Workers and make it the most convenient platform for writing serverless code.
As some of you might have already noticed either from our public release notes, on or in your Cloudflare Workers dashboard, there recently was a small but important change in the look of the inspector.
But before we go into figuring out what it is, let's take a look at our standard example on
The example worker code featured here acts as a transparent proxy, while printing requests / responses to the console.
Until now, when debugging Workers, all you could see from the client-side devtools was the interaction between your browser and the Cloudflare Workers runtime. As in most other server-side runtimes, the interaction between your code and the actual origin was hidden.
This is where console.log comes in. Although not the most convenient, printing random things out is a fairly popular debugging technique.
Unfortunately, its default output doesn't help much with debugging network interactions. If you try to expand either the request or the response object, all you can see is a bunch of lazy accessors: