It’s been one year since I joined Cloudflare as Head of Australia and New Zealand. While it has been a great year for our ANZ operations, it is hard to stop thinking about the elephant in the room, especially as I’m writing this blog from my home in the middle of Melbourne’s lockdown.
The pandemic has not only disrupted our daily lives, but has also caused a massive shift to remote work for many of us. As a result, security teams lost visibility into office network traffic, their employees moved to unsupervised WiFi networks with new video conferencing technology, and their IT teams found that their out-dated VPN platforms could not handle all the traffic of remote employees. While many organisations were already moving to cloud-based applications, this year has exacerbated the need for greater security posture. Our team has been even more humbled by our mission to help build a better Internet and help organisations face the increased security threats COVID-19 has triggered. With that in mind, I’d like to take a look back at the milestones of the past year.
First, I’d like to recognise how strong and resilient our people have been in the past year. It Continue reading
At Cloudflare, we’re constantly working on improving the performance of our edge — and that was exactly what my internship this summer entailed. I’m excited to share some improvements we’ve made to our popular Firewall Rules product over the past few months.
Firewall Rules lets customers filter the traffic hitting their site. It’s built using our engine, Wirefilter, which takes powerful boolean expressions written by customers and matches incoming requests against them. Customers can then choose how to respond to traffic which matches these rules. We will discuss some in-depth optimizations we have recently made to Wirefilter, so you may wish to get familiar with how it works if you haven’t already.
As a new member of the Firewall team, I quickly learned that performance is important — even in our security products. We look for opportunities to make our customers’ Internet properties faster where it’s safe to do so, maximizing both security and performance.
Our engine is already heavily used, powering all of Firewall Rules. But we have bigger plans. More and more products like our Web Application Firewall (WAF) will be running behind our Wirefilter-based engine, and it will become responsible for eating up a Continue reading
Cloudflare’s globally distributed network is not just designed to protect HTTP services but any kind of TCP or UDP traffic that passes through our edge. To this end, we’ve built a number of sophisticated DDoS mitigation systems, such as Gatebot, which analyze world-wide traffic patterns. However, we’ve always employed defense-in-depth: in addition to global protection systems we also use off-the shelf mechanisms such as TCP SYN-cookies, which protect individual servers locally from the very common SYN-flood. But there’s a catch: such a mechanism does not exist for UDP. UDP is a connectionless protocol and does not have similar context around packets, especially considering that Cloudflare powers services such as Spectrum which are agnostic to the upper layer protocol (DNS, NTP, …), so my 2020 intern class project was to come up with a different approach.
First of all, let's discuss what it actually means to provide protection to UDP services. We want to ensure that an attacker cannot drown out legitimate traffic. To achieve this we want to identify floods and limit them while leaving legitimate traffic untouched.
The idea to mitigate such attacks is straight forward: first identify a group of packets that is Continue reading
Every day, all across the Internet, something bad but entirely normal happens: thousands of origin servers go down, resulting in connection errors and frustrated users. Cloudflare’s users collectively spend over four and a half years each day waiting for unreachable origin servers to respond with error messages. But visitors don’t want to see error pages, they want to see content!
Today is exciting for all those who want the Internet to be stronger, more resilient, and have important redundancies: Cloudflare is pleased to announce a partnership with the Internet Archive to bring new functionality to our Always Online service.
Always Online serves as insurance for our customers’ websites. Should a customer’s origin go offline, timeout, or otherwise break, Always Online is there to step in and serve archived copies of webpages to visitors. The Internet Archive is a nonprofit organization that runs the Wayback Machine, a service which saves snapshots of billions of websites across the Internet. By partnering with the Internet Archive, Cloudflare is able to seamlessly deliver responses for unreachable websites from the Internet Archive, while the Internet Archive can continue their mission of archiving the web to provide access to all knowledge.
Enabling Always Online in the Continue reading
On July 3, Cloudflare’s global DDoS protection system, Gatebot, automatically detected and mitigated a UDP-based DDoS attack that peaked at 654 Gbps. The attack was part of a ten-day multi-vector DDoS campaign targeting a Magic Transit customer and was mitigated without any human intervention. The DDoS campaign is believed to have been generated by Moobot, a Mirai-based botnet. No downtime, service degradation, or false positives were reported by the customer.
Over those ten days, our systems automatically detected and mitigated over 5,000 DDoS attacks against this one customer, mainly UDP floods, SYN floods, ACK floods, and GRE floods. The largest DDoS attack was a UDP flood and lasted a mere 2 minutes. This attack targeted only one IP address but hit multiple ports. The attack originated from 18,705 unique IP addresses, each believed to be a Moobot-infected IoT device.
The attack was observed in Cloudflare’s data centers in 100 countries around the world. Approximately 89% of the attack traffic originated from just 10 countries with the US leading at 41%, followed by South Korea and Japan in second place (12% each), Continue reading
If you already understand how Secondary DNS works, please feel free to skip this section. It does not provide any Cloudflare-specific information.
Secondary DNS has many use cases across the Internet; however, traditionally, it was used as a synchronized backup for when the primary DNS server was unable to respond to queries. A more modern approach involves focusing on redundancy across many different nameservers, which in many cases broadcast the same anycasted IP address.
Secondary DNS involves the unidirectional transfer of DNS zones from the primary to the Secondary DNS server(s). One primary can have any number of Secondary DNS servers that it must communicate with in order to keep track of any zone updates. A zone update is considered a change in the contents of a zone, which ultimately leads to a Start of Authority (SOA) serial number increase. The zone’s SOA serial is one of the key elements of Secondary DNS; it is how primary and secondary servers synchronize zones. Below is an example of what an SOA record might look like during a dig query.
example.com 3600 IN SOA ashley.ns.cloudflare.com. dns.cloudflare.com.
2034097105 // Serial
10000 // Continue reading
In March, governments all over the world issued stay-at-home orders, causing a mass migration to teleworking. Alongside many of our partners, Cloudflare launched free products and services supported by onboarding sessions to help our clients secure and accelerate their remote work environments. Over the past few months, a dedicated team of specialists met with hundreds of organizations - from tiny startups, to massive corporations - to help them extend better security and performance to a suddenly-remote workforce.
Most companies we heard from had a VPN in place, but it wasn’t set up to accommodate a full-on remote work environment. When employees began working from home, they found that the VPN was getting overloaded with requests, causing performance lags.
While many organizations had bought more VPN licenses to allow employees to connect to their tools, they found that just having licenses wasn’t enough: they needed to reduce the amount of traffic flowing through their VPN by taking select applications off of the private network.
My name is Dina and I am a Customer Success Manager (CSM) in our San Francisco office. I am responsible for ensuring the success of Cloudflare’s Enterprise customers and managing all of Continue reading
Since the launch of Cloudflare Stream, our customers have been asking for a programmatic way to add watermarks to their videos. We built the Watermarks API to support a wide range of use cases: from customers who simply want to tell Stream “can you put this watermark image to the top right of my video?” to customers with more detailed asks such as “can you put this watermark image in a way it doesn’t take up more than 10% of the original video and with 20% opacity?” All that and more is now available at no additional cost through the Watermarks API.
Cloudflare Stream provides out-of-the-box video infrastructure so developers can bring their app ideas to market faster. While building a video streaming app, developers must ask themselves questions like
Cloudflare Stream is a single product that handles video encoding, storage, delivery and presentation (with the Stream Player.) Stream lets developers launch their ideas Continue reading
Cloudflare powers cdnjs, an open-source project that accelerates websites by delivering popular JavaScript libraries and resources via Cloudflare’s network. Since our major update in December, we focused on remodelling cdnjs for scalability and resilience. Today, we are excited to announce how Cloudflare delivers cdnjs—a migration to a serverless infrastructure using Cloudflare Workers and its distributed key-value store Workers KV!
For those unfamiliar, cdnjs is an acronym describing a Content Delivery Network (CDN) for JavaScript (JS). A CDN simply refers to a geographically distributed network of servers that provide Internet content, whether it is memes, cat videos, or HTML pages. In our case, the CDN refers to Cloudflare’s ever expanding network of over 200 globally distributed data centers.
And here’s why this is relevant to you: it makes page load times lightning-fast. Virtually every website you visit needs to fetch JS libraries in order to load, including this one. Let’s say you visit a Sydney-based website that contains a local file from jQuery, a popular library found in 76.2% of websites. If you are located in New York, you may notice a delay, as it can easily exceed 300ms to fetch the file—not to mention Continue reading
As the scale of Cloudflare’s edge network has grown, we sometimes reach the limits of parts of our architecture. About two years ago we realized that our existing solution for spreading load within our data centers could no longer meet our needs. We embarked on a project to deploy a Layer 4 Load Balancer, internally called Unimog, to improve the reliability and operational efficiency of our edge network. Unimog has now been deployed in production for over a year.
This post explains the problems Unimog solves and how it works. Unimog builds on techniques used in other Layer 4 Load Balancers, but there are many details of its implementation that are tailored to the needs of our edge network.
Cloudflare operates an anycast network, meaning that our data centers in 200+ cities around the world serve the same IP addresses. For example, our own cloudflare.com website uses Cloudflare services, and one of its IP addresses is 104.17.175.85. All of our data centers will accept connections to that address and respond to HTTP requests. By the magic of Internet routing, when you visit cloudflare.com and your Continue reading
The following is a guest post from Josh Larson, Engineer at Vox Media.
Imagine you’re the maintainer of a high-traffic media website, and your DNS is already hosted on Cloudflare.
Page speed is critical. You need to get content to your audience as quickly as possible on every device. You also need to render ads in a speedy way to maintain a good user experience and make money to support your journalism.
One solution would be to render your site statically and cache it at the edge. This would help ensure you have top-notch delivery speed because you don’t need a server to return a response. However, your site has decades worth of content. If you wanted to make even a small change to the site design, you would need to regenerate every single page during your next deploy. This would take ages.
Another issue is that your site would be static — and future updates to content or new articles would not be available until you deploy again.
That’s not going to work.
Another solution would be to render each page dynamically on your server. This ensures you can return a dynamic response for new or updated articles.
Your team members are probably not just working from home - they may be working from different regions or countries. The flexibility of remote work gives employees a chance to work from the towns where they grew up or countries they always wanted to visit. However, that distribution also presents compliance challenges.
Depending on your industry, keeping data inside of certain regions can be a compliance or regulatory requirement. You might require employees to connect from certain countries or exclude entire countries altogether from your corporate systems.
When we worked in physical offices, keeping data inside of a country was easy. All of your users connecting to an application from that office were, of course, in that country. Remote work changed that and teams had to scramble to find a way to keep people productive from anywhere, which often led to sacrifices in terms of compliance. Starting today, you can make geography-based compliance easy again in Cloudflare Access with just two clicks.
You can now build rules that require employees to connect from certain countries. You can also add rules that block team members from connecting from other countries. This feature works with any identity provider configured and requires no Continue reading
Today CenturyLink/Level(3), a major ISP and Internet bandwidth provider, experienced a significant outage that impacted some of Cloudflare’s customers as well as a significant number of other services and providers across the Internet. While we’re waiting for a post mortem from CenturyLink/Level(3), I wanted to write up the timeline of what we saw, how Cloudflare’s systems routed around the problem, why some of our customers were still impacted in spite of our mitigations, and what appears to be the likely root cause of the issue.
At 10:03 UTC our monitoring systems started to observe an increased number of errors reaching our customers’ origin servers. These show up as “522 Errors” and indicate that there is an issue connecting from Cloudflare’s network to wherever our customers’ applications are hosted.
Cloudflare is connected to CenturyLink/Level(3) among a large and diverse set of network providers. When we see an increase in errors from one network provider, our systems automatically attempt to reach customers’ applications across alternative providers. Given the number of providers we have access to, we are generally able to continue to route traffic even when one provider has an issue.
Last year, we launched HTMLRewriter for Cloudflare Workers, which enables developers to make streaming changes to HTML on the edge. Unlike a traditional DOM parser that loads the entire HTML document into memory, we developed a streaming parser written in Rust. Today, we’re announcing support for asynchronous handlers in HTMLRewriter. Now you can perform asynchronous tasks based on the content of the HTML document: from prefetching fonts and image assets to fetching user-specific content from a CMS.
We designed HTMLRewriter to have a jQuery-like experience. First, you define a handler, then you assign it to a CSS selector; Workers does the rest for you. You can look at our new and improved documentation to see our supported list of selectors, which now include nth-child
selectors. The example below changes the alternative text for every second image in a document.
async function editHtml(request) {
return new HTMLRewriter()
.on("img:nth-child(2)", new ElementHandler())
.transform(await fetch(request))
}
class ElementHandler {
element(e) {
e.setAttribute("alt", "A very interesting image")
}
}
Since these changes are applied using streams, we maintain a low TTFB (time to first byte) and users never know the HTML was transformed. If you’re interested in how we’re Continue reading
Whether you are managing a fleet of machines or sharing a private site from your localhost, Argo Tunnel is here to help. On the Argo Tunnel team we help make origins accessible from the Internet in a secure and seamless manner. We also care deeply about productivity and developer experience for the team, so naturally we want to make sure we have a development environment that is reliable, easy to set up and fast to iterate on.
When our development team was still small, we used a docker-compose file to orchestrate the services needed to develop Argo Tunnel. There was no native support for hot reload, so every time an engineer made a change, they had to restart their dev-stack.
We could hack around it to hot reload with docker-compose, but when that failed, we had to waste time debugging the internals of Docker. As the team grew, we realized we needed to invest in improving our dev stack.
At the same time Cloudflare was in the process of migrating from Marathon to kubernetes (k8s). We set out to find a tool that could detect changes in source code and Continue reading
Consistent with our mission to help build a better Internet, Cloudflare has long recognized the importance of conducting our business in a way that respects the rights of Internet users around the world. We provide free services to important voices online - from human rights activists to independent journalists to the entities that help maintain our democracies - who would otherwise be vulnerable to cyberattack. We work hard to develop internal mechanisms and build products that empower user privacy. And we believe that being transparent about the types of requests we receive from government entities and how we respond is critical to maintaining customer trust.
As Cloudflare continues to expand our global network, we think there is more we can do to formalize our commitment to help respect human rights online. To that end, we are excited to announce that we have joined the Global Network Initiative (GNI), one of the world's leading human rights organizations in the information and communications Technology (ICT) sector, as observers.
Understanding Cloudflare’s new partnership with GNI requires some additional background on how human rights concepts apply to businesses.
In 1945, following the end of World War II, 850 delegates Continue reading
Cloudflare Workers — our serverless platform — allows developers around the world to run their applications from our network of 200 datacenters, as close as possible to their users.
A few weeks ago we announced a release candidate for wrangler dev
— today, we're excited to take wrangler dev
, the world’s first edge-based development environment, to GA with the release of wrangler 1.11.
It was once assumed that to successfully run an application on the web, one had to go and acquire a server, set it up (in a data center that hopefully you had access to), and then maintain it on an ongoing basis. Luckily for most of us, that assumption was challenged with the emergence of the cloud. The cloud was always assumed to be centralized — large data centers in a single region (“us-east-1”), reserved for compute. The edge? That was for caching static content.
Again, assumptions are being challenged.
Cloudflare Workers is about moving compute from a centralized location to the edge. And it makes sense: if users are distributed all over the globe, why should all of them be routed to us-east-1, on the opposite side of the world, Continue reading
Today I’m excited to announce wrangler login
, an easy way to get started with Wrangler! This summer for my internship on the Workers Developer Productivity team I was tasked with helping improve the Wrangler user experience. For those who don’t know, Workers is Cloudflare’s serverless platform which allows users to deploy their software directly to Cloudflare’s edge network.
This means you can write any behaviour on requests heading to your site or even run fully fledged applications directly on the edge. Wrangler is the open-source CLI tool used to manage your Workers and has a big focus on enabling a smooth developer experience.
When I first heard I was working on Wrangler, I was excited that I would be working on such a cool product but also a little nervous. This was the first time I would be writing Rust in a professional environment, the first time making meaningful open-source contributions, and on top of that the first time doing all of this remotely. But thanks to lots of guidance and support from my mentor and team, I was able to help make the Wrangler and Workers developer experience just a little bit better.
The main improvement Continue reading
Cloudflare recently shipped improved upload speeds across our network for clients using HTTP/2. This post describes our journey from troubleshooting an issue to fixing it and delivering faster upload speeds to the global Internet.
We launched speed.cloudflare.com in May 2020 to give our users insight into how well their networks perform. The test provides download, upload and latency tests. Soon after release, we received reports from a small number of users that sometimes upload speeds were underreported. Our investigation determined that it seemed to happen with end users that had high upload bandwidth available (several hundreds Mbps class cable modem or fiber service). Our speed tests are performed via browser JavaScript, and most browsers use HTTP/2 by default. We found that HTTP/2 upload speeds were sometimes much slower than HTTP/1.1 (assuming all TLS) when the user had high available upload bandwidth.
Upload speed is more important than ever, especially for people using home broadband connections. As many people have been forced to work from home they’re using their broadband connections differently than before. Prior to the pandemic broadband traffic was very asymmetric (you downloaded way more than you uploaded… think listening to music, or streaming a movie), Continue reading
Cloudflare extensively uses its own products internally in a process known as dogfooding. As part of my onboarding as an intern on the Spectrum (a layer 4 reverse proxy) team, I learned that many internal services dogfood Spectrum, as they are exposed to the Internet and benefit from layer 4 DDoS protection. One of my first tasks was to update the configuration for an internal service that was using Spectrum. The configuration was managed in Salt (used for configuration management at Cloudflare), which was not particularly user-friendly, and required an engineer on the Spectrum team to handle updating it manually.
This process took about a week. That should instantly raise some questions, as a typical Spectrum customer can create a new Spectrum app in under a minute through Cloudflare Dashboard. So why couldn’t I?
This question formed the basis of my intern project for the summer.
Cloudflare uses various IP ranges for its products. Some customers also authorize Cloudflare to announce their IP prefixes on their behalf (this is known as BYOIP). Collectively, we can refer to these IPs as managed addresses. To prevent Bad Stuff (defined later) from happening, we prohibit managed addresses from Continue reading