We recently updated developers.cloudflare.com
, the Cloudflare Developers documentation website, to a new version of our custom documentation engine. This change consisted of a significant migration from Gatsby to Hugo and converged a collection of Workers Sites into a single Cloudflare Pages instance. Together, these updates brought developer experience, performance, and quality of life improvements for our engineers, technical writers, and product managers.
In this blog post, we’ll cover the history of Cloudflare’s developer docs, why we made this recent transition, and why we continue to dogfood Cloudflare’s products as we develop applications internally.
Cloudflare’s Developer Docs, which are open source on GitHub, comprise documentation for all of Cloudflare’s products. The documentation is written by technical writers, product managers, and engineers at Cloudflare. Like many open source projects, contributions to the docs happen via Pull Requests (PRs). At time of writing, we have 1,600 documentation pages and have accepted almost 4,000 PRs, both from Cloudflare employees and external contributors in our community.
The underlying documentation engine we’ve used to build these docs has changed multiple times over the years. Documentation sites are often built with static site generators and, at Cloudflare, we’ve Continue reading
In recent years, management interfaces on servers like a Baseboard Management Controller (BMC) have been the target of cyber attacks including ransomware, implants, and disruptive operations. Common BMC vulnerabilities like Pantsdown and USBAnywhere, combined with infrequent firmware updates, have left servers vulnerable.
We were recently informed from a trusted vendor of new, critical vulnerabilities in popular BMC software that we use in our fleet. Below is a summary of what was discovered, how we mitigated the impact, and how we look to prevent these types of vulnerabilities from having an impact on Cloudflare and our customers.
A baseboard management controller is a small, specialized processor used for remote monitoring and management of a host system. This processor has multiple connections to the host system, giving it the ability to monitor hardware, update BIOS firmware, power cycle the host, and many more things.
Access to the BMC can be local or, in some cases, remote. With remote vectors open, there is potential for malware to be installed on the BMC from the local host via PCI Express or the Low Pin Count (LPC) interface. With compromised software on the BMC, malware or spyware could maintain persistence on the server.
At Cloudflare, we talk a lot about how to help build a better Internet. On the Product Content Experience (PCX) team, we treat content like a product that represents and fulfills this mission. Our vision is to create world-class content that anticipates user needs and helps build accessible Cloudflare products. We believe we can impact the Cloudflare product experience and make it as wonderful as possible by intentionally designing, packaging, and testing the content.
I like taking on projects. A singular goal is met, and I clearly know I’m successful because the meaning of “done” is normally very clear. For example, I volunteer some of my time editing academic papers about technology. My role as an editor is temporary and there is a defined beginning and end to the work. I send my feedback and my task is largely complete.
“Content like a product” is when you shift your mindset from completing projects to maintaining a product, taking into consideration the user and their feedback. Product content at Cloudflare is an iterative, living, breathing thing. Inspired by the success of teams that adopt an agile mindset, along with some strategic functions you might find Continue reading
It can be frustrating to get errors (SERVFAIL response codes) returned from your DNS queries. It can be even more frustrating if you don’t get enough information to understand why the error is occurring or what to do next. That’s why back in 2020, we launched support for Extended DNS Error (EDE) Codes to 1.1.1.1.
As a quick refresher, EDE codes are a proposed IETF standard enabled by the Extension Mechanisms for DNS (EDNS) spec. The codes return extra information about DNS or DNSSEC issues without touching the RCODE so that debugging is easier.
Now we’re happy to announce we will return more error code types and include additional helpful information to further improve your debugging experience. Let’s run through some examples of how these error codes can help you better understand the issues you may face.
To try for yourself, you’ll need to run the dig or kdig command in the terminal. For dig, please ensure you have v9.11.20 or above. If you are on macOS 12.1, by default you only have dig 9.10.6. Install an updated version of BIND to fix that.
Let’s start with the output of an example Continue reading
This post is also available in 简体中文, 日本語, Español.
Since my previous blog about Secondary DNS, Cloudflare's DNS traffic has more than doubled from 15.8 trillion DNS queries per month to 38.7 trillion. Our network now spans over 270 cities in over 100 countries, interconnecting with more than 10,000 networks globally. According to w3 stats, “Cloudflare is used as a DNS server provider by 15.3% of all the websites.” This means we have an enormous responsibility to serve DNS in the fastest and most reliable way possible.
Although the response time we have on DNS queries is the most important performance metric, there is another metric that sometimes goes unnoticed. DNS Record Propagation time is how long it takes changes submitted to our API to be reflected in our DNS query responses. Every millisecond counts here as it allows customers to quickly change configuration, making their systems much more agile. Although our DNS propagation pipeline was already known to be very fast, we had identified several improvements that, if implemented, would massively improve performance. In this blog post I’ll explain how we managed to drastically improve our DNS record propagation speed, and the Continue reading
This post is also available in عربي.
I am excited to announce that I have joined Cloudflare as Managing Director for the Middle East and Turkey (MET) region. Having worked in the domain of cyber security for more than two decades, I can see that Cloudflare is genuine in its mission of building a better Internet that is fast, safe and reliable for everyone. Being part of this journey that touches everyone’s life is surely an exciting thing to do, and I look forward to putting my experience in play towards successfully achieving this goal.
Cloudflare has been associated with delivering fast content over cloud in a most reliable and secure manner, accounting for at least 20% of the global Internet traffic. Cloudflare can cater for and support all types of organizations (businesses and public sector) including those with a social mission. The Middle East and Turkey as an emerging market is characterized by a relatively young population, with 70% of it being under the age of 30. This dynamic youth segment has an insatiable demand for both content and knowledge. To that extent, there has been a rapid uptake in Internet use, and digital transformation initiatives have significantly accelerated Continue reading
This post is also available in French, German and Spanish.
Back in the early days of the Internet, you could physically see the hardware where your data was stored. You knew where your data was and what kind of locks and security protections you had in place. Fast-forward a few decades, and data is all “in the cloud”. Now, you have to trust that your cloud services provider is putting security precautions in place just as you would have if your data was still sitting on your hardware. The good news is, you don’t have to merely trust your provider anymore. There are a number of ways a cloud services provider can prove it has robust privacy and security protections in place.
Today, we are excited to announce that Cloudflare has taken three major steps forward in proving the security and privacy protections we provide to customers of our cloud services: we achieved a key cloud services certification, ISO/IEC 27018:2019; we completed our independent audit and received our Cloud Computing Compliance Criteria Catalog (“C5”) attestation; and we have joined the EU Cloud Code of Conduct General Assembly to help increase the impact of the trusted cloud ecosystem and encourage Continue reading
I joined Cloudflare in March to lead Partnerships & Alliances for Asia Pacific, Japan, and China (APJC). In the last month I’ve been asked many times: “Why Cloudflare?” I’ll be honest, I’ve had opportunities to join other technology companies, but no other organization excited me more than Cloudflare. So I jumped. And I couldn’t be more thrilled for the opportunity to build a strong partner ecosystem for APJC.
When I considered joining Cloudflare, I recall consistently reading the message around “Helping to Build a Better Internet”. At first those words didn’t connect with me, but they sounded like an important mission.
I did my research and read analyst reports to learn about Cloudflare's market position, and then it dawned on me, Cloudflare is leading a transformation. Taking traditional on-premise networking and security hardware and building a transformational cloud-based solution, so customers don’t need to worry about which company supplied their kit. I was excited to learn that Cloudflare customers can simply access the vast global network that has been designed to make everything that customers connect to on the Internet secure, private, fast, and reliable. So hasn’t this been done before? For compute and storage that transformation is almost Continue reading
We use Prometheus as our core monitoring system. We’ve been heavy Prometheus users since 2017 when we migrated off our previous monitoring system which used a customized Nagios setup. Despite growing our infrastructure a lot, adding tons of new products and learning some hard lessons about operating Prometheus at scale, our original architecture of Prometheus (see Monitoring Cloudflare's Planet-Scale Edge Network with Prometheus for an in depth walk through) remains virtually unchanged, proving that Prometheus is a solid foundation for building observability into your services.
One of the key responsibilities of Prometheus is to alert us when something goes wrong and in this blog post we’ll talk about how we make those alerts more reliable - and we’ll introduce an open source tool we’ve developed to help us with that, and share how you can use it too. If you’re not familiar with Prometheus you might want to start by watching this video to better understand the topic we’ll be covering here.
Prometheus works by collecting metrics from our services and storing those metrics inside its database, called TSDB. We can then query these metrics using Prometheus query language called PromQL using ad-hoc queries (for example to power Grafana Continue reading
There’s only one song contest that is more than six decades old and not only presents many new songs (ABBA, Celine Dion, Julio Iglesias and Domenico Modugno shined there), but also has a global stage that involves 40 countries — performers represent those countries and the public votes. The 66th edition of the Eurovision Song Contest, in Turin, Italy, had two semi-finals (May 10 and 12) and a final (May 14), all of them with highlights, including Ukraine’s victory. The Internet was impacted in more than one way, from whole countries to the fan and official broadcasters sites, but also video platforms.
On our Eurovision dedicated page, it was possible to see the level of Internet traffic in the 40 participant countries, and we tweeted some highlights during the final.
#Ukraine just won the #Eurovision in Turin, #Italy
— Cloudflare Radar (@CloudflareRadar) May 14, 2022
Video platforms DNS traffic in Ukraine today, during the event, was 22% higher at 23:00 CEST compared to the previous Saturday. The @Eurovision final is being transmitted live via YouTube.
— @Cloudflare data. pic.twitter.com/juBmtDj1FP
First, some technicalities. The baseline for the values we use in the following charts Continue reading
Before we dive into my experience interning at Cloudflare, let me quickly introduce myself. I am currently a master’s student at the National University of Singapore (NUS) studying Computer Science. I am passionate about building software that improves people’s lives and making the Internet a better place for everyone. Back in December 2021, I joined Cloudflare as a Software Development Intern on the Partnerships team to help improve the experience that Partners have when using the platform. I was extremely excited about this opportunity and jumped at the prospect of working on serverless technology to build viable tools for our partners and customers. In this blog post, I detail my experience working at Cloudflare and the many highlights of my internship.
The process began for me back when I was taking a software engineering module at NUS where one of my classmates had shared a job post for an internship at Cloudflare. I had known about Cloudflare’s DNS service prior and was really excited to learn more about the internship opportunity because I really resonated with the company's mission to help build a better Internet.
I knew right away that this would be a great opportunity and submitted Continue reading
We’re excited to announce the availability of Network Analytics Logs. Magic Transit, Magic Firewall, Magic WAN, and Spectrum customers on the Enterprise plan can feed packet samples directly into storage services, network monitoring tools such as Kentik, or their Security Information Event Management (SIEM) systems such as Splunk to gain near real-time visibility into network traffic and DDoS attacks.
By creating a Network Analytics Logs job, Cloudflare will continuously push logs of packet samples directly to the HTTP endpoint of your choice, including Websockets. The logs arrive in JSON format which makes them easy to parse, transform, and aggregate. The logs include packet samples of traffic dropped and passed by the following systems:
Note that not all mitigation systems are applicable to all Cloudflare services. Below is a table describing which mitigation service is applicable to which Cloudflare service:
Mitigation System |
Cloudflare Service | ||
---|---|---|---|
Magic Transit | Magic WAN | Spectrum | |
Network-layer DDoS Protection Ruleset | ✅ | ❌ | ✅ |
Advanced TCP Protection | ✅ | ❌ | ❌ |
Magic Firewall | Continue reading |
In Cloudflare’s global network, every server runs the whole software stack. Therefore, it's critical that every server performs to its maximum potential capacity. In order to provide us better flexibility from a supply chain perspective, we buy server hardware from multiple vendors with the exact same configuration. However, after the deployment of our Gen X AMD EPYC Zen 2 (Rome) servers, we noticed that servers from one vendor (which we’ll call SKU-B) were consistently performing 5-10% worse than servers from second vendor (which we'll call SKU-A).
The graph below shows the performance discrepancy between the two SKUs in terms of percentage difference. The performance is gauged on the metric of requests per second, and this data is an average of observations captured over 24 hours.
The initial debugging efforts centered around the compute performance. We ran AMD’s DGEMM high performance computing tool to determine if CPU performance was the cause. DGEMM is designed to measure the sustained floating-point computation rate of a single server. Specifically, the code measures the floating point rate of execution of a real matrix–matrix multiplication with double Continue reading
After many announcements from Platform Week, we’re thrilled to make one more: our Spring Developer Challenge!
The theme for this challenge is building real-time, collaborative applications — one of the most exciting use-cases emerging in the Cloudflare ecosystem. This is an opportunity for developers to merge their ideas with our newly released features, earn recognition on our blog, and take home our best swag yet.
Here’s a list of our tools that will get you started:
In September 2021, we shared extensive benchmarking results of 1,000 networks all around the world. The results showed that on a range of tests (TCP connection time, time to first byte, time to last byte), and on different measures (p95, mean), Cloudflare was the fastest provider in 49% of networks around the world. Since then, we’ve worked to continuously improve performance, with the ultimate goal of being the fastest everywhere and an intermediate goal to grow the number of networks where we’re the fastest by at least 10% every Innovation Week. We met that goal during Security Week (March 2022), and we’re carrying the work over to Platform Week (May 2022).
We’re excited to update you on the latest results, but before we do: after running with this benchmark for nine months, we've also been looking for ways to improve the benchmark itself — to make it even more representative of speeds in the real world. To that end, we're expanding our measured networks from 1,000 to 3,000, to give an even more accurate sense of real world performance across the globe.
In terms of results: using the old benchmark of 1,000 networks, we’re the fastest in 69% of them. Continue reading
What happens to the Internet traffic in countries where many observe Ramadan? Depending on the country, there are clear shifts and changing patterns in Internet use, particularly before dawn and after sunset.
This year, Ramadan started on April 2, and it continued until May 1, 2022, (dates vary and are dependent on the appearance of the crescent moon). For Muslims, it is a period of introspection, communal prayer and also of fasting every day from dawn to sunset. That means that people only eat at night (Iftar is the first meal after sunset that breaks the fast and often also a family or community event), and also before sunrise (Suhur).
In some countries, the impact is so big that we can see in our Internet traffic charts when the sun sets. Sunrise is more difficult to check in the charts, but in the countries more impacted, people wake up much earlier than usual and were using the Internet in the early morning because of that.
Cloudflare Radar data shows that Internet traffic was impacted in several countries by Ramadan, with a clear increase in traffic before sunrise, and a bigger than usual decrease after sunset. All times Continue reading
A little under four years ago we announced Cloudflare's first experiments in web3 with our gateway to the InterPlanetary File System (IPFS). Three years ago we announced our experimental Ethereum Gateway. At Cloudflare, we often take experimental bets on the future of the Internet to help new technologies gain maturity and stability and for us to gain deep understanding of them.
Four years after our initial experiments in web3, it’s time to launch our next series of experiments to help advance the state of the art. These experiments are focused on finding new ways to help solve the scale and environmental challenges that face blockchain technologies today. Over the last two years there has been a rapid increase in the use of the underlying technologies that power web3. This growth has been a catalyst for a generation of startups; and yet it has also had negative externalities.
At Cloudflare, we are committed to helping to build a better Internet. That commitment balances technology and impact. The impact of certain older blockchain technologies on the environment has been challenging. The Proof of Work (PoW) consensus systems that secure many blockchains were instrumental in the bootstrapping of the web3 ecosystem. However, these Continue reading
Today we are excited to announce that our Ethereum and IPFS gateways are publicly available to all Cloudflare customers for the first time. Since our announcement of our private beta last September the interest in our Eth and IPFS gateways has been overwhelming. We are humbled by the demand for these tools, and we are excited to get them into as many developers' hands as possible. Starting today, any Cloudflare customer can log into the dashboard and configure a zone for Ethereum, IPFS, or both!
Over the last eight months of the private beta, we’ve been busy working to fully operationalize the gateways to ensure they meet the needs of our customers!
First, we have created a new API with end-to-end managed hostname deployment. This ensures the creation and management of gateways as you continue to scale remains extremely quick and easy! It is paramount to give time and focus back to developers to focus on your core product and services and leave the infrastructural components to us!
Second, we’ve added a brand new UI bringing web3 to Cloudflare's zone-level dashboard. Now, regardless of the workflow you are used to, we have parity between our UI and API to ensure Continue reading
Four years ago, Cloudflare went Interplanetary by offering a gateway to the IPFS network. This meant that if you hosted content on IPFS, we offered to make it available to every user of the Internet through HTTPS and with Cloudflare protection. IPFS allows you to choose a storage provider you are comfortable with, while providing a standard interface for Cloudflare to serve this data.
Since then, businesses have new tools to streamline web development. Cloudflare Workers, Pages, and R2 are enabling developers to bring services online in a matter of minutes, with built-in scaling, security, and analytics.
Today, we're announcing we're bridging the two. We will make it possible for our customers to serve their sites on the IPFS network.
In this post, we'll learn how you will be able to build your website with Cloudflare Pages, and leverage the IPFS integration to make your content accessible and available across multiple providers.
The InterPlanetary FileSystem (IPFS) is a peer-to-peer network for storing content on a distributed file system. It is composed of a set of computers called nodes that store and relay content using a common addressing system. In short, a set of participants Continue reading
We have been operating an IPFS gateway for the last four years. It started as a research experiment in 2018, providing end-to-end integrity with IPFS. A year later, we made IPFS content faster to fetch. Last year, we announced we were committed to making IPFS gateway part of our product offering. Through this process, we needed to inform our design decisions to know how our setup performed.
To this end, we've developed the IPFS Gateway monitor, an observability tool that runs various IPFS scenarios on a given gateway endpoint. In this post, you'll learn how we use this tool and go over discoveries we made along the way.
IPFS is a distributed system for storing and accessing files, websites, applications, and data. It's different from a traditional centralized file system in that IPFS is completely distributed. Any participant can join and leave at any time without the loss of overall performance.
However, in order to access any file in IPFS, users cannot just use web browsers. They need to run an IPFS node to access the file from IPFS using its own protocol. IPFS Gateways play the role of enabling users to do this using only Continue reading