Archive

Category Archives for "CloudFlare"

Moving Quicksilver into production

One of the great arts of software engineering is making updates and improvements to working systems without taking them offline. For some systems this can be rather easy: spin up a new web server or load balancer, redirect traffic, and you’re done. For other systems, such as the core distributed data store which keeps millions of websites online, it’s a bit more of a challenge.

Quicksilver is the data store responsible for storing and distributing the billions of KV pairs used to configure the millions of sites and Internet services which use Cloudflare. In a previous post, we discussed why it was built and what it was replacing. Building it, however, was only a small part of the challenge. We needed to deploy it to production into a network which was designed to be fault tolerant and in which downtime was unacceptable.

We needed a way to deploy our new service seamlessly, and to roll back that deploy should something go wrong. Ultimately many, many things did go wrong, and every bit of failure tolerance put into the system proved to be worth its weight in gold because none of this was visible to customers.

The Bridge

Our goal Continue reading

Getting to the Core: Benchmarking Cloudflare’s Latest Server Hardware

Maintaining a server fleet the size of Cloudflare’s is an operational challenge, to say the least. Anything we can do to lower complexity and improve efficiency has effects for our SRE (Site Reliability Engineer) and Data Center teams that can be felt throughout a server’s 4+ year lifespan.

At the Cloudflare Core, we process logs to analyze attacks and compute analytics. In 2020, our Core servers were in need of a refresh, so we decided to redesign the hardware to be more in line with our Gen X edge servers. We designed two major server variants for the core. The first is Core Compute 2020, an AMD-based server for analytics and general-purpose compute paired with solid-state storage drives. The second is Core Storage 2020, an Intel-based server with twelve spinning disks to run database workloads.

Core Compute 2020

Earlier this year, we blogged about our 10th generation edge servers or Gen X and the improvements they delivered to our edge in both performance and security. The new Core Compute 2020 server leverages many of our learnings from the edge server. The Core Compute servers run a variety of workloads including Kubernetes, Kafka, and various smaller services.

Configuration Changes (Kubernetes)

Previous Continue reading

Improving Performance and Search Rankings with Cloudflare for Fun and Profit

Making things fast is one of the things we do at Cloudflare. More responsive websites, apps, APIs, and networks directly translate into improved conversion and user experience. On November 10th, Google announced that Google Search will directly take web performance and page experience data into account when ranking results on their search engine results pages (SERPs), beginning in May 2021.

Specifically, Google Search will prioritize results based on how pages score on Core Web Vitals, a measurement methodology that Cloudflare has worked closely with Google to establish and that we support in our analytics tools.

Source: "Search Page Experience Graphic" by Google is licensed under CC BY 4.0

The Core Web Vitals metrics are Largest Contentful Paint (LCP, a loading measurement), First Input Delay (FID, a measure of interactivity), and Cumulative Layout Shift (CLS, a measure of visual stability). Each one is directly associated with user perceptible page experience milestones. All three can be improved using our performance products, and all three can be measured with our Cloudflare Browser Insights product, and soon, with our free privacy-aware Cloudflare Web Analytics.

SEO experts have always suspected faster pages lead to better search ranking. With the recent announcement from Continue reading

Many services, one cloudflared

Route many different local services through many different URLs, with only one cloudflared

I work on the Argo Tunnel team, and we make a program called cloudflared, which lets you securely expose your web service to the Internet while ensuring that all its traffic goes through Cloudflare.

Say you have some local service (a website, an API, a TCP server, etc.), and you want to securely expose it to the Internet using Argo Tunnel. First, you run cloudflared, which establishes some long-lived TCP connections to the Cloudflare edge. Then, when Cloudflare receives a request for your chosen hostname, it proxies the request through those connections to cloudflared, which in turn proxies the request to your local service. This means anyone accessing your service has to go through Cloudflare, and Cloudflare can do caching, rewrite parts of the page, block attackers, or build Zero Trust rules to control who can reach your application (e.g. users with a @corp.com email). Previously, companies had to use VPNs or firewalls to achieve this, but Argo Tunnel aims to be more flexible, more secure, and more scalable than the alternatives.

Some of our larger customers have deployed hundreds of services with Argo Continue reading

Network-layer DDoS attack trends for Q3 2020

DDoS attacks are surging — both in frequency and sophistication. After doubling from Q1 to Q2, the total number of network layer attacks observed in Q3 doubled again — resulting in a 4x increase in number compared to the pre-COVID levels in the first quarter. Cloudflare also observed more attack vectors deployed than ever — in fact, while SYN, RST, and UDP floods continue to dominate the landscape, we saw an explosion in protocol specific attacks such as mDNS, Memcached, and Jenkins DoS attacks.

Here are other key network layer DDoS trends we observed in Q3:

  • The majority of attacks are under 500 Mbps and 1 Mpps; both still suffice to cause service disruptions
  • We continue to see a majority of attacks last under 1 hour
  • Ransom-driven DDoS attacks (RDDoS) are on the rise as groups claiming to be Fancy Bear, Cozy Bear and the Lazarus Group extort organizations around the world. As of this writing, the ransom campaign is still ongoing. See a special note on this below.

Number of attacks

The total number of L3/4 DDoS attacks we observe on our network continues to increase substantially, as indicated in the graph below. All in all, Continue reading

Anchoring Trust: A Hardware Secure Boot Story

As a security company, we pride ourselves on finding innovative ways to protect our platform to, in turn, protect the data of our customers. Part of this approach is implementing progressive methods in protecting our hardware at scale. While we have blogged about how we address security threats from application to memory, the attacks on hardware, as well as firmware, have increased substantially. The data cataloged in the National Vulnerability Database (NVD) has shown the frequency of hardware and firmware-level vulnerabilities rising year after year.

Technologies like secure boot, common in desktops and laptops, have been ported over to the server industry as a method to combat firmware-level attacks and protect a device’s boot integrity. These technologies require that you create a trust ‘anchor’, an authoritative entity for which trust is assumed and not derived. A common trust anchor is the system Basic Input/Output System (BIOS) or the Unified Extensible Firmware Interface (UEFI) firmware.

While this ensures that the device boots only signed firmware and operating system bootloaders, does it protect the entire boot process? What protects the BIOS/UEFI firmware from attacks?

The Boot Process

Before we discuss how we secure our boot process, we will first Continue reading

Workers KV – free to try, with increased limits!

In May 2019, we launched Workers KV, letting developers store key-value data and make that data globally accessible from Workers running in Cloudflare’s over 200 data centers.

Today, we’re announcing a Free Tier for Workers KV that opens up global, low-latency data storage to every developer on the Workers platform. Additionally, to expand Workers KV’s use cases even further, we’re also raising the maximum value size from 10 MB to 25 MB. You can now write an application that serves larger static files or JSON blobs directly from KV.

Together with our announcement of the Durable Objects limited beta last month, the Workers platform continues to move toward providing storage solutions for applications that are globally deployed as easily as an application running in a single data center today.

What are the new free tier limits?

The free tier includes 100,000 read operations and 1,000 each of write, list and delete operations per day, resetting daily at UTC 00:00, with a maximum total storage size of 1 GB. Operations that exceed these limits will fail with an error.
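As a rough illustration of the daily limits quoted above, the following Python sketch checks a day's operation counts against them. The dictionary and function names are hypothetical and are not part of any Workers KV API; only the numeric limits come from the text.

```python
# Daily free-tier operation limits as quoted in the post.
# The names below are illustrative only, not a Workers KV API.
FREE_TIER_DAILY_LIMITS = {
    "read": 100_000,
    "write": 1_000,
    "list": 1_000,
    "delete": 1_000,
}


def exceeded_operations(daily_usage):
    """Return the operation types whose daily count is over the free-tier limit.

    `daily_usage` maps an operation type to the number of calls made since
    the last reset at 00:00 UTC. Unknown operation types raise KeyError.
    """
    return sorted(
        op
        for op, count in daily_usage.items()
        if count > FREE_TIER_DAILY_LIMITS[op]
    )


# Reads are exactly at the limit (still allowed); writes are one over.
print(exceeded_operations({"read": 100_000, "write": 1_001}))
```

Counts equal to a limit are treated as within the free tier here, since the post says only operations that exceed the limits fail.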

Additional KV usage costs $0.50 per million read operations, $5.00 per million list, write and delete operations Continue reading

When trusted relationships are formed, everyone wins!

Key Points:

  • Customer Success Managers offer continual strategic and technical guidance by way of interactive workshops, account reviews, tuning sessions and regular product updates.
  • Our product development and design teams constantly work on new features and product updates based on your input.
  • It’s a team effort. As part of our Premium Success offering, we can introduce you to Product Managers for in-depth conversations about our solutions and how they can apply to your business goals.
  • Cloudflare is always rapidly evolving and expanding our solutions! As technology advances, so does the sophistication of attacks. Through machine learning and behavioural analysis, we are able to ship new products to ensure you remain secure without impacting performance.

Reach out to your Customer Success Manager to gain more information on how they can accelerate your business.

The Success Story

Hi there. My name is Jake Jones and I’m a Customer Success Manager at Cloudflare covering the Middle East and Africa. When I look at what success means to me, it’s becoming a trusted advisor for my customers by taking a genuine interest in their priorities and helping them reach desired goals. I’ve learnt that successful partnerships are a byproduct of successful relationship building. Every Continue reading

SAD DNS Explained

This week, at the ACM CCS 2020 conference, researchers from UC Riverside and Tsinghua University announced a new attack against the Domain Name System (DNS) called SAD DNS (Side channel AttackeD DNS). This attack leverages recent features of the networking stack in modern operating systems (like Linux) to allow attackers to revive a classic attack category: DNS cache poisoning. As part of a coordinated disclosure effort earlier this year, the researchers contacted Cloudflare and other major DNS providers and we are happy to announce that 1.1.1.1 Public Resolver is no longer vulnerable to this attack.

In this post, we’ll explain what the vulnerability was, how it relates to previous attacks of this sort, what mitigation measures we have taken to protect our users, and which directions the industry should consider to prevent this class of attacks in the future.

DNS Basics

The Domain Name System (DNS) is what allows users of the Internet to get around without memorizing long sequences of numbers. What’s often called the “phonebook of the Internet” is more like a helpful system of translators that take natural language domain names (like blog.cloudflare.com or gov.uk) and Continue reading

Automated Origin CA for Kubernetes

In 2016, we launched the Cloudflare Origin CA, a certificate authority optimized for making it easy to secure the connection between Cloudflare and an origin server. Running our own CA has allowed us to support fast issuance and renewal, simple and effective revocation, and wildcard certificates for our users.

Out of the box, managing TLS certificates and keys within Kubernetes can be challenging and error prone. The secret resources have to be constructed correctly, as components expect secrets with specific fields. Some forms of domain verification require manually rotating secrets to pass. Once you're successful, don't forget to renew before the certificate expires!

cert-manager is a project to fill this operational gap, providing Kubernetes resources that manage the lifecycle of a certificate. Today we're releasing origin-ca-issuer, an extension to cert-manager integrating with Cloudflare Origin CA to easily create and renew certificates for your account's domains.

Origin CA Integration

Creating an Issuer

After installing cert-manager and origin-ca-issuer, you can create an OriginIssuer resource. This resource creates a binding between cert-manager and the Cloudflare API for an account. Different issuers may be connected to different Cloudflare accounts in the same Kubernetes cluster.

apiVersion: cert-manager.k8s.cloudflare.com/v1
kind: OriginIssuer
metadata:
   Continue reading

UK Black History Month at Cloudflare

In February 2019, I started my journey at Cloudflare. Back then, we lived in a COVID-19 free world and I was lucky enough, as part of the employee onboarding program, to visit our San Francisco HQ. As I took my first steps into the office, I was greeted by a beautiful bouquet of Protea flowers at the reception desk. Being from South Africa, seeing our national flower instantly made me feel at home and welcomed to the Cloudflare family - this memory will always be with me.

Later that day, I learnt it was Black History Month in the US. This celebration included African food for lunch, highlights of Black History icons on Cloudflare’s TV screens, and African drummers. At Cloudflare, Black History Month is coordinated and run by Afroflare, one of many Employee Resource Groups (ERGs) that celebrate diversity and inclusion. The excellent delivery of Black History Month demonstrated to me how seriously Cloudflare takes Black History Month and ERGs.

Today, I am one of the Afroflare leads in the London office and led this year’s UK Black History Month celebration. 2020 has been a year of historical events, which made this celebration uniquely significant. George Floyd’s murder Continue reading

My internship: Brotli compression using a reduced dictionary

Brotli is a state-of-the-art lossless compression format, supported by all major browsers. It is capable of achieving considerably better compression ratios than the ubiquitous gzip, and is rapidly gaining in popularity. Cloudflare uses the Google brotli library to dynamically compress web content whenever possible. In 2015, we took an in-depth look at how brotli works and its compression advantages.

One of the more interesting features of the brotli file format, in the context of textual web content compression, is the inclusion of a built-in static dictionary. The dictionary is quite large, and in addition to containing various strings in multiple languages, it also supports the option to apply multiple transformations to those words, increasing its versatility.

The open-source brotli library, which implements an encoder and decoder for brotli, has 11 predefined quality levels for the encoder, with higher quality levels demanding more CPU in exchange for a better compression ratio. The static dictionary feature is used to a limited extent starting with level 5, and to the full extent only at levels 10 and 11, due to the high CPU cost of this feature.
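To give a feel for why a shared dictionary helps with short textual payloads, here is a minimal Python sketch. It deliberately swaps in a different technique: it uses DEFLATE with a caller-supplied preset dictionary through the standard zlib module, not brotli and not brotli's built-in static dictionary. The dictionary contents and the sample page are made up for illustration.

```python
import zlib

# Hypothetical preset dictionary: markup we expect to appear in many pages.
# This stands in for the idea of a static dictionary; it is NOT brotli's.
PRESET_DICTIONARY = (
    b"<!DOCTYPE html><html><head><title></title></head><body></body></html>"
)


def deflate(data, dictionary=None):
    """Compress with DEFLATE, optionally priming the encoder with a dictionary."""
    if dictionary is None:
        compressor = zlib.compressobj(level=9)
    else:
        compressor = zlib.compressobj(level=9, zdict=dictionary)
    return compressor.compress(data) + compressor.flush()


def inflate(blob, dictionary=None):
    """Decompress; the same dictionary must be supplied if one was used."""
    if dictionary is None:
        decompressor = zlib.decompressobj()
    else:
        decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(blob) + decompressor.flush()


page = b"<!DOCTYPE html><html><head><title>Hi</title></head><body>Hello</body></html>"
plain = deflate(page)
with_dictionary = deflate(page, PRESET_DICTIONARY)

# The dictionary-primed stream is smaller because most of the page can be
# encoded as back-references into the dictionary instead of literal bytes.
print(len(page), len(plain), len(with_dictionary))
```

The trade-off mirrors the one described above: searching a larger dictionary for matches costs CPU, which is why an encoder may only consult it fully at its highest quality settings.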

We improve on the limited dictionary use approach and add Continue reading

Tech Leaders on the Future of Remote Work

Dozens of top leaders and thinkers from the tech industry and beyond recently joined us for a series of fireside chats commemorating Cloudflare’s 10th birthday. Over the course of 24 hours of conversation, many of these leaders touched on how the workplace has evolved during the pandemic, and how these changes will endure into the future.

Here are some of the highlights.

On the competition for talent

Stewart Butterfield
Co-founder and CEO, Slack

The thing that I think people don't appreciate or realize is that this is not a choice that companies are really going to make on an individual basis. I've heard a lot of leaders say, “we're going back to the office after the summer.”

If we say we require you to be in the office five days a week and, you know, Twitter doesn't, Salesforce doesn't — and those offers are about equal — they'll take those ones. I think we would also lose existing employees if they didn't believe that they had the flexibility. Once you do that, it affects the market for talent. If half of the companies support distributed work or flexible hours and flexible time in the office, you can compensate Continue reading

Bienvenue Cloudflare France! Why I’m helping Cloudflare grow in France

If you'd like to read this post in French click here.

I am incredibly excited to announce that I have joined Cloudflare as its Head of France to help build a better Internet and expand the company’s growing customer base in France. This is an important milestone for Cloudflare as we continue to grow our presence in Europe. Alongside our London, Munich, and Lisbon offices, Paris marks the fourth Cloudflare office in the EMEA region. With this, we’ll be able to further serve our customers’ demand, recruit local talent, and build on the successes we’ve had in our other offices around the globe. I have been impressed by what Cloudflare has built in EMEA including France, and I am even more excited by what lies ahead for our customers, partners, and employees.

Born in Paris and raised in Paris, Normandie and Germany, I started my career more than 20 years ago. While a teenager, I had the chance to work on one of the first Apple IIe’s available in France. I have always had a passion for technology and continue to be amazed by the value of its adoption with businesses large and small. In former roles as Solution Engineer Continue reading

Announcing Spectrum DDoS Analytics and DDoS Insights & Trends

We’re excited to announce the expansion of the Network Analytics dashboard to Spectrum customers on the Enterprise plan. Additionally, this announcement introduces two major dashboard improvements for easier reporting and investigation.

Network Analytics

Cloudflare's packet- and bit-oriented dashboard, Network Analytics, provides visibility into Internet traffic patterns and DDoS attacks in Layers 3 and 4 of the OSI model. This allows our users to better understand the traffic patterns and DDoS attacks as observed at the Cloudflare edge.

When the dashboard was first released in January, these capabilities were only available to Bring Your Own IP customers on the Spectrum and Magic Transit services, but now Spectrum customers using Cloudflare’s Anycast IPs are also supported.

Protecting L4 applications

Spectrum is Cloudflare’s L4 reverse-proxy service that offers unmetered DDoS protection and traffic acceleration for TCP and UDP applications. It provides enhanced traffic performance through faster TLS, optimized network routing, and high-speed interconnection. It also provides encryption to legacy protocols and applications that don’t come with embedded encryption. Customers who use Spectrum typically operate services in which network performance and resilience to DDoS attacks are of utmost importance to their business, such as email, remote access, and gaming.

Spectrum customers Continue reading

Fall 2020 RPKI Update

The Internet is a network of networks. In order to find the path between two points and exchange data, the network devices rely on the information from their peers. This information consists of IP addresses and Autonomous Systems (AS) which announce the addresses using Border Gateway Protocol (BGP).

One problem arises from this design: what protects against a malevolent peer who decides to announce incorrect information? The damage caused by route hijacks can be major.

Resource Public Key Infrastructure (RPKI) is a framework created in 2008. Its goal is to provide a source of truth for Internet Resources (IP addresses) and ASes in cryptographically signed records called Route Origin Authorizations (ROAs).

Recently, we’ve seen the significant threshold of two hundred thousand ROAs being passed. This represents a big step in making the Internet more secure against accidental and deliberate BGP tampering.

We have talked about RPKI in the past but we thought it would be a good time for an update.
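To make the role of a ROA concrete, here is a minimal Python sketch of route origin validation in the style of RFC 6811, where an announcement is classified as valid, invalid, or not found. The procedure is taken from that RFC rather than from this post, and the `Roa` class, the function name, and the sample prefixes and AS numbers (all from documentation ranges) are assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Roa:
    """Illustrative ROA payload: a prefix, its maximum length, and the origin AS."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(roas, announced_prefix, origin_asn):
    """Classify a BGP announcement against a set of ROAs.

    - "valid":     some covering ROA matches the origin AS and the announced
                   prefix is no more specific than that ROA's max_length.
    - "invalid":   at least one ROA covers the prefix, but none matches.
    - "not-found": no ROA covers the prefix at all.
    """
    announced = ipaddress.ip_network(announced_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # subnet_of() requires both networks to be the same IP version.
        if roa_net.version != announced.version:
            continue
        if not announced.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn == origin_asn and announced.prefixlen <= roa.max_length:
            return "valid"
    return "invalid" if covered else "not-found"


roas = [Roa(prefix="192.0.2.0/24", max_length=24, asn=64496)]
print(validate_origin(roas, "192.0.2.0/24", 64496))     # matching origin
print(validate_origin(roas, "192.0.2.0/24", 64511))     # wrong origin AS
print(validate_origin(roas, "198.51.100.0/24", 64496))  # no covering ROA
```

The "invalid" state is what lets a network reject a hijacked route, while "not-found" covers address space whose owner has not yet published a ROA.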

In a more technical context, the RPKI framework consists of two parts:

  • IP addresses need to be cryptographically signed by their owners in a database managed by a Trust Anchor: Afrinic, APNIC, ARIN, LACNIC and RIPE. Those Continue reading

ClickHouse Capacity Estimation Framework

We use ClickHouse widely at Cloudflare. It helps us with our internal analytics workload, bot management, customer dashboards, and many other systems. For instance, before Bot Management can analyze and classify our traffic, we need to collect logs. The Firewall Analytics tool needs to store and query data somewhere too. The same goes for our new Cloudflare Radar project. We are using ClickHouse for this purpose. It is a big database that can store huge amounts of data and return it on demand. This is not the first time we have talked about ClickHouse; there is a dedicated blog post on how we introduced ClickHouse for HTTP analytics.

Our biggest cluster has more than 100 nodes, and another has about half that number. Besides that, we have over 20 clusters that have at least three nodes and a replication factor of three. Our current insertion rate is about 90M rows per second.

We use the standard approach in ClickHouse schema design. At the top level we have clusters, which hold shards; a shard is a group of nodes, and a node is a physical machine. You can find technical characteristics of the nodes here. Stored data is replicated between clusters. Different shards hold different parts Continue reading

Looking Ahead: Five Opportunities on The Horizon According to Tech Leaders

Dozens of top leaders and thinkers from the tech industry and beyond recently joined us for a series of fireside chats commemorating Cloudflare’s 10th birthday. Over the course of 24 hours of conversation, these leaders shared their thoughts on everything from entrepreneurship to mental health — and how the Internet will continue to play a vital role.

Here are some of the highlights.

On the global opportunity for entrepreneurs

Anu Hariharan
Partner, Y Combinator’s Continuity Fund

Fast forwarding ten years from now, I think entrepreneurship is global, and you're already seeing signs of that. 27% of YC startups are headquartered outside the US. And I'm willing to bet that in a decade, at least 50% of YC startups will be headquartered outside the US. And so I think the sheer nature of the Internet democratizing information, more companies being global, like Facebook, Google, Uber — talent is everywhere. I think you will see multi-billion dollar companies coming out of other regions.

People have this perception that everything is a zero sum game, or that we are already at peak Internet penetration. Absolutely not. The global market cap is ~$85 trillion. Less than 10% is e-commerce. Internet enabled businesses is $8 Continue reading

The Serverlist: Serverless Wasm AI, Building Automatic Platform Optimizations, and more!

Check out our twenty-first edition of The Serverlist below. Get the latest scoop on the serverless space, get your hands dirty with new developer tutorials, engage in conversations with other serverless developers, and find upcoming meetups and conferences to attend.

Sign up below to have The Serverlist sent directly to your mailbox.

Unwrap the SERVFAIL

We recently released a new version of Cloudflare Resolver which adds a piece of information called “Extended DNS Errors” (EDE) along with the response code under certain circumstances. This will be helpful in tracing DNS resolution errors and figuring out what went wrong behind the scenes.

(image from: https://www.pxfuel.com/en/free-photo-expka)

A tight-lipped agent

The DNS protocol was designed to map domain names to IP addresses. To inform the client about the result of the lookup, the protocol has a 4-bit field called the response code (RCODE). The logic to serve a response might look something like this:

function lookup(domain) {
    ...
    switch result {
    case "No error condition":
        return NOERROR with client expected answer
    case "No record for the request type":
        return NOERROR
    case "The request domain does not exist":
        return NXDOMAIN
    case "Refuse to perform the specified operation for policy reasons":
        return REFUSED
    default("Server failure: unable to process this query due to a problem with the name server"):
        return SERVFAIL
    }
}

try {
    lookup(domain)
} catch {
    return SERVFAIL
}
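The pseudocode above can be approximated in runnable form. The Python sketch below is only an illustration: the RCODE numbers are the standard DNS values, but the exception classes, the `answer` helper, and the two failing resolvers are hypothetical. It shows how unrelated internal failures all collapse into the same SERVFAIL value.

```python
from enum import IntEnum


class Rcode(IntEnum):
    """A subset of the standard DNS response codes (the field is 4 bits wide)."""
    NOERROR = 0
    SERVFAIL = 2
    NXDOMAIN = 3
    REFUSED = 5


# Hypothetical internal failures a resolver might hit.
class UpstreamTimeout(Exception):
    pass


class DnssecValidationError(Exception):
    pass


def answer(resolve, domain):
    """Run a resolver callable and collapse any internal failure into SERVFAIL."""
    try:
        return resolve(domain)
    except Exception:
        # The client learns only that something went wrong, not what.
        return Rcode.SERVFAIL


def timed_out(domain):
    raise UpstreamTimeout(domain)


def bad_signature(domain):
    raise DnssecValidationError(domain)


# Two very different problems produce an identical response code.
print(answer(timed_out, "example.com") == answer(bad_signature, "example.com"))
```

From the client's side the two cases are indistinguishable, which is the gap the post describes.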

Although the context hasn't changed much, protocol extensions such as DNSSEC have been added, which makes the RCODE run out of space to express the server's internal Continue reading
