Archive

Category Archives for "CloudFlare"

Log analytics using ClickHouse

Log analytics using ClickHouse

This is an adapted transcript of a talk we gave at Monitorama 2022. You can find the slides with presenter’s notes here and video here.

Log analytics using ClickHouse

When a request at Cloudflare throws an error, information gets logged in our requests_error pipeline. The error logs are used to help troubleshoot customer-specific or network-wide issues.

We, Site Reliability Engineers (SREs), manage the logging platform. We have been running Elasticsearch clusters for many years and during these years, the log volume has increased drastically. With the log volume increase, we started facing a few issues. Slow query performance and high resource consumption to list a few. We aimed to improve the log consumer's experience by improving query performance and providing cost-effective solutions for storing logs. This blog post discusses challenges with logging pipelines and how we designed the new architecture to make it faster and cost-efficient.

Before we dive into challenges in maintaining the logging pipelines, let us look at the characteristics of logs.

Characteristics of logs

Log analytics using ClickHouse

Unpredictable - In today's world, where there are tons of microservices, the amount of logs a centralized logging system will receive is very unpredictable. There are various reasons why capacity estimation of log volume is so difficult. Continue reading

Cloudflare’s abuse policies & approach

Cloudflare's abuse policies & approach
Cloudflare's abuse policies & approach

Cloudflare launched nearly twelve years ago. We’ve grown to operate a network that spans more than 275 cities in over 100 countries. We have millions of customers: from small businesses and individual developers to approximately 30 percent of the Fortune 500. Today, more than 20 percent of the web relies directly on Cloudflare’s services.

Over the time since we launched, our set of services has become much more complicated. With that complexity we have developed policies around how we handle abuse of different Cloudflare features. Just as a broad platform like Google has different abuse policies for search, Gmail, YouTube, and Blogger, Cloudflare has developed different abuse policies as we have introduced new products.

We published our updated approach to abuse last year at:

https://www.cloudflare.com/trust-hub/abuse-approach/

However, as questions have arisen, we thought it made sense to describe those policies in more detail here.  

The policies we built reflect ideas and recommendations from human rights experts, activists, academics, and regulators. Our guiding principles require abuse policies to be specific to the service being used. This is to ensure that any actions we take both reflect the ability to address the harm and minimize unintended consequences. We believe that Continue reading

Introducing thresholds in Security Event Alerting: a z-score love story

Introducing thresholds in Security Event Alerting: a z-score love story
Introducing thresholds in Security Event Alerting: a z-score love story

Today we are excited to announce thresholds for our Security Event Alerts: a new and improved way of detecting anomalous spikes of security events on your Internet properties. Previously, our calculations were based on z-score methodology alone, which was able to determine most of the significant spikes. By introducing a threshold, we are able to make alerts more accurate and only notify you when it truly matters. One can think of it as a romance between the two strategies. This is the story of how they met.

Author’s note: as an intern at Cloudflare I got to work on this project from start to finish from investigation all the way to the final product.

Once upon a time

In the beginning, there were Security Event Alerts. Security Event Alerts are notifications that are sent whenever we detect a threat to your Internet property. As the name suggests, they track the number of security events, which are requests to your application that match security rules. For example, you can configure a security rule that blocks access from certain countries. Every time a user from that country tries to access your Internet property, it will log as a security event. While a Continue reading

Performance isolation in a multi-tenant database environment

Performance isolation in a multi-tenant database environment
Performance isolation in a multi-tenant database environment

Operating at Cloudflare scale means that across the technology stack we spend a great deal of time handling different load conditions. In this blog post we talk about how we solved performance difficulties with our Postgres clusters. These clusters support a large number of tenants and highly variable load conditions leading to the need to isolate activity to prevent tenants taking too much time from others. Welcome to real-world, large database cluster management!

As an intern at Cloudflare I got to work on improving how our database clusters behave under load and open source the resulting code.

Cloudflare operates production Postgres clusters across multiple regions in data centers. Some of our earliest service offerings, such as our DNS Resolver, Firewall, and DDoS Protection, depend on our Postgres clusters' high availability for OLTP workloads. The high availability cluster manager, Stolon, is employed across all clusters to independently control and replicate data across Postgres instances and elect Postgres leaders and failover under high load scenarios.

PgBouncer and HAProxy act as the gateway layer in each cluster. Each tenant acquires client-side connections from PgBouncer instead of Postgres directly. PgBouncer holds a pool of maximum server-side connections to Postgres, allocating those across multiple Continue reading

Open sourcing our fork of PgBouncer

Open sourcing our fork of PgBouncer
Open sourcing our fork of PgBouncer

Cloudflare operates highly available Postgres production clusters across multiple data centers, supporting the transactional workloads of our core service offerings such as our DNS Resolver, Firewall, and DDoS Protection.

Multiple PgBouncer instances sit at the front of the gateway layer per each cluster, acting as a TCP proxy that provides Postgres connection pooling. PgBouncer’s pooling enables upstream applications to connect to Postgres, without having to constantly open and close connections (expensive) at the database level, while also reducing the number of Postgres connections used. Each tenant acquires client-side connections from PgBouncer instead of Postgres directly.

Open sourcing our fork of PgBouncer

PgBouncer will hold a pool of maximum server-side connections to Postgres, allocating those across multiple tenants to prevent Postgres connection starvation. From here, PgBouncer will forward backend queries to HAProxy, which load balances across Postgres primary and read replicas.

As an intern at Cloudflare I got to work on improving how our database clusters behave under load and open source the resulting code.

We run our Postgres infrastructure in non-containerized, bare metal environments which consequently leads to multitenant resource contention between Postgres users. To enforce stricter tenant performance isolation at the database level (CPU time utilized, memory consumption, disk IO operations), we’d like to configure Continue reading

Deep dives & how the Internet works

Deep dives & how the Internet works
Deep dives & how the Internet works

When August comes, for many, at least in the Northern Hemisphere, it’s time to enjoy summer and/or vacations. Here are some deep dive reading suggestions from our Cloudflare Blog for any time, weather or time of the year. There’s also some reading material on how the Internet works, and a glimpse into our history.

To create the list (that goes beyond 2022), initially we asked inside the company for favorite blog posts. Many explained how a particular blog post made them want to work at Cloudflare (including some of those who have been at the company for many years). And then, we also heard from readers by asking the question on our Twitter account: “What’s your favorite blog post from the Cloudflare Blog and why?”

In early July (thinking of the July 4 US holiday) we did a sum up where some of the more recent blog posts were referenced. We’ve added a few to that list:

  • Eliminating CAPTCHAs on iPhones and Macs (✍️)
    How it works using open standards. On this topic, you can also read the detailed blog post from our research team, from 2021: Humanity wastes about Continue reading

Cloudflare Support Portal gets an overhaul

Cloudflare Support Portal gets an overhaul
Cloudflare Support Portal gets an overhaul

The Cloudflare Support team is excited to announce the launch of our brand-new Customer Support Portal. When our customers open support tickets, we understand that they want quick and accurate responses from us. For those of you who have opened a support ticket in the past, we are certain you will notice the improvements we've made! The new Support Portal lives where our ticket submission form has always been, dash.cloudflare.com/support, but that's where the similarities between the old and the new one end.

What can you expect in the new portal?

The new Support Portal will help you solve your problems quickly and effectively, by getting you on the fastest path to resolution. In some cases, the most efficient way to resolve your issue will be to use our self-help resources or our machine learning-trained Support Bot. Other times, the most efficient way to resolve your issue will be by working with one of our Support Engineers via ticket, phone or chat, depending on your plan type. Regardless of how we help you solve your issue, we will have more context about the products you are using and your issue up front, reducing time-consuming back and forth.

Continue reading

Crawler Hints supports Microsoft’s IndexNow in helping users find new content

Crawler Hints supports Microsoft’s IndexNow in helping users find new content
Crawler Hints supports Microsoft’s IndexNow in helping users find new content

The web is constantly changing. Whether it’s news or updates to your social feed, it’s a constant flow of information. As a user, that’s great. But have you ever stopped to think how search engines deal with all the change?

It turns out, they “index” the web on a regular basis — sending bots out, to constantly crawl webpages, looking for changes. Today, bot traffic accounts for about 30% of total traffic on the Internet, and given how foundational search is to using the Internet, it should come as no surprise that search engine bots make up a large proportion of that what might come as a surprise is how inefficient the model is, though: we estimate that over 50% of crawler traffic is wasted effort.

This has a huge impact. There’s all the additional capacity that owners of websites need to bake into their site to absorb the bots crawling all over it. There’s the transmission of the data. There’s the CPU cost of running the bots. And when you’re running at the scale of the Internet, all of this has a pretty big environmental footprint.

Part of the problem, though, is nobody had really stopped to ask: maybe Continue reading

2022 attacks! An August reading list to go “Shields Up”

2022 attacks! An August reading list to go “Shields Up”
2022 attacks! An August reading list to go “Shields Up”

In 2022, cybersecurity is a must-have for those who don’t want to take chances on getting caught in a cyberattack with difficult to deal consequences. And with a war in Europe (Ukraine) still going on, cyberwar also doesn’t show signs of stopping in a time when there never were so many people online, 4.95 billion in early 2022, 62.5% of the world’s total population (estimates say it grew around 4% during 2021 and 7.3% in 2020).

Throughout the year we, at Cloudflare, have been making new announcements of products, solutions and initiatives that highlight the way we have been preventing, mitigating and constantly learning, over the years, with several thousands of small and big cyberattacks. Right now, we block an average of 124 billion cyber threats per day. The more we deal with attacks, the more we know how to stop them, and the easier it gets to find and deal with new threats — and for customers to forget we’re there, protecting them.

In 2022, we have been onboarding many customers while they’re being attacked, something we know well from the past (Wikimedia/Wikipedia or Eurovision are just two case-studies of many, Continue reading

The mechanics of a sophisticated phishing scam and how we stopped it

The mechanics of a sophisticated phishing scam and how we stopped it
The mechanics of a sophisticated phishing scam and how we stopped it

Yesterday, August 8, 2022, Twilio shared that they’d been compromised by a targeted phishing attack. Around the same time as Twilio was attacked, we saw an attack with very similar characteristics also targeting Cloudflare’s employees. While individual employees did fall for the phishing messages, we were able to thwart the attack through our own use of Cloudflare One products, and physical security keys issued to every employee that are required to access all our applications.

We have confirmed that no Cloudflare systems were compromised. Our Cloudforce One threat intelligence team was able to perform additional analysis to further dissect the mechanism of the attack and gather critical evidence to assist in tracking down the attacker.

This was a sophisticated attack targeting employees and systems in such a way that we believe most organizations would be likely to be breached. Given that the attacker is targeting multiple organizations, we wanted to share here a rundown of exactly what we saw in order to help other companies recognize and mitigate this attack.

Targeted Text Messages

On July 20, 2022, the Cloudflare Security team received reports of employees receiving legitimate-looking text messages pointing to what appeared to be a Cloudflare Okta login Continue reading

Introducing new Cloudflare for SaaS documentation

Introducing new Cloudflare for SaaS documentation
Introducing new Cloudflare for SaaS documentation

As a SaaS provider, you’re juggling many challenges while building your application, whether it’s custom domain support, protection from attacks, or maintaining an origin server. In 2021, we were proud to announce Cloudflare for SaaS for Everyone, which allows anyone to use Cloudflare to cover those challenges, so they can focus on other aspects of their business. This product has a variety of potential implementations; now, we are excited to announce a new section in our Developer Docs specifically devoted to Cloudflare for SaaS documentation to allow you take full advantage of its product suite.

Cloudflare for SaaS solution

You may remember, from our October 2021 blog post, all the ways that Cloudflare provides solutions for SaaS providers:

  • Set up an origin server
  • Encrypt your customers’ traffic
  • Keep your customers online
  • Boost the performance of global customers
  • Support custom domains
  • Protect against attacks and bots
  • Scale for growth
  • Provide insights and analytics
Introducing new Cloudflare for SaaS documentation

However, we received feedback from customers indicating confusion around actually using the capabilities of Cloudflare for SaaS because there are so many features! With the existing documentation, it wasn’t 100% clear how to enhance security and performance, or how to support custom domains. Now, we want Continue reading

1.1.1.1 + WARP: More features, still private

1.1.1.1 + WARP: More features, still private
1.1.1.1 + WARP: More features, still private

It’s a Saturday night. You open your browser, looking for nearby pizza spots that are open. If the search goes as intended, your browser will show you results that are within a few miles, often based on the assumed location of your IP address. At Cloudflare, we affectionately call this type of geolocation accuracy the “pizza test”. When you use a Cloudflare product that sits between you and the Internet (for example, WARP), it’s one of the ways we work to balance user experience and privacy. Too inaccurate and you’re getting pizza places from a neighboring country; too accurate and you’re reducing the privacy benefits of obscuring your location.

With that in mind, we’re excited to announce two major improvements to our 1.1.1.1 + WARP apps: first, an improvement to how we ensure search results and other geographically-aware Internet activity work without compromising your privacy, and second, a larger network with more locations available to WARP+ subscribers, powering even speedier connections to our global network.

A better Internet browsing experience for every WARP user

When we originally built the 1.1.1.1+ WARP mobile app, we wanted to create a consumer-friendly way to connect to Continue reading

Build your next big idea with Cloudflare Pages

Build your next big idea with Cloudflare Pages
Build your next big idea with Cloudflare Pages

Have you ever had a surge of inspiration for a project? That feeling when you have a great idea – a big idea — that you just can’t shake? When all you can think about is putting your hands to your keyboard and hacking away? Building a website takes courage, creativity, passion and drive, and with Cloudflare Pages we believe nothing should stand in the way of that vision.

Especially not a price tag.

Big ideas

We built Pages to be at the center of your developer experience – a way for you to get started right away without worrying about the heavy lift of setting up a fullstack app.  A quick commit to your git provider or direct upload to our platform, and your rich and powerful site is deployed to our network of 270+ data centers in seconds. And above all, we built Pages to scale with you as you grow exponentially without getting hit by an unexpected bill.

The limit does not exist

We’re a platform that’s invested in your vision – no matter how wacky and wild (the best ones usually are!). That’s why for many parts of Pages we want your experience to be Continue reading

Building scheduling system with Workers and Durable Objects

Building scheduling system with Workers and Durable Objects
Building scheduling system with Workers and Durable Objects

We rely on technology to help us on a daily basis – if you are not good at keeping track of time, your calendar can remind you when it's time to prepare for your next meeting. If you made a reservation at a really nice restaurant, you don't want to miss it! You appreciate the app to remind you a day before your plans the next evening.

However, who tells the application when it's the right time to send you a notification? For this, we generally rely on scheduled events. And when you are relying on them, you really want to make sure that they occur. Turns out, this can get difficult. The scheduler and storage backend need to be designed with scale in mind - otherwise you may hit limitations quickly.

Workers, Durable Objects, and Alarms are actually a perfect match for this type of workload. Thanks to the distributed architecture of Durable Objects and their storage, they are a reliable and scalable option. Each Durable Object has access to its own isolated storage and alarm scheduler, both being automatically replicated and failover in case of failures.

There are many use cases where having a reliable scheduler can come Continue reading

Experiment with post-quantum cryptography today

Experiment with post-quantum cryptography today
Experiment with post-quantum cryptography today

Practically all data sent over the Internet today is at risk in the future if a sufficiently large and stable quantum computer is created. Anyone who captures data now could decrypt it.

Luckily, there is a solution: we can switch to so-called post-quantum (PQ) cryptography, which is designed to be secure against attacks of quantum computers. After a six-year worldwide selection process, in July 2022, NIST announced they will standardize Kyber, a post-quantum key agreement scheme. The standard will be ready in 2024, but we want to help drive the adoption of post-quantum cryptography.

Today we have added support for the X25519Kyber512Draft00 and X25519Kyber768Draft00 hybrid post-quantum key agreements to a number of test domains, including pq.cloudflareresearch.com.

Do you want to experiment with post-quantum on your test website for free? Mail [email protected] to enroll your test website, but read the fine-print below.

What does it mean to enable post-quantum on your website?

If you enroll your website to the post-quantum beta, we will add support for these two extra key agreements alongside the existing classical encryption schemes such as X25519. If your browser doesn’t support these post-quantum key agreements (and none at the time Continue reading

Building and using Managed Components with WebCM

Building and using Managed Components with WebCM
Building and using Managed Components with WebCM

Managed Components are here to shake up the way third-party tools integrate with websites. Two months ago we announced that we’re open sourcing parts of the most innovative technologies behind Cloudflare Zaraz, making them accessible and usable to everyone online. Since then, we’ve been working hard to make sure that the code is well documented and all pieces are fun and easy to use. In this article, I want to show you how Managed Components can be useful for you right now, if you manage a website or if you’re building third-party tools. But before we dive in, let’s talk about the past.

Third-party scripts are a threat to your website

For decades, if you wanted to add an analytics tool to your site, a chat widget, a conversion pixel or any other kind of tool – you needed to include an external script. That usually meant adding some code like this to your website:

<script src=”https://example.com/script.js”></script>

If you think about it – it’s a pretty terrible idea. Not only that you’re now asking the browser to connect to another server, fetch and execute more JavaScript code – you’re also completely giving up the control on your Continue reading

Load Balancing with Weighted Pools

Load Balancing with Weighted Pools
Load Balancing with Weighted Pools

Anyone can take advantage of Cloudflare’s far-reaching network to protect and accelerate their online presence. Our vast number of data centers, and their proximity to Internet users around the world, enables us to secure and accelerate our customers’ Internet applications, APIs and websites. Even a simple service with a single origin server can leverage the massive scale of the Cloudflare network in 270+ cities. Using the Cloudflare cache, you can support more requests and users without purchasing new servers.

Whether it is to guarantee high availability through redundancy, or to support more dynamic content, an increasing number of services require multiple origin servers. The Cloudflare Load Balancer keeps our customer’s services highly available and makes it simple to spread out requests across multiple origin servers. Today, we’re excited to announce a frequently requested feature for our Load Balancer – Weighted Pools!

What’s a Weighted Pool?

Before we can answer that, let’s take a quick look at how our load balancer works and define a few terms:

Origin Servers - Servers which sit behind Cloudflare and are often located in a customer-owned datacenter or at a public cloud provider.

Origin Pool - A logical collection of origin servers. Most pools Continue reading

Running Zig with WASI on Cloudflare Workers

Running Zig with WASI on Cloudflare Workers
Running Zig with WASI on Cloudflare Workers

After the recent announcement regarding WASI support in Workers, I decided to see what it would take to get code written in Zig to run as a Worker, and it turned out to be trivial. This post documents the process I followed as a new user of Zig. It’s so exciting to see how Cloudflare Workers is a polyglot platform allowing you to write programs in the language you love, or the language you’re learning!

Hello, World!

I’m not a Zig expert by any means, and to keep things entirely honest I’ve only just started looking into the language, but we all have to start somewhere. So, if my Zig code isn’t perfect please bear with me. My goal was to build a real, small program using Zig and deploy it on Cloudflare Workers. And to see how fast I can go from a blank screen to production code.

My goal for this wasn’t ambitious, just read some text from stdin and print it to stdout with line numbers, like running cat -n. But it does show just how easy the Workers paradigm is. This Zig program works identically on the command-line on my laptop and as an HTTP Continue reading

When the window is not fully open, your TCP stack is doing more than you think

When the window is not fully open, your TCP stack is doing more than you think

Over the years I've been lurking around the Linux kernel and have investigated the TCP code many times. But when recently we were working on Optimizing TCP for high WAN throughput while preserving low latency, I realized I have gaps in my knowledge about how Linux manages TCP receive buffers and windows. As I dug deeper I found the subject complex and certainly non-obvious.

In this blog post I'll share my journey deep into the Linux networking stack, trying to understand the memory and window management of the receiving side of a TCP connection. Specifically, looking for answers to seemingly trivial questions:

  • How much data can be stored in the TCP receive buffer? (it's not what you think)
  • How fast can it be filled? (it's not what you think either!)

Our exploration focuses on the receiving side of the TCP connection. We'll try to understand how to tune it for the best speed, without wasting precious memory.

A case of a rapid upload

To best illustrate the receive side buffer management we need pretty charts! But to grasp all the numbers, we need a bit of theory.

We'll draw charts from a receive side of a TCP flow, Continue reading

1 32 33 34 35 36 129