Major data center power failure (again): Cloudflare Code Orange tested

This post is also available in Français, Español.

Here's a post we never thought we'd need to write: less than five months after one of our major data centers lost power, it happened again to the exact same data center. That sucks and, if you're thinking "why do they keep using this facility??," I don't blame you. We're thinking the same thing. But, here's the thing, while a lot may not have changed at the data center, a lot changed over those five months at Cloudflare. So, while five months ago a major data center going offline was really painful, this time it was much less so.

This is a little bit about how a high availability data center lost power for the second time in five months. But, more so, it's the story of how our team worked to ensure that even if one of our critical data centers lost power it wouldn't impact our customers.

On November 2, 2023, one of our critical facilities in the Portland, Oregon region lost power for an extended period of time. It happened because of a cascading series of faults that appears to have been caused by maintenance by the Continue reading

Developer Week 2024 wrap-up

This post is also available in 简体中文, 繁體中文, 日本語, 한국어, Deutsch, Français and Español.

Developer Week 2024 has officially come to a close. Each day last week, we shipped new products and functionality geared towards giving developers the components they need to build full-stack applications on Cloudflare.

Even though Developer Week is now over, we are continuing to innovate with the over two million developers who build on our platform. Building a platform is only as exciting as seeing what developers build on it. Before we dive into a recap of the announcements, to send off the week, we wanted to share how a couple of companies are using Cloudflare to power their applications:

We have been using Workers for image delivery using R2 and have been able to maintain stable operations for a year after implementation. The speed of deployment and the flexibility of detailed configurations have greatly reduced the time and effort required for traditional server management. In particular, we have seen a noticeable cost savings and are deeply appreciative of the support we have received from Cloudflare Workers.
- FAN Communications


Milkshake helps creators, influencers, and business owners create engaging web pages Continue reading

Running CVP on Proxmox

With VMware jacking up the prices and killing off the free version of ESXi, people are looking to alternatives for a virtualization platform. One of the more popular alternatives is Proxmox, which so far I’m really liking.

If you’re looking to run CVP on Proxmox, here is how I get it installed. I’m not sure if Proxmox counts as officially supported for production CVP (it is KVM, however), but it does work fine in lab. Contact Arista TAC if you’re wondering about Proxmox suitability.

Getting the Image to Proxmox

Oddly enough, what you’ll want to do is get a copy of the CVP OVA, not the KVM image. I’m using the most recent release (at the time of writing, always check Arista.com) of cvp-2024.1.0.ova.

Get it onto your Proxmox box (or one of them if you’re doing a cluster). Place it somewhere where there’s enough space to unpack it. In my case, I have a volume called volume2, which is located at /mnt/pve/volume2.

I made a directory called tmp and copied the file to that directory via SCP (using FileZilla, though there’s several ways to get files onto Proxmox, it’s just a Linux box). I Continue reading

Troubleshooting vPC in My Virtual Lab

I’m preparing a blog post on setting up vPC in a VXLAN/EVPN environment. While doing so, I ran into some issues. Rather than simply fixing them, I wanted to share the troubleshooting experience as it can be useful to see all the things I did to troubleshoot, including commands, packet captures, etc., and learn a little about virtual networking. As always, thanks to Peter Palúch for providing assistance with the process.

Topology

The following topology implemented in ESX is used:

Background

I had just configured the vPC peer link and vPC peer link keepalive. I verified that the vPC was functional with the following command:

Leaf1# show vpc
Legend:
                (*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                     : 1   
Peer status                       : peer adjacency formed ok      
vPC keep-alive status             : peer is alive                 
Configuration consistency status  : success 
Per-vlan consistency status       : success                       
Type-2 consistency status         : success 
vPC role                          : primary                       
Number of vPCs configured         : 1   
Peer Gateway                      : Disabled
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Enabled
Auto-recovery status              : Disabled
Delay-restore status              : Timer is off.(timeout = 30s)
Delay-restore SVI status          : Timer is off.(timeout =  Continue reading

VPP with loopback-only OSPFv3 – Part 1

Bird

Introduction

A few weeks ago I took a good look at the [Babel] protocol. I found a set of features there that I really appreciated. The first was a latency aware routing protocol - this is useful for mesh (wireless) networks but it is also a good fit for IPng’s usecase, notably because it makes use of carrier ethernet which, if any link in the underlying MPLS network fails, will automatically re-route but sometimes with much higher latency. In these cases, Babel can reconverge on its own to a topology that has the lowest end to end latency.

But a second really cool find, is that Babel can use IPv6 nexthops for IPv4 destinations - which is super useful because it will allow me to retire all of the IPv4 /31 point to point networks between my routers. AS8298 has about half of a /24 tied up in these otherwise pointless (pun intended) transit networks.

In the same week, my buddy Benoit asked a question about OSPFv3 on the Bird users mailinglist [ref] which may or may not have been because I had been messing around with Babel using only IPv4 loopback interfaces. And just a Continue reading

Balancing MMLU With The Shortest Path Cost Constraints

og:image

It is almost impossible for someone coming from a purely mathematical background with little exposure to applications to understand how to go about formulating a real-world problem in mathematical terms. - Dantzig.

In our previous post, we explored the Min. Max Link Utilization problem using Linear Programming. The focus was on optimizing network traffic distribution to achieve optimal link utilization. However, this approach did not consider the potential impact on path lengths, allowing flows to be placed on longer paths without constraints. We mentioned this briefly in a passing statement. However, I thought it would be worth exploring the extended problem formulation that introduces an additional constraint to strike a balance between minimizing link utilization and controlling the deviation of paths from their shortest counterparts.

The core objective remains the same—to minimize the maximum link utilization across the network. However, we now introduce a conflicting constraint that aims to limit the extent to which paths can deviate from their shortest routes. This constraint is crucial for keeping path lengths within an acceptable range, as excessively long paths can degrade the user experience.

Mathematically, we are faced with two opposing constraints. On the one hand, we strive to minimize link utilization, Continue reading

Cloudflare acquires Baselime to expand serverless application observability capabilities

Today, we’re thrilled to announce that Cloudflare has acquired Baselime.

The cloud is changing. Just a few years ago, serverless functions were revolutionary. Today, entire applications are built on serverless architectures, from compute to databases, storage, queues, etc. — with Cloudflare leading the way in making it easier than ever for developers to build, without having to think about their architecture. And while the adoption of serverless has made it simple for developers to run fast, it has also made one of the most difficult problems in software even harder: how the heck do you unravel the behavior of distributed systems?

When I started Baselime 2 years ago, our goal was simple: enable every developer to build, ship, and learn from their serverless applications such that they can resolve issues before they become problems.

Since then, we built an observability platform that enables developers to understand the behaviour of their cloud applications. It’s designed for high cardinality and dimensionality data, from logs to distributed tracing with OpenTelemetry. With this data, we automatically surface insights from your applications, and enable you to quickly detect, troubleshoot, and resolve issues in production.

In parallel, Cloudflare has been busy the past few years building Continue reading

Browser Rendering API GA, rolling out Cloudflare Snippets, SWR, and bringing Workers for Platforms to all users

This post is also available in 简体中文, 繁體中文, 日本語, 한국어, Deutsch, Français and Español.

Browser Rendering API is now available to all paid Workers customers with improved session management

In May 2023, we announced the open beta program for the Browser Rendering API. Browser Rendering allows developers to programmatically control and interact with a headless browser instance and create automation flows for their applications and products.

At the same time, we launched a version of the Puppeteer library that works with Browser Rendering. With that, developers can use a familiar API on top of Cloudflare Workers to create all sorts of workflows, such as taking screenshots of pages or automatic software testing.

Today, we take Browser Rendering one step further, taking it out of beta and making it available to all paid Workers' plans. Furthermore, we are enhancing our API and introducing a new feature that we've been discussing for a long time in the open beta community: session management.

Session Management

Session management allows developers to reuse previously opened browsers across Worker's scripts. Reusing browser sessions has the advantage that you don't need to instantiate a new browser for every request and every task, Continue reading

Cloudflare acquires PartyKit to allow developers to build real-time multi-user applications

We're thrilled to announce that PartyKit, an open source platform for deploying real-time, collaborative, multiplayer applications, is now a part of Cloudflare. This acquisition marks a significant milestone in our journey to redefine the boundaries of serverless computing, making it more dynamic, interactive, and, importantly, stateful.

Defining the future of serverless compute around state

Building real-time applications on the web have always been difficult. Not only is it a distributed systems problem, but you need to provision and manage infrastructure, databases, and other services to maintain state across multiple clients. This complexity has traditionally been a barrier to entry for many developers, especially those who are just starting out.

We announced Durable Objects in 2020 as a way of building synchronized real time experiences for the web. Unlike regular serverless functions that are ephemeral and stateless, Durable Objects are stateful, allowing developers to build applications that maintain state across requests. They also act as an ideal synchronization point for building real-time applications that need to maintain state across multiple clients. Combined with WebSockets, Durable Objects can be used to build a wide range of applications, from multiplayer games to collaborative drawing tools.

In 2022, PartyKit began as a project to Continue reading

Blazing fast development with full-stack frameworks and Cloudflare

Hello web developers! Last year we released a slew of improvements that made deploying web applications on Cloudflare much easier, and in response we’ve seen a large growth of Astro, Next.js, Nuxt, Qwik, Remix, SolidStart, SvelteKit, and other web apps hosted on Cloudflare. Today we are announcing major improvements to our integration with these web frameworks that makes it easier to develop sophisticated applications that use our D1 SQL database, R2 object store, AI models, and other powerful features of Cloudflare’s developer platform.

In the past, if you wanted to develop a web framework-powered application with D1 and run it locally, you’d have to build a production build of your application, and then run it locally using `wrangler pages dev`. While this worked, each of your code iterations would take seconds, or tens of seconds for big applications. Iterating using production builds is simply too slow, pulls you out of the flow, and doesn’t allow you to take advantage of all the DX optimizations that framework authors have put a lot of hard work into. This is changing today!

Our goal is to integrate with web frameworks in the most natural way possible, without developers having to learn Continue reading

We’ve added JavaScript-native RPC to Cloudflare Workers

Cloudflare Workers now features a built-in RPC (Remote Procedure Call) system enabling seamless Worker-to-Worker and Worker-to-Durable Object communication, with almost no boilerplate. You just define a class:

export class MyService extends WorkerEntrypoint {
  sum(a, b) {
    return a + b;
  }
}

And then you call it:

let three = await env.MY_SERVICE.sum(1, 2);

No schemas. No routers. Just define methods of a class. Then call them. That's it.

But that's not it

This isn't just any old RPC. We've designed an RPC system so expressive that calling a remote service can feel like using a library – without any need to actually import a library! This is important not just for ease of use, but also security: fewer dependencies means fewer critical security updates and less exposure to supply-chain attacks.

To this end, here are some of the features of Workers RPC:

  • For starters, you can pass Structured Clonable types as the params or return value of an RPC. (That means that, unlike JSON, Dates just work, and you can even have cycles.)
  • You can additionally pass functions in the params or return value of other functions. When the other side calls the function you passed to Continue reading

Community Update: empowering startups building on Cloudflare and creating an inclusive community

With millions of developers around the world building on Cloudflare, we are constantly inspired by what you all are building with us. Every Developer Week, we’re excited to get your hands on new products and features that can help you be more productive, and creative, with what you’re building. But we know our job doesn’t end when we put new products and features in your hands during Developer Week. We also need to cultivate welcoming community spaces where you can come get help, share what you’re building, and meet other developers building with Cloudflare.

We’re excited to close out Developer Week by sharing updates on our Workers Launchpad program, our latest Developer Challenge, and the work we’re doing to ensure our community spaces – like our Discord and Community forums – are safe and inclusive for all developers.

Helping startups go further with Workers Launchpad

In late 2022, we initiated the $2 billion Workers Launchpad Funding Program aimed at aiding the more than one million developers who use Cloudflare's Developer Platform. This initiative particularly benefits startups that are investing in building on Cloudflare to propel their business growth.

The Workers Launchpad Program offers a variety of resources to help builders Continue reading