Ultra Ethernet: Domain Creation Process in Libfabric

Creating a domain object is the step where the application establishes a logical context for a NIC within a fabric, enabling endpoints, completion queues, and memory regions to be created and managed consistently.

Phase 1: Application (Discovery & choice — selecting a domain snapshot)

During discovery, the provider populated one or more fi_info entries — each entry a snapshot describing one possible NIC/port/transport combination. Each fi_info contains nested attribute structures for fabric, domain, and endpoint: fi_fabric_attr, fi_domain_attr, and fi_ep_attr. The fi_domain_attr substructure captures the domain-level template the provider reported during discovery (memory registration modes, MR key sizes, counts and limits, capability and mode bitmasks, CQ/context limits, authentication key sizes, etc.).

Once the application has decided which NIC/port it wants to use, it selects a single fi_info entry whose fi_domain_attr matches its needs. That chosen fi_info becomes the authoritative configuration for domain creation, containing both the application’s requested settings and the provider-reported capabilities. At this point, the application moves forward from fabric initialization to domain creation.

To create the domain, the application calls the fi_domain() function:


API Call → Create Domain object

    Within Fabric ID: 0xF1DFA01

    Using fi_info structure: 0xCAFE43E

    On Continue reading

Microsoft And Corintis Champion Microfluidics Cooling Pioneered By IBM

It is a now well-known fact in the datacenters of the world, which are trying to cram ten pounds of power usage into a five pound bit barn bag, that liquid cooling is an absolute necessity for increasing the density of high performance computing systems, which drives down latency between components and therefore drives up performance.

Microsoft And Corintis Champion Microfluidics Cooling Pioneered By IBM was written by Timothy Prickett Morgan at The Next Platform.

TNO043: Under the Manhole Cover: The Architecture of an Internet Exchange

In an IT world full of abstraction, overlays, and virtualization, it’s important to remember the physical infrastructure that supports all those things. So let’s get inside Mass IX, the Massachusetts Internet Exchange, to get a holistic view of the logical architecture and protocol mechanics of peering and Internet exchanges, as well as the iron, steel,... Read more »

Cloudflare just got faster and more secure, powered by Rust

Cloudflare is relentless about building and running the world’s fastest network. We have been tracking and reporting on our network performance since 2021: you can see the latest update here.

Building the fastest network requires work in many areas. We invest a lot of time in our hardware, to have efficient and fast machines. We invest in peering arrangements, to make sure we can talk to every part of the Internet with minimal delay. On top of this, we also have to invest in the software we run our network on, especially as each new product can otherwise add more processing delay.

No matter how fast messages arrive, we introduce a bottleneck if that software takes too long to think about how to process and respond to requests. Today we are excited to share a significant upgrade to our software that cuts the median time we take to respond by 10ms and delivers a 25% performance boost, as measured by third-party CDN performance tests.

We've spent the last year rebuilding major components of our system, and we've just slashed the latency of traffic passing through our network for millions of our customers. At the same time, we've made our Continue reading

Monitoring AS-SETs and why they matter

Introduction to AS-SETs

An AS-SET, not to be confused with the recently deprecated BGP AS_SET, is an Internet Routing Registry (IRR) object that allows network operators to group related networks together. AS-SETs have historically been used for multiple purposes, such as grouping together the downstream customers of a particular network provider. For example, Cloudflare uses the AS13335:AS-CLOUDFLARE AS-SET to group together our own Autonomous System Numbers (ASNs) and our downstream Bring-Your-Own-IP (BYOIP) customer networks, so we can ultimately communicate to other networks which prefixes they should accept from us.

In other words, an AS-SET is currently the mechanism on the Internet that allows a network to attest which networks it is the provider for. This system of provider authorization is completely trust-based and best-effort, which means it is not reliable at all. The future of RPKI-based provider authorization is coming in the form of ASPA (Autonomous System Provider Authorization), but standardization and adoption will take time. Until then, we are left with AS-SETs.
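As a concrete illustration, an AS-SET is published in an IRR database as an RPSL object. The sketch below shows the general shape of such an object; the members, mnt-by, and source values are illustrative assumptions, not Cloudflare's actual registrations:

```rpsl
as-set:     AS13335:AS-CLOUDFLARE
descr:      Cloudflare ASNs and downstream BYOIP customer networks
members:    AS13335, AS-EXAMPLE-CUSTOMER
mnt-by:     MAINT-EXAMPLE
source:     ARIN
```

Other networks expand the members attribute (recursively, since members may themselves be AS-SETs) to build the list of ASNs whose prefixes they should accept from the provider.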

Because AS-SETs are so critical for BGP routing on the Internet, network operators need to be able to monitor valid and invalid AS-SET memberships for Continue reading

Introducing Observatory and Smart Shield — see how the world sees your website, and make it faster in one click

Modern users expect instant, reliable web experiences. When your application is slow, they don’t just complain — they leave. Even delays as small as 100 ms have been shown to have a measurable impact on revenue, conversions, bounce rate, engagement, and more.

If you’re responsible for delivering on these expectations to the users of your product, you know there are many monitoring tools that show you how visitors experience your website, and can let you know when things are slow or causing issues. This is essential, but we believe understanding the problem is only half the story. The real value comes from integrating monitoring and remedies in the same view, giving customers the ability to quickly identify and resolve issues.

That's why today, we're excited to launch the new and improved Observatory, now in open beta. This monitoring and observability tool goes beyond charts and graphs, by also telling you exactly how to improve your application's performance and resilience, and immediately showing you the impact of those changes. And we’re releasing it to all subscription tiers (including Free!), available today.

But wait, there’s more! To make your users’ experience in Cloudflare even faster, we’re launching Smart Shield, Continue reading

An AI Index for all our customers

Today, we’re announcing the private beta of AI Index for domains on Cloudflare, a new type of web index that gives content creators the tools to make their data discoverable by AI, and gives AI builders access to better data for fair compensation.

With AI Index enabled on your domain, we will automatically create an AI-optimized search index for your website, and expose a set of ready-to-use standard APIs and tools including an MCP server, LLMs.txt, and a search API. Our customers will own and control that index and how it’s used, and you will have the ability to monetize access through Pay per crawl and the new x402 integrations. You will be able to use it to build modern search experiences on your own site, and more importantly, interact with external AI and Agentic providers to make your content more discoverable while being fairly compensated.

For AI builders—whether developers creating agentic applications, or AI platform companies providing foundational LLM models—Cloudflare will offer a new way to discover and retrieve web content: direct pub/sub connections to individual websites with AI Index. Instead of indiscriminate crawling, builders will be able to subscribe to specific sites that have opted in for Continue reading

Introducing new regional Internet traffic and Certificate Transparency insights on Cloudflare Radar

Since launching during Birthday Week in 2020, Radar has announced significant new capabilities and data sets during subsequent Birthday Weeks. We continue that tradition this year with a two-part launch, adding more dimensions to Radar’s ability to slice and dice the Internet.

First, we’re adding regional traffic insights. Regional traffic insights bring a more localized perspective to the traffic trends shown on Radar.

Second, we’re adding detailed Certificate Transparency (CT) data, too. The new CT data builds on the work that Cloudflare has been doing around CT since 2018, including Merkle Town, our initial CT dashboard.

Both features extend Radar's mission of providing deeper, more granular visibility into the health and security of the Internet. Below, we dig into these new capabilities and data sets.

Introducing regional Internet traffic insights on Radar

Cloudflare Radar initially launched with visibility into Internet traffic trends at a national level: want to see how that Internet shutdown impacted traffic in Iraq, or what IPv6 adoption looks like in India? It’s visible on Radar. Just a year and a half later, in March 2022, we launched Autonomous System (ASN) pages on Radar. This has enabled us to bring more granular visibility Continue reading

Code Mode: the better way to use MCP

It turns out we've all been using MCP wrong.

Most agents today use MCP by directly exposing the "tools" to the LLM.

We tried something different: Convert the MCP tools into a TypeScript API, and then ask an LLM to write code that calls that API.

The results are striking:

  1. We found agents are able to handle many more tools, and more complex tools, when those tools are presented as a TypeScript API rather than directly. Perhaps this is because LLMs have an enormous amount of real-world TypeScript in their training set, but only a small set of contrived examples of tool calls.

  2. The approach really shines when an agent needs to string together multiple calls. With the traditional approach, the output of each tool call must feed into the LLM's neural network, just to be copied over to the inputs of the next call, wasting time, energy, and tokens. When the LLM can write code, it can skip all that, and only read back the final results it needs.

In short, LLMs are better at writing code to call MCP than at calling MCP directly.

What's MCP?

For those that aren't familiar: Model Context Protocol is a standard protocol Continue reading

Eliminating Cold Starts 2: shard and conquer

Five years ago, we announced that we were Eliminating Cold Starts with Cloudflare Workers. In that episode, we introduced a technique to pre-warm Workers during the TLS handshake of their first request. That technique takes advantage of the fact that the TLS Server Name Indication (SNI) is sent in the very first message of the TLS handshake. Armed with that SNI, we often have enough information to pre-warm the request’s target Worker.

Eliminating cold starts by pre-warming Workers during TLS handshakes was a huge step forward for us, but “eliminate” is a strong word. Back then, Workers were still relatively small, and had cold starts constrained by limits explained later in this post. We’ve relaxed those limits, and users routinely deploy complex applications on Workers, often replacing origin servers. Simultaneously, TLS handshakes haven’t gotten any slower. In fact, TLS 1.3 only requires a single round trip for a handshake – compared to two round trips for TLS 1.2 – and is more widely used than it was in 2021.

Earlier this month, we finished deploying a new technique intended to keep pushing the boundary on cold start reduction. The new technique (or old, depending on your perspective) uses Continue reading

Network performance update: Birthday Week 2025

We are committed to being the fastest network in the world because improvements in our performance translate to improvements for your application’s own end users. We are excited to share that Cloudflare continues to be the fastest network for the most peered networks in the world.

We relentlessly measure our own performance and our performance against peers. We publish those results routinely, starting with our first update in June 2021 and most recently with our last post in September 2024.

Today’s update breaks down where we have improved since our update last year and what our priorities are going into the next year. While we are excited to be the fastest in the greatest number of last-mile ISPs, we are never done improving and have more work to do.

How do we measure this metric, and what are the results?

We measure network performance by attempting to capture what the experience is like for Internet users across the globe. To do that we need to simulate what their connection is like from their last-mile ISP to our network.

We start by taking the 1,000 largest networks in the world based on estimated population. We use that to give Continue reading

Ultra Ethernet: Fabric Creation Process in Libfabric

[edit Oct-8, 2025]
New version of this subject can be found here: Ultra Ethernet: Fabric Object - What it is and How it is created

Phase 1: Application (Discovery & choice)

After the UET provider populated fi_info structures for each NIC/port combination during discovery, the application can begin the object creation process. It first consults the in-memory fi_info list to identify the entry that best matches its requirements. Each fi_info contains nested attribute structures describing fabric, domain, and endpoint capabilities, including fi_fabric_attr (fabric name, provider identifier, version information), fi_domain_attr (memory registration mode, key details, domain capabilities), and fi_ep_attr (endpoint type, reliable versus unreliable semantics, size limits, and supported capabilities). The application examines the returned entries and selects the fi_info that satisfies its needs (for example: provider == "uet", fabric name == "UET", required capabilities, reliable transport, or a specific memory registration mode). The chosen fi_info then provides the attributes — effectively serving as hints — that the application passes into subsequent creation calls such as fi_fabric(), fi_domain(), and fi_endpoint(). Each fi_info acts as a self-contained “capability snapshot,” describing one possible combination of NIC, port, and transport mode.


Phase 2: Libfabric Core (dispatch & wiring)

When the application calls fi_fabric(), the Continue reading

How Cloudflare uses the world’s greatest collection of performance data to make the world’s fastest global network even faster

Cloudflare operates the fastest network on the planet. We’ve shared an update today about how we are overhauling the software technology that accelerates every server in our fleet, improving speed globally.

That is not where the work stops, though. To improve speed even further, we have to also make sure that our network swiftly handles the Internet-scale congestion that hits it every day, routing traffic to our now-faster servers.

We have invested in congestion control for years. Today, we are excited to share how we are applying a superpower of our network, our massive Free Plan user base, to optimize performance and find the best way to route traffic across our network for all our customers globally.

Early results show performance increases averaging 10% over the prior baseline. We achieved this by applying different algorithmic methods to improve performance based on the data we observe about the Internet each day. We are excited to begin rolling out these improvements to all customers.

How does traffic arrive in our network?

The Internet is a massive collection of interconnected networks, each composed of many machines (“nodes”). Data is transmitted by breaking it up into small packets, and passing them Continue reading

Lab: Protect IS-IS Routing Data with MD5 Authentication

Like OSPF and BGP, IS-IS contains a simple mechanism to authenticate routing traffic – IS-IS packets can include a cleartext password or an MD5 or SHA hash. Unlike OSPF, IS-IS can also authenticate:

  • The hello packets exchanged between routers
  • The contents of Link State PDUs flooded across an area or a domain.

Want to know more? Check out the Protect IS-IS Routing Data with MD5 Authentication lab exercise.

Click here to start the lab in your browser using GitHub Codespaces (or set up your own lab infrastructure). After starting the lab environment, change the directory to feature/3-md5 and execute netlab up.

Dicing, Slicing, And Augmenting Gartner’s AI Spending Forecast

When we try to predict the weather, we use ensembles of the initial conditions on the ground, in the oceans, and throughout the air to create a kind of probabilistic average forecast and then we take ensembles of models, which often have very different answers for extreme weather conditions like hurricanes and typhoons, to get a better sense of what might happen wherever and whenever we are concerned.

Dicing, Slicing, And Augmenting Gartner’s AI Spending Forecast was written by Timothy Prickett Morgan at The Next Platform.

IPB184: IPv6 Basics: Dual-Stack

We’re diving into another IPv6 Basics today with the topic of dual-stack, which means running the IPv4 and IPv6 protocol stacks simultaneously. We get many questions about the implications of running dual-stack, and in this episode we’ll provide answers. We start by getting a little finicky about the definition of dual-stack, and then talk about... Read more »

Cloudflare’s developer platform keeps getting better, faster, and more powerful. Here’s everything that’s new.

When you build on Cloudflare, we consider it our job to do the heavy lifting for you. That’s been true since we introduced Cloudflare Workers in 2017, when we first provided a runtime for you where you could just focus on building. 

That commitment is still true today, and many of today’s announcements are focused on just that — removing friction where possible to free you up to build something great. 

There are only so many blog posts we can write (and that you can read)! We have been busy on a much longer list of new improvements, and many of them we’ve been rolling out consistently over the course of the year. Today’s announcement breaks down all the new capabilities in detail, in one single post. The features being released today include:

Partnering to make full-stack fast: deploy PlanetScale databases directly from Workers

We’re not burying the lede on this one: you can now connect Cloudflare Workers to your PlanetScale databases directly and ship full-stack applications backed by Postgres or MySQL. 

We’ve teamed up with PlanetScale because we wanted to partner with a database provider that we could confidently recommend to our users: one that shares our obsession with performance, reliability and developer experience. These are all critical factors for any development team building a serious application. 

Now, when connecting to PlanetScale databases, your connections are automatically configured for optimal performance with Hyperdrive, ensuring that you have the fastest access from your Workers to your databases, regardless of where your Workers are running.

Building full-stack

As Workers has matured into a full-stack platform, we’ve introduced more options to facilitate your connectivity to data. With Workers KV, we made it easy to store configuration and cache unstructured data on the edge. With D1 and Durable Objects, we made it possible to build multi-tenant apps with simple, isolated SQL databases. And with Hyperdrive, we made connecting to external databases fast and scalable from Workers.

Today, we’re introducing a new choice for building on Cloudflare: Postgres and MySQL PlanetScale databases, directly Continue reading