In September 2024, I described how you can build One-Arm Hub-and-Spoke VPN with MPLS/VPN. In that blog post, I mentioned that the solution doesn’t work on Arista EOS because it allocates MPLS labels to whole VRFs (per-VRF label allocation).
In early September, I received an email from Daniel Blažek telling me that Arista fixed this particular annoyance in the EOS release 4.34.2F. It still uses per-VRF label allocation, but now, you can assign a different label to the default route. Let’s see how that works with our one-arm hub-and-spoke topology:
[Figure updated 13 November 2025]
My previous UET posts explained how an application uses libfabric function API calls to discover available hardware resources and how this information is used to create a hardware abstraction layer composed of Fabric, Domain, and Endpoint objects, along with their child objects — Event Queues, Completion Queues, Completion Counters, Address Vectors, and Memory Regions.
This chapter explains how these objects are used during data transfer operations. It also describes how information is encoded into UET protocol headers, including the Semantic Sublayer (SES) and Packet Delivery Sublayer (PDC). In addition, the chapter covers how the Congestion Management Sublayer (CMS) monitors and controls send queue rates to prevent egress buffer overflows.
Note: In this book, libfabric API calls are divided into two categories for clarity. Functions are used to create and configure fabric objects such as fabrics, domains, endpoints, and memory regions (for example, fi_fabric(), fi_domain(), and fi_mr_reg()). Operations, on the other hand, perform actual data transfer or synchronization between processes (for example, fi_write(), fi_read(), and fi_send()).
Figure 5-1 provides a high-level overview of a libfabric Remote Memory Access (RMA) operation using the fi_write function call. When an application needs to transfer data, such as gradients, from Continue reading
Developers can already use Cloudflare Workflows to build long-running, multi-step applications on Workers. Now, Python Workflows are here, meaning you can use your language of choice to orchestrate multi-step applications.
With Workflows, you can automate a sequence of idempotent steps in your application with built-in error handling and retry behavior. But Workflows were originally supported only in TypeScript. Since Python is the de facto language of choice for data pipelines, artificial intelligence/machine learning, and task automation – all of which heavily rely on orchestration – this created friction for many developers.
Over the years, we’ve been giving developers the tools to build these applications in Python, on Cloudflare. In 2020, we brought Python to Workers via Transcrypt before directly integrating Python into workerd in 2024. Earlier this year, we built support for CPython along with any packages built in Pyodide, like matplotlib and pandas, in Workers. Now, Python Workflows are supported as well, so developers can create robust applications using the language they know best.
Imagine you’re training an LLM. You need to label the dataset, feed data, wait for the model to run, evaluate the loss, adjust the model, and repeat. Without automation, Continue reading
Imagine you have an IPv4-only network1 and want to try out how to deploy a routing protocol for IPv6. netlab is a pretty good tool for the job as it:
Kubernetes has transformed how teams build and scale applications, but it has also introduced new layers of complexity. Platform and DevOps teams must now integrate and manage multiple technologies: CNI, ingress and egress gateways, service mesh, and more across increasingly large and dynamic environments. As more applications are deployed into Kubernetes clusters, the operational burden on these teams continues to grow, especially when maintaining performance, reliability, security, and observability across diverse workloads.
To address this complexity and tool sprawl, Tigera is incorporating Istio’s Ambient Service Mesh directly into the Calico Unified Network Security Platform. Service mesh has become the preferred solution for application-level networking, particularly in environments with a large number of services or highly regulated workloads. Among available service meshes, Istio stands out as the most popular and widely adopted, supported by a thriving open-source community. By leveraging the lightweight, sidecarless design of Istio Ambient Mode, Calico delivers all the benefits of service mesh, secure service-to-service communication, mTLS authentication, fine-grained authorization, traffic management, and observability, without the burden of sidecars.
Complementing this addition is Calico AI. Calico AI brings intelligence and automation to Kubernetes networking. It addresses the massive operational burden on teams Continue reading
Been a while since I did a “War Stories” post - here’s one about a routing policy I screwed up recently. Gave me a fright that I’d really messed something up, but in the end it was no big deal, and it taught me something about who uses route collector info.
While looking at bgp.he.net/AS32590 for something unrelated, I saw this:
Investigating more, it tells me this:
What the hell is going on? We should never be announcing bogon ranges to any peer. I rushed off to check some of our peering sessions, e.g
1
2
3
4
5
6
7
8
lindsayh@rtr> show route advertising-protocol bgp 86.104.125.69
inet.0: 1009955 destinations, 8974886 routes (1008431 active, 2 holddown, 2770 hidden)
Prefix Nexthop MED Lclpref AS path
* 155.133.226.0/24 Self I
* 155.133.229.0/24 Self I
* 155.133.250.0/24 Self I
* 162.254.197.0/24 Self I
We’re just advertising the normal set of prefixes I expect at that site. Defintely not advertising anything unusual to HE. So why do they think we’re advertising bogons?
Hmmm…Cloudflare Radar also says we’re announcing junk. Must Continue reading
Do planned economies, like China, have an advantage in deploying IPv6? What do the numbers on the DFZ show? George Michaelson joins Russ and Tom to discuss.
When a customer wants to bring IP address space to Cloudflare, they’ve always had to reach out to their account team to put in a request. This request would then be sent to various Cloudflare engineering teams such as addressing and network engineering — and then the team responsible for the particular service they wanted to use the prefix with (e.g., CDN, Magic Transit, Spectrum, Egress). In addition, they had to work with their own legal teams and potentially another organization if they did not have primary ownership of an IP prefix in order to get a Letter of Agency (LOA) issued through hoops of approvals. This process is complex, manual, and time-consuming for all parties involved — sometimes taking up to 4–6 weeks depending on various approvals.
Well, no longer! Today, we are pleased to announce the launch of our self-serve BYOIP API, which enables our customers to onboard and set up their BYOIP prefixes themselves.
With self-serve, we handle the bureaucracy for you. We have automated this process using the gold standard for routing security — the Resource Public Key Infrastructure, RPKI. All the while, we continue to ensure the best quality of service by Continue reading
Like any other routing protocol, IS-IS has several timers you can tweak to improve the convergence speed of your network, or make your network unstable (eventually breaking it completely) if you reduce them too much (if you care about fast convergence, you REALLY SHOULD use BFD).
You’ll find more details (and the opportunity to tweak the timers in a safe environment) in the Adjust IS-IS Timers lab exercise.
Click here to start the lab in your browser using GitHub Codespaces (or set up your own lab infrastructure). After starting the lab environment, change the directory to feature/6-timers and execute netlab up.
As organizations scale Kubernetes and hybrid infrastructures, many are realizing that more tools don’t mean better security. A recent Microsoft report found that organizations with 16+ point solutions see 2.8x more data security incidents than those with fewer tools. Yet platform teams are still expected to deliver resilience and performance across containers, VMs, and bare metal, often while juggling fragmented tools that introduce risk, downtime, and complexity.
The Fall 2025 release of Calico Enterprise and Calico Cloud cuts through that complexity. Its new features are designed to make your infrastructure more resilient, performant, and observable—right out of the box. From disaster recovery automation to modern data plane support and application traffic handling, these updates empower platform engineers to simplify operations while meeting strict reliability requirements.
1. Resilient, High-Performance Networking and Improved Quality of Service: