Archive

Category Archives for "Networking"

HS116: Nth-Party Risk May Put You on the (Block) Chain Gang

The evolution of the modern, Internet-driven economy has created the conditions for essentially unbounded Nth-party risks (that is, risks from your suppliers, and risks from your suppliers’ suppliers, and risks from your suppliers’ suppliers’ suppliers, ad infinitum). Nth party risks exist in public clouds, SaaS, software and hardware supply chains, and now in the form... Read more »

One-Arm Hub-and-Spoke VPN on Arista EOS

In September 2024, I described how you can build One-Arm Hub-and-Spoke VPN with MPLS/VPN. In that blog post, I mentioned that the solution doesn’t work on Arista EOS because it allocates MPLS labels to whole VRFs (per-VRF label allocation).

In early September, I received an email from Daniel Blažek telling me that Arista fixed this particular annoyance in the EOS release 4.34.2F. It still uses per-VRF label allocation, but now, you can assign a different label to the default route. Let’s see how that works with our one-arm hub-and-spoke topology:

UET Data Transport Part I: Introduction

[Figure updated 13 November 2025]

My previous UET posts explained how an application uses libfabric API function calls to discover available hardware resources and how this information is used to create a hardware abstraction layer composed of Fabric, Domain, and Endpoint objects, along with their child objects: Event Queues, Completion Queues, Completion Counters, Address Vectors, and Memory Regions.

This chapter explains how these objects are used during data transfer operations. It also describes how information is encoded into UET protocol headers, including the Semantic Sublayer (SES) and Packet Delivery Sublayer (PDS). In addition, the chapter covers how the Congestion Management Sublayer (CMS) monitors and controls send queue rates to prevent egress buffer overflows.

Note: In this book, libfabric API calls are divided into two categories for clarity. Functions are used to create and configure fabric objects such as fabrics, domains, endpoints, and memory regions (for example, fi_fabric(), fi_domain(), and fi_mr_reg()). Operations, on the other hand, perform actual data transfer or synchronization between processes (for example, fi_write(), fi_read(), and fi_send()).

Figure 5-1 provides a high-level overview of a libfabric Remote Memory Access (RMA) operation using the fi_write function call. When an application needs to transfer data, such as gradients, from Continue reading

A closer look at Python Workflows, now in beta

Developers can already use Cloudflare Workflows to build long-running, multi-step applications on Workers. Now, Python Workflows are here, meaning you can use your language of choice to orchestrate multi-step applications.

With Workflows, you can automate a sequence of idempotent steps in your application with built-in error handling and retry behavior. But Workflows were originally supported only in TypeScript. Since Python is the de facto language of choice for data pipelines, artificial intelligence/machine learning, and task automation – all of which heavily rely on orchestration – this created friction for many developers.
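To make "idempotent steps with built-in error handling and retry behavior" concrete, here is a minimal sketch in plain Python. It is not the Cloudflare Workflows API: `StepRunner`, its `do` method, and `flaky_fetch` are hypothetical names invented for illustration, and a real engine would persist step results durably rather than in a dictionary.

```python
import time
from typing import Any, Callable, Dict, Optional


class StepRunner:
    """Toy stand-in for a workflow engine (hypothetical, not the Workflows API)."""

    def __init__(self, retries: int = 3, delay_seconds: float = 0.0) -> None:
        self.retries = retries
        self.delay_seconds = delay_seconds
        self.results: Dict[str, Any] = {}  # cached output of completed steps

    def do(self, name: str, func: Callable[[], Any]) -> Any:
        # A step that already completed is not executed again;
        # its cached result is returned instead.
        if name in self.results:
            return self.results[name]
        last_error: Optional[Exception] = None
        for attempt in range(1, self.retries + 1):
            try:
                self.results[name] = func()
                return self.results[name]
            except Exception as error:  # retry on any failure in this sketch
                last_error = error
                if attempt < self.retries:
                    time.sleep(self.delay_seconds)
        raise RuntimeError(
            f"step {name!r} failed after {self.retries} attempts"
        ) from last_error


# Assumed example step: fails twice with a transient error, then succeeds.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "dataset"


runner = StepRunner(retries=3)
first = runner.do("fetch", flaky_fetch)   # succeeds on the third attempt
second = runner.do("fetch", flaky_fetch)  # served from the cache, no new call
```

Caching results by step name is what makes re-running the whole sequence safe: a step that has already succeeded is skipped instead of repeated.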

Over the years, we’ve been giving developers the tools to build these applications in Python, on Cloudflare. In 2020, we brought Python to Workers via Transcrypt before directly integrating Python into workerd in 2024. Earlier this year, we built support for CPython along with any packages built in Pyodide, like matplotlib and pandas, in Workers. Now, Python Workflows are supported as well, so developers can create robust applications using the language they know best.

Why Python for Workflows?

Imagine you’re training an LLM. You need to label the dataset, feed data, wait for the model to run, evaluate the loss, adjust the model, and repeat. Without automation, Continue reading

Tailscale Welcomes Kubernetes Co-Founder Joe Beda as Advisor

Virtual Private Network (VPN) software provider Tailscale has brought on Kubernetes co-founder Joe Beda as an advisor. Tailscale is built on the open source VPN software WireGuard, which provides an easy way to remotely connect to a network by way of VPN protocols. The company has parlayed the open source success of the code into an enterprise platform for running networks as well. Now, in an effort to expand its reach, Tailscale is looking to break into the cloud native Kubernetes market. The company has kicked off a number of initiatives to support Kubernetes networking in a production-scale facility. “Kubernetes networking has always been a bit of a challenge,” largely owing to its immense flexibility and ability to work in so many different environments, Beda said in an interview with TNS. Setting up the networking for a single cluster is easy enough, he said. But as the Continue reading

netlab: Test IPv6 IGP Deployment

Imagine you have an IPv4-only network and want to try out how to deploy a routing protocol for IPv6. netlab is a pretty good tool for the job as it:

  • Creates an addressing scheme for you
  • Designs a routing protocol deployment (OSPF, IS-IS) based on just a few bits of information
  • Deploys ready-to-run router configurations to a virtual lab.
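To give a feel for the first bullet, here is a minimal Python sketch of the kind of IPv6 addressing scheme a lab tool could generate. It is not netlab code and does not reflect how netlab allocates addresses: the function name `allocate_link_prefixes`, the documentation pool `2001:db8:1::/48`, and the link names are all assumptions made for illustration.

```python
import ipaddress
from typing import Dict, List


def allocate_link_prefixes(
    pool: str, links: List[str], new_prefix: int = 64
) -> Dict[str, ipaddress.IPv6Network]:
    """Hand out one subnet of the pool to each link, in order."""
    network = ipaddress.ip_network(pool)
    available = 2 ** (new_prefix - network.prefixlen)
    if len(links) > available:
        raise ValueError("address pool is too small for the number of links")
    # subnets() yields the /64s lazily; zip() stops after the last link.
    return dict(zip(links, network.subnets(new_prefix=new_prefix)))


# Hypothetical three-router chain using the IPv6 documentation prefix.
plan = allocate_link_prefixes("2001:db8:1::/48", ["r1-r2", "r2-r3"])
for link, prefix in plan.items():
    print(f"{link}: {prefix}")
```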

Introducing Calico AI and Istio Ambient Mode

The Complexity of Modern Kubernetes Networking

Kubernetes has transformed how teams build and scale applications, but it has also introduced new layers of complexity. Platform and DevOps teams must now integrate and manage multiple technologies: CNI, ingress and egress gateways, service mesh, and more across increasingly large and dynamic environments. As more applications are deployed into Kubernetes clusters, the operational burden on these teams continues to grow, especially when maintaining performance, reliability, security, and observability across diverse workloads.

To address this complexity and tool sprawl, Tigera is incorporating Istio’s Ambient Service Mesh directly into the Calico Unified Network Security Platform. Service mesh has become the preferred solution for application-level networking, particularly in environments with a large number of services or highly regulated workloads. Among available service meshes, Istio stands out as the most popular and widely adopted, supported by a thriving open-source community. By leveraging the lightweight, sidecarless design of Istio Ambient Mode, Calico delivers all the benefits of service mesh, secure service-to-service communication, mTLS authentication, fine-grained authorization, traffic management, and observability, without the burden of sidecars.

Complementing this addition is Calico AI. Calico AI brings intelligence and automation to Kubernetes networking. It addresses the massive operational burden on teams Continue reading

Advertising Bogons (Or Was I?)

Been a while since I did a “War Stories” post - here’s one about a routing policy I screwed up recently. Gave me a fright that I’d really messed something up, but in the end it was no big deal, and it taught me something about who uses route collector info.

Uh-oh…we’re announcing bogons?

While looking at bgp.he.net/AS32590 for something unrelated, I saw this:

[Image: announcing bogons]

Investigating more, it tells me this:

[Image: list of bogons]

What the hell is going on? We should never be announcing bogon ranges to any peer. I rushed off to check some of our peering sessions, e.g.:

lindsayh@rtr> show route advertising-protocol bgp 86.104.125.69

inet.0: 1009955 destinations, 8974886 routes (1008431 active, 2 holddown, 2770 hidden)
  Prefix		  Nexthop	       MED     Lclpref    AS path
* 155.133.226.0/24        Self                                    I
* 155.133.229.0/24        Self                                    I
* 155.133.250.0/24        Self                                    I
* 162.254.197.0/24        Self                                    I

We’re just advertising the normal set of prefixes I expect at that site. Definitely not advertising anything unusual to HE. So why do they think we’re advertising bogons?
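The same sanity check can be scripted. The sketch below is an illustration rather than the check I actually ran: it tests the four advertised prefixes from the output above against a deliberately short sample of well-known IPv4 special-purpose ranges. The `BOGON_RANGES` list and the `is_bogon` helper are assumptions for this example, and a real bogon list (the kind a looking glass uses) may also include unallocated space.

```python
import ipaddress

# A short, incomplete sample of IPv4 bogon ranges (assumption for illustration).
BOGON_RANGES = [
    ipaddress.ip_network(prefix)
    for prefix in (
        "0.0.0.0/8",
        "10.0.0.0/8",
        "100.64.0.0/10",
        "127.0.0.0/8",
        "169.254.0.0/16",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "224.0.0.0/4",
        "240.0.0.0/4",
    )
]


def is_bogon(prefix: str) -> bool:
    """Return True if the prefix falls entirely within one of the sample ranges."""
    network = ipaddress.ip_network(prefix)
    return any(network.subnet_of(bogon) for bogon in BOGON_RANGES)


# Prefixes copied from the "show route advertising-protocol" output above.
advertised = [
    "155.133.226.0/24",
    "155.133.229.0/24",
    "155.133.250.0/24",
    "162.254.197.0/24",
]

flagged = [prefix for prefix in advertised if is_bogon(prefix)]
print(flagged or "no bogons in the advertised set")
```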

Hmmm…Cloudflare Radar also says we’re announcing junk. Must Continue reading

Internet Evolution

This article is based on a presentation I made to the ARIN 56 meeting in October 2025. Here I'd like to elevate the typical Regional Internet Registry policy conversations above the day-to-day mundanities of address allocation policies with its vocabulary of address block sizes and needs-based justifications, fairness and efficiency and look more broadly at the context of the industry we operate in, and try to gain an understanding of where we are right now, and speculate on where it's all going.

Monitor Docker Containers Across Servers With Beszel

How many machines do you have on your network that run Docker containers? One? Two? 20? Now, how are those machines and containers performing? How quickly can you log into those machines and run the necessary commands to suss out that information? Even better, do you know the commands required to do this? What if I told you you could deploy a container on one machine and then deploy agents on every server you need to monitor? And what if I told you this could all be done via Docker, and it’s really easy? The end result is a single dashboard that gives you quick access to resource usage for those machines used for your container deployments. That container is called

HN804: How Prisma SASE Builds on Public Clouds for Scale, Resiliency (Sponsored)

How do you architect a Secure Access Service Edge (SASE) to provide critical security services to millions of endpoints distributed across the planet? How do you build such a service for scale, performance, and resiliency? One option is to build your own PoPs or use colocation facilities, run your own infrastructure stack, and connect everything... Read more »

DIY BYOIP: a new way to Bring Your Own IP prefixes to Cloudflare

When a customer wants to bring IP address space to Cloudflare, they’ve always had to reach out to their account team to put in a request. This request would then be sent to various Cloudflare engineering teams such as addressing and network engineering, and then the team responsible for the particular service they wanted to use the prefix with (e.g., CDN, Magic Transit, Spectrum, Egress). In addition, they had to work with their own legal teams and potentially another organization if they did not have primary ownership of an IP prefix in order to get a Letter of Agency (LOA) issued through hoops of approvals. This process is complex, manual, and time-consuming for all parties involved, sometimes taking up to 4–6 weeks depending on various approvals.

Well, no longer! Today, we are pleased to announce the launch of our self-serve BYOIP API, which enables our customers to onboard and set up their BYOIP prefixes themselves.

With self-serve, we handle the bureaucracy for you. We have automated this process using the gold standard for routing security — the Resource Public Key Infrastructure, RPKI. All the while, we continue to ensure the best quality of service by Continue reading

Lab: Adjust IS-IS Timers

Like any other routing protocol, IS-IS has several timers you can tweak to improve the convergence speed of your network, or make your network unstable (eventually breaking it completely) if you reduce them too much (if you care about fast convergence, you REALLY SHOULD use BFD).

You’ll find more details (and the opportunity to tweak the timers in a safe environment) in the Adjust IS-IS Timers lab exercise.

Click here to start the lab in your browser using GitHub Codespaces (or set up your own lab infrastructure). After starting the lab environment, change the directory to feature/6-timers and execute netlab up.

What’s New in Calico – Fall 2025 Release

Simplify, Secure, and Scale Your Infrastructure

As organizations scale Kubernetes and hybrid infrastructures, many are realizing that more tools don’t mean better security. A recent Microsoft report found that organizations with 16+ point solutions see 2.8x more data security incidents than those with fewer tools. Yet platform teams are still expected to deliver resilience and performance across containers, VMs, and bare metal, often while juggling fragmented tools that introduce risk, downtime, and complexity.

The Fall 2025 release of Calico Enterprise and Calico Cloud cuts through that complexity. Its new features are designed to make your infrastructure more resilient, performant, and observable—right out of the box. From disaster recovery automation to modern data plane support and application traffic handling, these updates empower platform engineers to simplify operations while meeting strict reliability requirements.

The new features in this release can be grouped into two main categories:

1. Resilient, High-Performance Networking and Improved Quality of Service:

IPB187: IPv6 RFC Updates

Today the IPv6 Buzz crew provides updates on the latest in IPv6 standards, RFCs, and best practices. They break down the recent discussions around RFC 6052, explore the options for RFC 8215, and share Nick’s spin on the now-defunct testipv6.com site. Episode Links: RFC 6052, RFC 8215, RFC 6598, IPv6.army

N4N042: Meet MACsec

MACsec is a protocol for encrypting Ethernet frames on a local (though not always local) network. Ethan Banks and Holly Metlitzky have an ELI5 (explain like I’m 5) discussion as to what exactly is MACsec and how it differs from IPsec. They talk about when and whether you need to implement MACsec with all the... Read more »