Networking Archives - NetworkingNexus.net

How to Deploy Whisker and Goldmane in Manifest Only Calico Setups

Your Step-by-Step to Observability Without the Operator

If you’re running Calico using manifests, you may have found that enabling the observability features introduced in version 3.30, including Whisker and Goldmane, requires a more hands-on approach. Earlier documentation focused on the Tigera operator, which automates key tasks such as certificate management and secure service configuration. In a manifest-based setup, these responsibilities shift to the user. While the process involves more manual steps, it provides greater transparency and control over each component. With the right guidance, setting up these observability tools is entirely achievable and offers valuable insight into the internal behavior of your Calico deployment.

We’ve heard from many of you in the Calico Slack community: you’re eager to try out Whisker and Goldmane but aren’t sure how to set them up without Helm or the operator. For anyone who’s up for a challenge, this blog post provides a step-by-step guide on how to get everything wired up the hard way.
However, even if you already use the operator, keep reading! We’re going to pull back the curtain on the magic it performs behind the scenes. Understanding these mechanics will help you troubleshoot, customize, and better appreciate a managed Continue reading

Leaner, More Efficient Storage Infrastructure for the AI Era

The AI era demands a simple infrastructure strategy that prioritizes scalability, performance and cost efficiency in managing AI data pipelines. A key challenge is supporting large language model (LLM) training, which requires massive data, compute and storage resources. Efficient training relies on the continuous feeding of large data sets and the storage of model parameters, intermediate results and checkpoints. Above all, the infrastructure strategy must ensure that the AI resources are scalable, reliable and cost-efficient. Scaling AI Training Infrastructure As models grow, so do the demands on high-performance block storage system with multiattach capabilities. The block Continue reading

How we found a bug in Go’s arm64 compiler

Every second, 84 million HTTP requests are hitting Cloudflare across our fleet of data centers in 330 cities. It means that even the rarest of bugs can show up frequently. In fact, it was our scale that recently led us to discover a bug in Go's arm64 compiler which causes a race condition in the generated code.

This post breaks down how we first encountered the bug, investigated it, and ultimately drove to the root cause.

Investigating a strange panic

We run a service in our network which configures the kernel to handle traffic for some products like Magic Transit and Magic WAN. Our monitoring watches this closely, and it started to observe very sporadic panics on arm64 machines.

We first saw one with a fatal error stating that traceback did not unwind completely. That error suggests that invariants were violated when traversing the stack, likely because of stack corruption. After a brief investigation we decided that it was probably rare stack memory corruption. This was a largely idle control plane service where unplanned restarts have negligible impact, and so we felt that following up was not a priority unless it kept happening.

And then it kept happening.

Coredumps Continue reading

Ultra Ethernet: Fabric Object – What it is and How it is created

Fabric Object

Fabric Object Overview

In libfabric, a fabric represents a logical network domain, a group of hardware and software resources that can communicate with each other through a shared network. All network ports that can exchange traffic belong to the same fabric domain. In practice, a fabric corresponds to one interconnected network, such as an Ethernet or Ultra Ethernet Transport (UET) fabric.

A good way to think about a fabric is to compare it to a Virtual Data Center (VDC) in a cloud environment. Just as a VDC groups together compute, storage, and networking resources into an isolated logical unit, a libfabric fabric groups together network interfaces, addresses, and transport resources that belong to the same communication context. Multiple fabrics can exist on the same system, just like multiple VDCs can operate independently within one cloud infrastructure.

The fabric object acts as the top-level context for all communication. Before an application can create domains, endpoints, or memory regions, it must first open a fabric using the fi_fabric() call. This creates the foundation for all other libfabric objects.

Each fabric is associated with a specific provider, for example, libfabric-uet, which defines how the fabric interacts with the underlying hardware and Continue reading

Ultra Etherent: Discovery

Updated 8-Ocotber 2025

Creating the fi_info Structure

Before the application can discover what communication services are available, it first needs a way to describe what it is looking for. This description is built using a structure called fi_info. The fi_info structure acts like a container that holds the application’s initial requirements, such as desired endpoint type or capabilities.

The first step is to reserve memory for this structure in the system’s main memory. The fi_allocinfo() helper function does this for the application. When called, fi_allocinfo() allocates space for a new fi_info structure, which this book refers to as the pre-fi_info, that will later be passed to the libfabric core for matching against available providers.

At this stage, most of the fields inside the pre-fi_info structure are left at their default values. The application typically sets only the most relevant parameters that express what it needs, such as the desired endpoint type or provider name, and leaves the rest for the provider to fill in later.

In addition to the main fi_info structure, the helper function also allocates memory for a set of sub-structures. These describe different parts of the communication stack, including the fabric, domain, and endpoints. The fixed sub-structures Continue reading

SysLinuxOS: The Go-To Linux for System Administrators

Today’s Linux distributions are plentiful and run the gamut of purposes. There are Linux distributions for those who are new to the open source OS, for gaming, developing, content creation, multimedia, containers, Internet of Things (IoT), edge, routers, firewalls, refrigerators … the list goes on and on. And, of course, there are Linux distributions that are purpose-built for those in IT, such as its site, “SysLinuxOS was built to work right out of the box, with all networking tools already installed by default. There is no need to install anything; it is a Swiss army knife to always carry with us. There are all the major Virtual Private Networks (VPN), several remote control clients, various browsers, as well as Wine, Wireshark, Etherape, Ettercap, PackETH, Packetsender, Putty, Nmap, Packet Tracer 8.2.2, Virtualbox 7.2, Munin, Zabbix-agent2, Icinga, Monit, Nagios4, and tools for serial console and the latest stable liquorix kernel.” At first blush, SysLinuxOS seems to be similar to Tails, only instead of it being targeted at pentesters, it’s more for administrators who need Continue reading

How to Take Packet Captures in ContainerLab/Netlab?

If you follow my blog, you probably know that I’m a big advocate for using Containerlab and Netlab to spin up network labs. I’ve already covered both tools in detail, so I won’t go over the basics again here. You can check the links below if you’re new to them or want a quick refresher. In this post, we’ll look at how to take packet captures in Containerlab labs. So, let’s get started.

How do I run Containerlab?

I know everyone has their own way of running Containerlab, so I thought I’d share how I set up and run my labs. My daily driver is a MacBook, but I run Containerlab on a server that’s set up as Continue reading

SUSE and Tigera: Empowering Secure, Scalable Kubernetes with Calico Enterprise

Modern Workloads Demand Modern Kubernetes Infrastructure

As organizations expand Kubernetes adoption—modernizing legacy applications on VMs and bare metal, running next-generation AI workloads, and deploying intelligence at the edge—the demand for infrastructure that is scalable, flexible, resilient, secure, and performant has never been greater. At the same time, compliance, consistent visibility, and efficient management without overburdening teams remain critical.

The combination of Calico Enterprise from Tigera and SUSE Rancher Prime delivers a resilient and scalable platform that combines high-performance networking, robust network security, and operational simplicity in one stack.

Comprehensive Security Without Compromise

Calico Enterprise provides a unified platform for Kubernetes networking, security, and observability:

eBPF-powered networking for high performance without sidecar overhead
One platform for all Kubernetes traffic: ingress, egress, in-cluster, and multi-cluster
Security for every workload type: containers, VMs, and bare metal
Seamless scaling with built-in multi-cluster networking and security
Zero-trust security with identity-aware policies and workload-based microsegmentation
Integrated observability for policy enforcement and troubleshooting
Compliance features that simplify audits (PCI-DSS, HIPAA, SOC 2, FedRAMP)

Deployed with Rancher Prime, these capabilities extend directly into every cluster, enabling security-conscious industries such as finance, healthcare, and government to confidently run Kubernetes for any use case—from application modernization to AI and edge Continue reading

How to Connect Nested KubeVirt Clusters with Calico and BGP Peering

Running Kubernetes inside Kubernetes isn’t just a fun experiment anymore – it’s becoming a key pattern for delivering multi-environment platforms at scale. With KubeVirt, a virtualization add-on for Kubernetes that uses QEMU (an open-source machine emulator and virtualizer), you can run full-featured Kubernetes clusters as virtual machines (VMs) inside a parent Kubernetes cluster. This nested architecture makes it possible to unify containerized and virtualized workloads, and opens the door to new platform engineering use cases.

But here’s the challenge: how can you ensure that these nested clusters, and the workloads within, can reach, and be reached by, your physical network and are treated the same way as any other cluster?

That’s where Calico’s Advanced BGP (Border Gateway Protocol) peering with workloads comes into play. By enabling BGP route exchange between the parent cluster and nested KubeVirt VMs, Calico extends dynamic routing directly to virtualized workloads. This allows nested clusters to participate in the broader network topology and advertise their pod and service IPs just like any other node. Thus eliminating the need for tunnels or overlays to achieve true layer 3 connectivity.

In this blog, we’ll walk through the big picture, prerequisites, and step-by-step configuration for setting up BGP Continue reading

Hedge 282: Future of Operations

On this episode of the Hedge, Anil Varanasi joins Russ to talk about the complexities of network operations and what Meter is doing in this space.

note: even though this is a more product-heavy episode of the Hedge than usual, it is not sponsored

download

Containerlab – The Anti-Pattern

An intellectual is a man who says a simple thing in a difficult way; an artist is a man who says a difficult thing in a simple way. Charles Bukowski - Notes of a Dirty Old Man, 1969 In a world where technology seems to be working against us and seemingly simple things require far READ MORE

The post Containerlab – The Anti-Pattern appeared first on The Gratuitous Arp.

LIU001: Growing Pains

Starting any new endeavor is hard. That’s particularly true for a career in tech. And that’s the reason Alexis Bertholf and Kevin Nanns are launching the Life In Uptime podcast. In each episode they’ll sit down with engineers, leaders, and builders in tech to uncover the stories behind their careers to help you see how... Read more »

netlab: Applying Simple Configuration Changes

For years, netlab has had custom configuration templates that can be used to deploy custom configurations onto lab devices. The custom configuration templates can be Jinja2 templates, and you can create different templates (for the same functionality) for different platforms. However, using that functionality if you need an extra command or two makes approximately as much sense as using a Kubernetes cluster to deploy a BusyBox container.

netlab release 25.09 solves that problem with the files plugin and the inline config functionality.

Ultra Ethernet: Address Vector (AV)

Address Vector (AV)

The Address Vector (AV) is a provider-managed mapping that connects remote fabric addresses to compact integer handles (fi_addr_t) used in communication operations. Unlike a routing table, the AV does not store IP (device mappings). Instead, it converts an opaque Fabric Address (FA)—which may contain IP, port, and transport-specific identifiers—into a simple handle that endpoints can use for sending and receiving messages. The application never needs to reference the raw IP addresses directly.

Phase 1: Application – Request & Definition

The application begins by requesting an Address Vector (AV) through the fi_av_open() call. To do this, it first defines the desired AV properties in a fi_av_attr structure:

int fi_av_open(struct fid_domain *domain, struct fi_av_attr *attr, struct fid_av **av, void *context);

struct fi_av_attr av_attr = {

.type = FI_AV_TABLE,

.count = 16,

.rx_ctx_bits = 0,

.ep_per_node = 1,

.name = "my_av",

.map_addr = NULL,

.flags = 0

};

Example 4-1: structure Continue reading

From Classroom to Community 2 – ROT13 and Math Quizzer

I am so happy the first post about sharing my Python scripts got such a […]

The post From Classroom to Community 2 - ROT13 and Math Quizzer first appeared on Brezular's Blog.

TCG059: From Source of Truth to Knowledge Graph – Rethinking Network Data

Network automation has a data problem. Traditional tools may hit limitations when managing complex infrastructure relationships. We explore how OpsMill’s InfraHub uses graph databases and temporal versioning to create what our guest calls “the knowledge graph of infrastructure” – enabling true version control at the database level while maintaining the flexibility to model anything from... Read more »

NAN102: Practical Applications for AI in Network Automation

As AI becomes more integrated into the IT landscape, developers, engineers, and operators are looking for practical ways to use these new tools. Joining us today is Ryan Booth; he’s built a career around network automation, giving him a unique perspective on how network engineering, operations, software development, and AI intersect. We explore the practical... Read more »

EVPN Designs: Multi-Pod Fabrics

In the EVPN Designs: Layer-3 Inter-AS Option A, I described the simplest multi-site design in which the WAN edge routers exchange IP routes in individual VRFs, resulting in two isolated layer-2 fabrics connected with a layer-3 link.

Today, let’s explore a design that will excite the True Believers in end-to-end layer-2 networks: two EVPN fabrics connected with an EBGP session to form a unified, larger EVPN fabric. We’ll use the same “physical” topology as the previous example; the only modification is that the WA-WB link is now part of the underlay IP network.

PP080: The State of OT Risks in 2025 (and What to Do About Them)

What does the risk environment for Operational Technology (OT) look like in 2025? JJ and Drew review four recent reports on the state of OT security from Dragos, Fortinet, and others. We discuss ransomware impacts, ongoing risks of RDP traffic, directly exposed OT devices, and overall attack trends and the tools and processes that organizations... Read more »

Payload on Workers: a full-fledged CMS, running entirely on Cloudflare’s stack

Tucked behind the administrator login screen of countless websites is one of the Internet’s unsung heroes: the Content Management System (CMS). This seemingly basic piece of software is used to draft and publish blog posts, organize media assets, manage user profiles, and perform countless other tasks across a dizzying array of use cases. One standout in this category is a vibrant open-source project called Payload, which has over 35,000 stars on GitHub and has generated so much community excitement that it was recently acquired by Figma.

Today we’re excited to showcase a new template from the Payload team, which makes it possible to deploy a full-fledged CMS to Cloudflare’s platform in a single click: just click the Deploy to Cloudflare button to generate a fully-configured Payload instance, complete with bindings to Cloudflare D1 and R2. Below we’ll dig into the technical work that enables this, some of the opportunities it unlocks, and how we’re using Payload to help power Cloudflare TV. But first, a look at why hosting a CMS on Workers is such a game changer.

^{Behind the scenes: Cloudflare TV’s Payload instance}

Serverless by design

Most CMSs are designed to be hosted on a conventional server that runs Continue reading

1 2 3 … 3,464 Next »