Here’s a cool feature every routing protocol should have: a flag that tells everyone a node is going down, giving them time to adjust their routing tables before traffic flow is disrupted.
OSPF never had such a feature; common implementations set the cost of all interfaces to a very high value to emulate it. BGP got it (the Graceful BGP Session Shutdown) almost 30 years after it was created. IS-IS had the overload bit from day one, and it’s just what an IS-IS router needs to tell everyone else they should stop using it for transit traffic. You can try it out in the Drain Traffic Before Node Maintenance lab exercise.
Click here to start the lab in your browser using GitHub Codespaces (or set up your own lab infrastructure). After starting the lab environment, change the directory to feature/5-drain and execute netlab up.
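To see why the overload bit drains transit traffic, here is a minimal Python sketch of an SPF run that refuses to route through an overloaded node while still treating it as a reachable destination. It is an illustration under stated assumptions, not how any particular IS-IS implementation is written: the four-router topology, the metrics, and the `spf` function name are all hypothetical.

```python
import heapq

def spf(graph, source, overloaded=frozenset()):
    """Dijkstra that never expands an overloaded node (unless it is the source)."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry
        if node in overloaded and node != source:
            continue  # reachable as a destination, but not used for transit
        for neighbor, metric in graph[node].items():
            new_cost = cost + metric
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return dist

# Hypothetical topology: A-B-D is the short path, A-C-D is the long one.
TOPOLOGY = {
    "A": {"B": 10, "C": 20},
    "B": {"A": 10, "D": 10},
    "C": {"A": 20, "D": 20},
    "D": {"B": 10, "C": 20},
}

normal = spf(TOPOLOGY, "A")                     # D is reached through B
drained = spf(TOPOLOGY, "A", overloaded={"B"})  # D is reached through C
```

In this sketch, the cost from A to D rises from 20 to 40 once B is marked as overloaded, while B itself stays reachable at cost 10.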

The first half of the Graph Algorithms in Networks webinar by Rachel Traylor is now available without a valid ipSpace.net account; it discusses algorithms dealing with trees, paths, and finding centers of graphs. Enjoy!
When deploying a Kubernetes cluster, a critical architectural decision is how pods on different nodes communicate. The choice of networking mode directly impacts performance, scalability, and operational overhead. Selecting the wrong mode for your environment can lead to persistent performance issues, troubleshooting complexity, and scalability bottlenecks.
The core problem is that pod IPs are virtual. The underlying physical or cloud network has no native awareness of how to route traffic to a pod’s IP address, such as 10.244.1.5. It only knows how to route traffic between the nodes themselves. This gap is precisely what the Container Network Interface (CNI) must bridge.
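The gap can be illustrated with a small longest-prefix-match sketch in Python. It assumes a hypothetical underlay that only knows the node subnet 192.168.1.0/24; the `lookup` helper, the subnet, and the node address are made up for the example, and only the pod address comes from the text above.

```python
import ipaddress

def lookup(routes, destination):
    """Longest-prefix match; returns the next hop, or None if no route matches."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes.items() if dest in net]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]

# Hypothetical underlay routing table: it knows the node subnet and nothing else.
underlay = {ipaddress.ip_network("192.168.1.0/24"): "directly connected"}

node_route = lookup(underlay, "192.168.1.11")  # a node address: route found
pod_route = lookup(underlay, "10.244.1.5")     # a pod address: no route
```

The node address resolves to a route, while the lookup for the pod address returns `None`, which is the situation the CNI has to fix.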

The CNI employs two primary methods to solve this problem:
Daniel Dib wrote a nice article describing the history of the loopback interface, triggering an inevitable mention of the role of a loopback interface in OSPF and a related flood of ancient memories on my end.
Before going into the details, let’s get one fact straight: an OSPF router ID was always (at least from the days of OSPFv1 described in RFC 1131) just a 32-bit identifier, not an IPv4 address. Straight from RFC 1131:
Today, I’ll focus on another feature of the new files plugin – you can use it to embed any (hopefully small) file in a lab topology (configlets are just a special case in which the plugin creates the relative file path from the configlets dictionary data).
You could use this functionality to include configuration files for Linux containers, custom reports, or even plugins in the lab topology, and share a complete solution as a single file that can be downloaded from a GitHub repository.
We all know netops, NRE, and devops can increase productivity, increase Mean Time Between Mistakes (MTBM), and decrease MTTR, but how do we deploy and use these tools? We often think of the technical hurdles we face in deploying them, but most of the blockers are actually cultural. Chris Grundemann, Eyvonne, Russ, and Tom discuss the cultural issues with deploying netops on this episode of the Hedge.
Modern applications are not monoliths. They are complex, distributed systems where availability depends on multiple independent components working in harmony. A web server might be running, but if its connection to the database is down or the authentication service is unresponsive, the application as a whole is unhealthy. Relying on a single health check is like knowing the “check engine” light is not on, but not knowing that one of your tires has a puncture. It’s great your engine is going, but you’re probably not driving far.
As applications grow in complexity, so does the definition of "healthy." We've heard from customers, big and small, that they need to validate multiple services to consider an endpoint ready to receive traffic. For example, they may need to confirm that an underlying API gateway is healthy and that a specific ‘/login’ service is responsive before routing users there. Until now, this required building custom, synthetic services to aggregate these checks, adding operational overhead and another potential point of failure.
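The kind of custom aggregation logic described above might look like the following Python sketch. It is only an illustration of the "all checks must pass" idea, not Cloudflare's API: the `aggregate_health` function and the two stand-in probes are hypothetical, and a real service would replace the lambdas with actual HTTP or TCP probes.

```python
from typing import Callable, Dict, Tuple

def aggregate_health(
    checks: Dict[str, Callable[[], bool]],
) -> Tuple[bool, Dict[str, bool]]:
    """Run every check; the endpoint is healthy only if all of them pass."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises (timeout, refused connection) counts as failed.
            results[name] = False
    return all(results.values()), results

# Hypothetical stand-ins for real probes of an API gateway and a /login service.
checks = {
    "api-gateway": lambda: True,
    "login": lambda: False,
}
healthy, details = aggregate_health(checks)
```

Here the endpoint is reported as unhealthy because the `login` check fails, even though the gateway check passes. The sketch also shows the drawback mentioned above: this aggregator is one more service to run and one more thing that can fail.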
Today, we are introducing Monitor Groups for Cloudflare Load Balancing. This feature provides a new way to create sophisticated, multi-service health assessments directly on our platform. With Monitor Groups, you can bundle …
Sometimes you want to assign IPv4/IPv6 subnets to transit links in your network (for example, to identify interfaces in traceroute outputs), but don’t need to have those subnets in the IP routing tables throughout the whole network. Like OSPF, IS-IS has a nerd knob you can use to exclude transit subnets from the router PDUs.
Want to check how that feature works with your favorite device? Use the Hide Transit Subnets in IS-IS Networks lab exercise.
Click here to start the lab in your browser using GitHub Codespaces (or set up your own lab infrastructure). After starting the lab environment, change the directory to feature/4-hide-transit and execute netlab up.
[updated 25-Oct, 2025 - (RIs in the figure)]
In distributed AI workloads, each process requires memory regions that are visible to the fabric for efficient data transfer. The Job framework or application typically allocates these buffers in GPU VRAM to maximize throughput and enable low-latency direct memory access. These buffers store model parameters, gradients, neuron outputs, and temporary workspace, such as intermediate activations or partial gradients during collective operations in forward and backward passes.
Once memory is allocated, it must be registered with the fabric domain using fi_mr_reg(). Registration informs the NIC that the memory is pinned and accessible for data transfers initiated by endpoints. The fabric library associates the buffer with a Memory Region handle (fid_mr) and internally generates a remote protection key (fi_mr_key), which uniquely identifies the memory region within the Job and domain context.
The local endpoint binds the fid_mr using fi_mr_bind() to define the permitted operations (FI_REMOTE_WRITE in Figure 4-10). This allows the NIC to access local memory efficiently and perform zero-copy operations.
The application retrieves the memory key using fi_mr_key(fid_mr) and constructs a Resource Index (RI) entry. The RI entry serves as …