The Curious Case of Default OSPF Interface Timers

We run two types of integration tests before shipping a netlab release: device integration tests that check whether we correctly implemented netlab features on all supported devices, and platform integration tests that check whether rarely-used core functionality works as expected.

I want to include some validation in the platform integration tests to ensure that the lab devices have started and that the links and the management network work as expected. The simplest way to get that done is to start OSPF with short hello intervals (to get adjacency up in no time), for example:

Ultra Ethernet Transport

The Ultra Ethernet Consortium has a mission to “deliver an Ethernet based open, interoperable, high performance, full-communications stack architecture to meet the growing network demands of AI & HPC at scale.” The recently released UE-Specification-1.0.1 includes an Ultra Ethernet Transport (UET) protocol with functionality similar to RDMA over Converged Ethernet (RoCEv2).

The sFlow instrumentation embedded as a standard feature of data center switch hardware from all leading vendors (Arista, Cisco, Dell, Juniper, NVIDIA, etc.) provides a cost-effective solution for gaining visibility into UET traffic in large production AI / ML fabrics.

docker run -p 8008:8008 -p 6343:6343/udp sflow/prometheus
The easiest way to get started is to use the pre-built sflow/prometheus Docker image to analyze the sFlow telemetry. The chart at the top of this page shows an up-to-the-second view of UET operations using the included Flow Browser application; see Defining Flows for a list of available UET attributes. Getting Started describes how to set up the sFlow monitoring system.

Flow metrics with Prometheus and Grafana describes how to collect custom network traffic flow metrics using the Prometheus time series database and include the metrics in Grafana dashboards. Use the Flow Browser to explore Continue reading
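The flows displayed by the Flow Browser are defined in sFlow-RT, the analytics engine packaged in the sflow/prometheus image. As a minimal sketch, assuming sFlow-RT's REST API accepts a JSON flow definition via an HTTP PUT to /flow/{name}/json on port 8008 (check the sFlow-RT REST API documentation before relying on it), the Python below builds such a request using only the standard library. The flow name uet_pairs is hypothetical, and the generic keys ipsource,ipdestination stand in for the UET attributes listed in Defining Flows.

```python
import json
import urllib.request

# Assumption: sFlow-RT (inside the sflow/prometheus container) listens on port 8008
SFLOW_RT = "http://localhost:8008"


def define_flow_request(name: str, keys: str, value: str = "bytes") -> urllib.request.Request:
    """Build (but do not send) a PUT request that defines a named flow in sFlow-RT."""
    body = json.dumps({"keys": keys, "value": value}).encode("utf-8")
    return urllib.request.Request(
        f"{SFLOW_RT}/flow/{name}/json",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical flow name; generic IP keys replace the UET-specific attributes
req = define_flow_request("uet_pairs", "ipsource,ipdestination")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req)  # uncomment to send it to a running sFlow-RT instance
```

The request is only built, not sent, so the sketch runs without a collector; uncommenting the last line submits it to a live instance.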

Fresh insights from old data: corroborating reports of Turkmenistan IP unblocking and firewall testing

Here at Cloudflare, we frequently use and write about data in the present. But sometimes understanding the present begins with digging into the past.  

We recently learned of a 2024 turkmen.news article (available in Russian) that reports Turkmenistan experienced “an unprecedented easing in blocking,” causing over 3 billion previously-blocked IP addresses to become reachable. The same article reports that one of the reasons for unblocking IP addresses was that Turkmenistan may have been testing a new firewall. (The Turkmen government’s tight control over the country’s Internet access is well-documented.) 

Indeed, Cloudflare Radar shows a surge of requests coming from Turkmenistan around the same time, as we’ll show below. But we had an additional question: does the firewall activity show up on Radar as well? Two years ago, we launched a dashboard on Radar that gives a window into TCP connections to Cloudflare that close due to resets and timeouts. These stand out because the TCP specification considers them ungraceful ways to close a TCP connection.

In this blog post, we go back in time to share what Cloudflare saw in connection resets and timeouts. We must remind our readers that, as passive observers, Continue reading

Ansible Release 12: the Windows Vista Moment

My first encounter with Ansible release 12 wasn’t exactly encouraging. We were using a few Ansible Jinja2 filters (ipaddr and hwaddr) in internal netlab templates, and all of a sudden those templates started crashing due to some weird behavior of attributes starting with an underscore.

We implemented “don’t use Ansible release 12” as a quick workaround, but postponing painful things is never a good solution (see also: visiting a dentist), so I decided to try to make netlab work with Ansible release 12. What a mistake to make.

Build BGP Labs in Minutes, Not Hours with Netlab

What if I told you that all it takes to build a simple BGP lab with two eBGP peers (or even a hundred, for that matter) is a single YAML file? No need to add nodes on a GUI, connect links, or configure interface IPs manually. You just define the lab in a YAML file as shown below, and in about two minutes, you’ll have two routers of your choice fully configured with BGP and an established eBGP session.

provider: clab
defaults.device: eos
defaults.devices.eos.clab.image: ceos:4.34.2

addressing:
  mgmt:
    ipv4: 192.168.200.0/24

nodes:
  - name: r1
    module: [ bgp ]
  - name: r2
    module: [ bgp ]

bgp:
  as_list:
    100:
      members: [ r1 ]
    200:
      members: [ r2 ]

links:
  - r1-r2

r1#show ip bgp summary 
BGP summary information for VRF default
Router identifier 10.0.0.1, local AS number 100
Neighbor Status Codes: m - Under maintenance
  Description              Neighbor V AS           MsgRcvd   MsgSent  InQ OutQ  Up/Down State   PfxRcd PfxAcc PfxAdv
  r2                       10.1.0.2 4 200                5         5    0    0 00:00:15 Estab   1      1      1

r2#show ip bgp summary
BGP summary information for VRF default
Router identifier 10.0.0.2,  Continue reading
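In the topology above, bgp.as_list assigns each node to an autonomous system, and because the r1-r2 link connects nodes in different ASes, the two routers end up with an eBGP session. The Python below is a rough illustration of that reasoning, not netlab's implementation: it assumes the YAML has already been parsed (for example with PyYAML) into the dictionary shown, and the helper names, as well as the rejection of a node listed in two ASes, are choices made for this sketch.

```python
# The lab topology from above, as it might look after YAML parsing
# (assumption: reproduced as a literal so the sketch needs no YAML library)
topology = {
    "nodes": [
        {"name": "r1", "module": ["bgp"]},
        {"name": "r2", "module": ["bgp"]},
    ],
    "bgp": {"as_list": {100: {"members": ["r1"]}, 200: {"members": ["r2"]}}},
    "links": ["r1-r2"],
}


def node_asn_map(topo: dict) -> dict[str, int]:
    """Map each node name to its AS number; this sketch rejects nodes listed in two ASes."""
    asn_of: dict[str, int] = {}
    for asn, entry in topo["bgp"]["as_list"].items():
        for member in entry["members"]:
            if member in asn_of:
                raise ValueError(f"{member} is listed in AS {asn_of[member]} and AS {asn}")
            asn_of[member] = asn
    return asn_of


def ebgp_pairs(topo: dict, asn_of: dict[str, int]) -> list[tuple[str, str]]:
    """Return the node pairs on links whose endpoints belong to different ASes."""
    pairs = []
    for link in topo["links"]:
        a, b = link.split("-")  # assumes the "r1-r2" shorthand and hyphen-free node names
        if asn_of[a] != asn_of[b]:
            pairs.append((a, b))
    return pairs


asn_of = node_asn_map(topology)
print(asn_of)                        # {'r1': 100, 'r2': 200}
print(ebgp_pairs(topology, asn_of))  # [('r1', 'r2')]
```

Scaling the lab to a hundred routers only lengthens the nodes, as_list, and links entries; the per-link AS comparison stays the same.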

Hedge 286: Roundtable

It’s time again for Tom, Eyvonne, and Russ to talk about current articles they’ve run across in their day-to-day reading. This time we talk about WiFi in the home, how often users think a global problem is really local, and why providers have a hard time supporting individual homes and businesses. The second topic is one no one really cares about … apathy. What causes apathy? How can we combat it? Join us for this episode of the Hedge … if you can bring yourself to care!

Technology Short Take 189

Welcome to Technology Short Take #189, Halloween Edition! OK, you caught me—this Tech Short Take is not scary. I’ll try harder next year. In the meantime, enjoy this collection of links about data center-related technologies. Although this installation is lighter on content than I would prefer, I am publishing anyway in the hopes of trying to get back to a somewhat-regular cadence. Here’s hoping you find something useful and informative!

Networking

Servers/Hardware

Security

  • Security researchers recently published research on a new microarchitectural exploit called “VMScape.” The TL;DR on VMScape is that it allows a malicious VM to leak information from the hypervisor. Oops! Olivier Lambert has a write-up that explains why the Xen hypervisor is not affected by this exploit. (Side note: be sure to read the comments; Olivier shares some useful information there.)
  • The leaking of source code for F5 appliances by a “nation-state affiliated cyber threat actor” has led the CISA Continue reading

Calico Whisker in Action: Reading and Understanding Policy Traces

Kubernetes adoption is growing, and managing secure and efficient network communication is becoming increasingly complex. With this growth, organizations need to enforce network policies with greater precision and care. However, implementing these policies without disrupting operations can be challenging.

That’s where Calico Whisker comes in. It helps teams implement network policies that follow the principle of least privilege, ensuring workloads communicate only as intended. Since most organizations introduce network policies after applications are already running, safe and incremental rollout is essential.

To support this, Calico Whisker offers staged network policies, which allow teams to preview a policy’s effect in a live environment before enforcing it. Alongside this, policy traces in Calico Whisker provide deep visibility into how both enforced and pending policies impact traffic. This makes it easier to understand policy behaviour, validate intent, and troubleshoot issues in real time. In this post, we’ll walk through real-world policy trace outputs and show how they help teams confidently deploy and refine network policies in production Kubernetes clusters.

Kubernetes Network Policy Behaviour

It’s important to reiterate the network policy behaviour in Kubernetes, as understanding this foundation is key to effectively interpreting policy traces and ensuring the right traffic flow decisions are Continue reading
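The foundation in question is standard Kubernetes NetworkPolicy semantics: for ingress, a pod that no policy selects accepts all traffic, while a pod selected by one or more policies accepts only what the union of those policies allows, because Kubernetes network policies contain allow rules only. The Python below is a deliberately simplified model of that evaluation, assuming a single namespace, equality-based label matching, ingress only, and no ports, ipBlocks, or Calico-specific tiers and staged policies; the Policy class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    name: str
    pod_selector: dict[str, str]      # like spec.podSelector.matchLabels; {} selects every pod
    allow_from: list[dict[str, str]]  # one podSelector per ingress peer; [] allows nothing


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Equality-based label matching; an empty selector matches everything."""
    return all(labels.get(key) == value for key, value in selector.items())


def ingress_allowed(dst: dict[str, str], src: dict[str, str], policies: list[Policy]) -> bool:
    selecting = [p for p in policies if matches(p.pod_selector, dst)]
    if not selecting:
        return True  # destination pod is not isolated: all ingress is allowed
    # Destination is isolated: allow only if some selecting policy has a matching peer
    return any(matches(peer, src) for p in selecting for peer in p.allow_from)


policies = [Policy("db-allow-api", {"app": "db"}, [{"app": "api"}])]
print(ingress_allowed({"app": "db"}, {"app": "api"}, policies))  # True
print(ingress_allowed({"app": "db"}, {"app": "web"}, policies))  # False
print(ingress_allowed({"app": "web"}, {"app": "db"}, policies))  # True: no policy selects "web"
```

Adding a default-deny policy such as Policy("default-deny", {}, []) isolates every pod, after which only the explicitly allowed api-to-db traffic gets through. This is why introducing policies into a running cluster can silently cut off traffic, and why previewing their effect before enforcement matters.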

Vector Packet Processor (VPP) dropped packet notifications


Vector Packet Processor (VPP) release 25.10 extends the sFlow implementation to include support for dropped packet notifications, providing detailed, low-overhead visibility into traffic flowing through a VPP router; see Vector Packet Processor (VPP) for performance information.
sflow sampling-rate 10000
sflow polling-interval 20
sflow header-bytes 128
sflow direction both
sflow drop-monitoring enable
sflow enable GigabitEthernet0/8/0
sflow enable GigabitEthernet0/9/0
sflow enable GigabitEthernet0/a/0
The above VPP configuration commands enable sFlow monitoring of the VPP dataplane, randomly sampling packets, periodically polling counters, and capturing dropped packets and reason codes. The measurements are sent via Linux netlink messages to an instance of the open source Host sFlow agent (hsflowd), which combines the measurements and streams standard sFlow telemetry to a remote collector.
sflow {
  collector { ip=192.0.2.1 udpport=6343 }
  psample { group=1 egress=on }
  dropmon { start=on limit=50 }
  vpp { }
}
The /etc/hsflowd.conf file above enables the modules needed to receive netlink messages from VPP and send the resulting sFlow telemetry to a collector at 192.0.2.1. See vpp-sflow for detailed instructions.
docker run -p 6343:6343/udp sflow/sflowtool
Run sflowtool on the sFlow collector host to verify that the data is being received and Continue reading
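With sflow sampling-rate 10000, each packet sample stands in for roughly 10,000 packets, and interface counters are exported every 20 seconds (sflow polling-interval 20). The arithmetic a collector applies to such data can be sketched in Python as below. The function names are hypothetical, and the error estimate assumes the usual 95% confidence approximation of 1.96 divided by the square root of the number of samples; it is not a formula taken from the VPP or hsflowd documentation.

```python
import math

SAMPLING_RATE = 10_000    # "sflow sampling-rate 10000": on average 1 in 10,000 packets sampled
POLLING_INTERVAL = 20.0   # "sflow polling-interval 20": seconds between counter exports


def estimated_packets(samples: int, rate: int = SAMPLING_RATE) -> int:
    """Scale a sample count up to an estimate of the total number of packets."""
    return samples * rate


def relative_error_95(samples: int) -> float:
    """Approximate 95% confidence relative error of that estimate (1.96 / sqrt(samples))."""
    return 1.96 / math.sqrt(samples)


def counter_rate(previous: int, current: int, interval_s: float = POLLING_INTERVAL) -> float:
    """Per-second rate from two counter readings one polling interval apart (no wrap assumed)."""
    return (current - previous) / interval_s


samples = 400
print(estimated_packets(samples))            # 4000000
print(round(relative_error_95(samples), 3))  # 0.098
print(counter_rate(1_000_000, 1_600_000))    # 30000.0
```

The error term shrinks with the number of samples, not with traffic volume, so busy links or longer measurement windows give tighter estimates at the same sampling rate.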

TCG061: How Are You Using AI?

Join William Collins and Eyvonne Sharp as they catch up on all things AI. They discuss the AI bubble and how it relates to venture capital, stock, and company valuations. They talk about the AI experience for the average person, the adoption rate of AI tools, and how the AI infrastructure buildout might affect the... Read more »

AI Is Just A Majordomo

The IT world is on fire right now with solutions to every major problem we’ve ever had. Wouldn’t you know it, the solution appears to be something that people are very intent on selling to you? Where have I heard that before? You wouldn’t know it looking at the landscape of IT right now, but AI has iterated more times than you might think over the last couple of years. While people are still carrying on about LLMs and writing homework essays, the market has moved on to agentic solutions that act like employees doing things all over the place.

The result is that people are more excited about the potential for AI than ever. Well, that is, if you’re someone who has problems that need to be solved. If you’re someone doing something creative, like making art or music or poetry, you’re worried about what AI is going to do to your profession. That divide is what I’ve been thinking about for a while. I don’t think it should come as a shock to anyone, but I’ve figured out why AI is hot for every executive out there.

AI appeals to people that have someone doing work for them.

Continue reading