Archive

Category Archives for "Networking"

Understanding Shortest Path First

Link state protocols like OSPF and IS-IS use the Shortest Path First (SPF) algorithm developed by Edsger Dijkstra. Edsger was a Dutch computer scientist, programmer, software engineer, mathematician, and science essayist. He wanted to solve the problem of finding the shortest distance between two cities such as Rotterdam and Groningen. The solution came to him when sitting in a café and the rest is history. SPF is used in many applications, such as GPS, but also in routing protocols, which I’ll cover today.

To explain SPF, I’ll be working with the following topology:

Note that SPF only works with a weighted graph where we have positive weights. I’m using symmetrical costs, although you could have different costs in each direction. Before running SPF, we need to build our Link State Database (LSDB) and I’ll be using IS-IS in my lab for this purpose. Based on the topology above, we can build a table showing the cost between the nodes:

This triplet of information consists of originating node, neighbor node, and cost. It can also be represented as [R1, R2, 2], [R1, R3, 8], [R2, R1, 2], [R2, R3, 5], [R2, R4, 6], [R3, R1, 8], [R3, R2, 5], [R3, R4, Continue reading

Repost: On the Advantages of XML

Continuing the discussion started by my Breaking APIs or Data Models Is a Cardinal Sin and Screen Scraping in 2025 blog posts, Dr. Tony Przygienda left another thoughtful comment worth reposting as a publicly visible blog post:


Having read your newest rant around my rant ;-} I can attest that you hit the nail on the very head in basically all you say:

  • XML output big? yeah.
  • JSON squishy syntax? yeah.
  • SSH prioritization? You didn’t live it until you had a customer where a runaway python script generated 800+ XML netconf sessions pumping data ;-)

What is an EVPN Type 5 Route for (EVPN/VXLAN)

For EVPN/VXLAN, Type 5 routes are used for two purposes: Internally and Externally

Internally it’s used to communicate which VTEPs have a given subnet instantiated on it.

Here’s an example of the output of the command show ip bgp route-type ip-prefix ipv4 on an Arista cEOS spine running EVPN/VXLAN.

It’s showing you that 10.1.10.0/24 (VLAN 10/VNI 10010) is only available on leaf1 and leaf2 (10.1.255.1-2) and 10.1.20.0/24 (VLAN 20/VNI 10020) is only available on leaf3 and leaf4 (10.1.255.3-4). It’s eBGP so each leaf has its own ASN (you see in the path field). The next hop shows the VTEP IP (10.1.254.1-4). I checked on the spine as the spine receives all the EVPN routes from the leafs and propagates them as a route server. The spines don’t install any of these routes, they just propagates them.

       Network                Next Hop              Metric  LocPref Weight  Path

* > RD: 10.1.255.1:10000 ip-prefix 10.1.10.0/24
10.1.254.1 - 100 0 65101 i
* > RD: 10.1.255.2:10000 ip-prefix 10.1.10.0/24
10.1.254.2 - 100 0 65102 i
Continue reading

Companies Must Embrace Bespoke AI Designed for IT Workflows

Although LLMs have been readily available for the past few years, inroads into the IT sector have been minimal. We have seen successful generative AI (GenAI) model penetration into SaaS solutions and areas like help desks; however, successful GenAI integration into security software has been few and far between. Generally speaking, it is not easy to repurpose an LLM to work within a security domain. LLMs are optimized for natural language; they can’t immediately understand or process security elements such as flow packets, logs, alerts, and knowledge graphs. To build out effective genAI integration in the security sphere, it’s time to embrace bespoke, foundational AI for IT workflows. AI Model Efficiency The recent trend toward building out models more efficiently, as opposed to scaling at all costs, is a natural progression of GenAI tools in the enterprise space. Despite all the LLM hype, not every business problem requires an LLM solution. If you utilize LLMs within your infrastructure, it’s best to right-size them (distill them into smaller models that address specific business problems) while focusing on privacy, security, and explainability. By right-sizing your models, compute is kept to a minimum, which prevents costs from being passed on to your customers. Continue reading

Cloudflare named in 2025 Gartner® Magic Quadrant™ for Security Service Edge

For the third consecutive year, Gartner has named Cloudflare in the Gartner® Magic Quadrant™ for Security Service Edge (SSE) report. This analyst evaluation helps security and network leaders make informed choices about their long-term partners in digital transformation. We are excited to share that Cloudflare is one of only nine vendors recognized in this year’s report. You can read more about our position in the report here.

What’s more exciting is that we’re just getting started. Since 2018, starting with our Zero Trust Network Access (ZTNA) service Cloudflare Access, we’ve continued to push the boundaries of how quickly we can build and deliver a mature SSE platform. In that time, we’ve released multiple products each year, delivering hundreds of features across our platform. That’s not possible without our customers. Today, tens of thousands of customers have chosen to connect and protect their people, devices, applications, networks, and data with Cloudflare. They tell us our platform is faster and easier to deploy and provides a more consistent and reliable user experience, all on a more agile architecture for longer term modernization. We’ve made a commitment to those customers to continue to deliver innovative solutions with the velocity and resilience Continue reading

Resolving a request smuggling vulnerability in Pingora

On April 11, 2025 09:20 UTC, Cloudflare was notified via its Bug Bounty Program of a request smuggling vulnerability in the Pingora OSS framework discovered by a security researcher experimenting to find exploits using Cloudflare’s Content Delivery Network (CDN) free tier which serves some cached assets via Pingora.

Customers using the free tier of Cloudflare’s CDN or users of the caching functionality provided in the open source pingora-proxy and pingora-cache crates could have been exposed.  Cloudflare’s investigation revealed no evidence that the vulnerability was being exploited, and was able to mitigate the vulnerability by April 12, 2025 06:44 UTC within 22 hours after being notified.

What was the vulnerability?

The bug bounty report detailed that an attacker could potentially exploit an HTTP/1.1 request smuggling vulnerability on Cloudflare’s CDN service. The reporter noted that via this exploit, they were able to cause visitors to Cloudflare sites to make subsequent requests to their own server and observe which URLs the visitor was originally attempting to access.

We treat any potential request smuggling or caching issue with extreme urgency.  After our security team escalated the vulnerability, we began investigating immediately, took steps to disable traffic to vulnerable components, and deployed Continue reading

Response: True Unnumbered Interfaces

Hendrik left an interesting comment on my Running IS-IS over Unnumbered Ethernet Interfaces blog post:

FRRouting (Linux) with pure IS-IS, the only way it currently (10.3) works is to copy the loopback IPv4 address to the interfaces that you need to do IPv4 routing on. The OpenFabric (IS-IS “extension” draft) does support true unnumbered interfaces and routes IPv6.

Let’s unpack this. There are (at least) four reasons a router needs an address associated with an interface1:

What’s New in Calico: Spring 2025

Introducing Calico Cloud Free Tier

Calico provides a unified platform for all your Kubernetes networking, network security, and observability requirements. From ingress/egress management and east-west policy enforcement to multi-cluster connectivity, Calico delivers comprehensive capabilities. It is distribution-agnostic, preventing vendor lock-in and offering a consistent experience across popular Kubernetes distributions and managed services. Calico eliminates silos, providing seamless networking and observability for containers, VMs, and bare metal servers, and extends effortlessly to multi-cluster environments, in the cloud, on-premises, and at the edge.

With the recent release of Calico Open Source 3.30, we added:

  • Improved observability to visualize and troubleshoot workload communication with Calico Whisker and the Goldmane API.
  • Kubernetes Network Policies are critical for preventing ransomware, achieving microsegmentation to isolate sensitive assets for compliance, and thwarting attacks from malicious actors. However, implementing them effectively can be challenging due to the complexity of identifying, testing, and rapidly updating policies to meet evolving threats. Calico Open Source 3.30 introduces staged policies to enable teams to audit and validate policies before they are enforced, reducing the risk of misconfigured policies and improving security and compliance.
  • The ability to manage Kubernetes ingress traffic with Calico Ingress Gateway, a 100% upstream, enterprise-ready implementation Continue reading

🤖 AI Customer Support using an Agentic Framework

In this blog, I’ll walk you through the design, development, and lessons learned while building a multi-agent AI customer support assistant using the LangChain framework and related AI tools. 🎮💬 🎯 Motivation: Why Build This? At KGeN, a game aggregation platform connecting publishers and gamers, our primary users are gamers and clan chiefs (micro-community leaders). … Continue reading 🤖 AI Customer Support using an Agentic Framework

Bringing connections into view: real-time BGP route visibility on Cloudflare Radar

The Internet relies on the Border Gateway Protocol (BGP) to exchange IP address reachability information. This information outlines the path a sender or router can use to reach a specific destination. These paths, conveyed in BGP messages, are sequences of Autonomous System Numbers (ASNs), with each ASN representing an organization that operates its own segment of Internet infrastructure.

Throughout this blog post, we'll use the terms "BGP routes" or simply "routes" to refer to these paths. In essence, BGP functions by enabling autonomous systems to exchange routes to IP address blocks (“IP prefixes”), allowing different entities across the Internet to construct their routing tables.

When network operators debug reachability issues or assess a resource's global reach, BGP routes are often the first thing they examine. Therefore, it’s critical to have an up-to-date view of the routes toward the IP prefixes of interest. Some networks provide tools called "looking glasses" — public routing information services offering data directly from their own BGP routers. These allow external operators to examine routes from that specific network's perspective. Furthermore, services like bgp.tools, bgp.he.net, RouteViews, or the NLNOG RING looking glass offer aggregated, looking glass-like lookup capabilities, drawing Continue reading

Amazing Speed of Bug Fixes in Nokia SR Linux

A few weeks ago, I was criticising Nokia’s unnecessary changes to the SR Linux configuration data model, so it’s only fair that I also publish a counterexample:

  • On April 12th, SR Linux failed one of the netlab integration tests. We keep adding functionality to these tests as we discover edge cases we didn’t test before, so sometimes a device that passed the test before might fail the modified version.
  • I opened a netlab issue, believing it might be a configuration error on our part.
  • It quickly became evident that we’re dealing with an SR Linux bug, as the failure to apply routing policies was random.

I thought that was the end of the story and closed the issue, but then something truly amazing happened:

Performance measurements… and the people who love them

⚠️ WARNING ⚠️ This blog post contains graphic depictions of probability. Reader discretion is advised.

Measuring performance is tricky. You have to think about accuracy and precision. Are your sampling rates high enough? Could they be too high?? How much metadata does each recording need??? Even after all that, all you have is raw data. Eventually for all this raw performance information to be useful, it has to be aggregated and communicated. Whether it's in the form of a dashboard, customer report, or a paged alert, performance measurements are only useful if someone can see and understand them.

This post is a collection of things I've learned working on customer performance escalations within Cloudflare and analyzing existing tools (both internal and commercial) that we use when evaluating our own performance.  A lot of this information also comes from Gil Tene's talk, How NOT to Measure Latency. You should definitely watch that too (but maybe after reading this, so you don't spoil the ending). I was surprised by my own blind spots and which assumptions turned out to be wrong, even though they seemed "obviously true" at the start. I expect I am not alone in these regards. For that Continue reading

netlab 2.0: Use Custom Bridges on Multi-Access Links

netlab uses point-to-point links provided by the underlying virtualization software to implement links with two nodes and Linux bridges to implement links with more than two nodes connected to them. That’s usually OK if you don’t care about the bridge implementation details, but what if you’d like to use a bridge (or a layer-2 switch if you happen to be of marketing persuasion) you’re familiar with?

You could always implement a bridged segment with a set of links connecting edge nodes to a VLAN-capable device. For example, you could use the following topology to connect two Linux hosts through a bridge running Arista EOS:

NB527: AWS Releases AI Agent for VMware Migration; Cisco Bullish On Customer AI Spending

Take a Network Break! Guest co-host Tom Hollingsworth steps in for Johna Johnson. We start with Google patching a significant Chrome vulnerability and de-elevating Chrome running with admin rights when it launches on Windows. On the news front, we discuss a report, unconfirmed as of recording time, that Arista is acquiring VeloCloud, then discuss Broadcom... Read more »

Your IPs, your rules: enabling more efficient address space usage

IPv4 addresses have become a costly commodity, driven by their growing scarcity. With the original pool of 4.3 billion addresses long exhausted, organizations must now rely on the secondary market to acquire them. Over the years, prices have surged, often exceeding $30–$50 USD per address, with costs varying based on block size and demand. Given the scarcity, these prices are only going to rise, particularly for businesses that haven’t transitioned to IPv6. This rising cost and limited availability have made efficient IP address management more critical than ever. In response, we’ve evolved how we handle BYOIP (Bring Your Own IP) prefixes to give customers greater flexibility.

Historically, when customers onboarded a BYOIP prefix, they were required to assign it to a single service, binding all IP addresses within that prefix to one service before it was advertised. Once set, the prefix's destination was fixed — to direct traffic exclusively to that service. If a customer wanted to use a different service, they had to onboard a new prefix or go through the cumbersome process of offboarding and re-onboarding the existing one.

As a step towards addressing this limitation, we’ve introduced a new level of flexibility: customers can Continue reading

1 2 3 3,436