Network Device Telemetry Protocols with Dinesh Dutt

Whenever I’m ranting about vendors changing their data models or APIs with every other release, there is inevitably a vendor engineer chiming in, saying, “Life would be so much better if the customers wouldn’t insist on doing screen scraping for the last 50 years.”

While some of that screen scraping is pure inertia, we sometimes have good reasons to do it rather than use protocols like NETCONF, gNMI, or protobufs. In Episode 205 of Software Gone Wild, I’m discussing some of those reasons and exploring the gap between vendor theory and reality with Dinesh Dutt, who is unlucky enough to have become the world’s foremost expert on crappy network telemetry.

IPv4 ECMP Works on Arista cEOS Release 4.35.2F

When I wrote about the anycast-ECMP-in-MPLS behavior in 2011, I had to use Cisco IOS to prove that ECMP worked, since Arista cEOS (running the Linux kernel for IP forwarding) didn’t install more than one equal-cost path into the Linux forwarding table.

Arista cEOS got better in the meantime; IPv4 ECMP works like a charm on cEOS release 4.35.02F. With the same lab topology I’d used in 2021, I was able to see the traffic spread across multiple nodes:

Announcing Cloudflare Account Abuse Protection: prevent fraudulent attacks from bots and humans

Today, Cloudflare is introducing a new suite of fraud prevention capabilities designed to stop account abuse before it starts. We've spent years empowering Cloudflare customers to protect their applications from automated attacks, but the threat landscape has evolved. The industrialization of hybrid automated-and-human abuse presents a complex security challenge to website owners. Consider, for instance, a single account that’s accessed from New York, London, and San Francisco in the same five minutes. The core question in this case is not “Is this automated?” but rather “Is this authentic?” 

Website owners need the tools to stop abuse on their website, no matter who it’s coming from.

During our Birthday Week in 2024, we gifted leaked credentials detection to all customers, including everyone on a Free plan. Since then, we've added account takeover detection IDs as part of our bot management solution to help identify bots attacking your login pages. 

Now, we’re combining these powerful tools with new ones. Disposable email check and email risk help you enforce security preferences for users who sign up with throwaway email addresses, a common tactic for fake account creation and promotion abuse, or whose emails are deemed risky based on email patterns Continue reading

Slashing agent token costs by 98% with RFC 9457-compliant error responses

AI agents are no longer experiments. They are production infrastructure, making billions of HTTP requests per day, navigating the web, calling APIs, and orchestrating complex workflows.

But when these agents hit an error, they still receive the same HTML error pages we built for browsers: hundreds of lines of markup, CSS, and copy designed for human eyes. Those pages give agents clues, not instructions, and waste time and tokens. That gap is the opportunity to give agents instructions, not obstacles.

Starting today, Cloudflare returns RFC 9457-compliant structured Markdown and JSON error payloads to AI agents, replacing heavyweight HTML pages with machine-readable instructions.

That means when an agent sends Accept: text/markdown, Accept: application/json, or Accept: application/problem+json and encounters a Cloudflare error, we return one semantic contract in a structured format instead of HTML. And it comes complete with actionable guidance. (This builds on our recent Markdown for Agents release.)

So instead of being told only "You were blocked," the agent will read: "You were rate-limited — wait 30 seconds and retry with exponential backoff." Instead of just "Access denied," the agent will be instructed: "This block is intentional: do not retry, contact the site owner."

Continue reading

AI Security for Apps is now generally available

Cloudflare’s AI Security for Apps detects and mitigates threats to AI-powered applications. Today, we're announcing that it is generally available.

We’re shipping with new capabilities like detection for custom topics, and we're making AI endpoint discovery free for every Cloudflare customer—including those on Free, Pro, and Business plans—to give everyone visibility into where AI is deployed across their Internet-facing apps.

We're also announcing an expanded collaboration with IBM, which has chosen Cloudflare to deliver AI security to its cloud customers. And we’re partnering with Wiz to give mutual customers a unified view of their AI security posture.

A new kind of attack surface

Traditional web applications have defined operations: check a bank balance, make a transfer. You can write deterministic rules to secure those interactions. 

AI-powered applications and agents are different. They accept natural language and generate unpredictable responses. There's no fixed set of operations to allow or deny, because the inputs and outputs are probabilistic. Attackers can manipulate large language models to take unauthorized actions or leak sensitive data. Prompt injection, sensitive information disclosure, and unbounded consumption are just a few of the risks cataloged in the OWASP Top 10 for LLM Applications.

These risks escalate as AI Continue reading

Dynamic Path MTU Discovery in Cloudflare One Client

Here’s an interesting tidbit from the what took them so long department: Cloudflare One Client continuously measures end-to-end MTU and adjusts the local tunnel interface MTU size accordingly (warning: there’s a fair amount of dubious handwaving over the interesting details), generating ICMP packet-too-big messages as close to the source as possible.

I managed to avoid VPN clients most of my life, so I have no idea whether this is a “finally someone figured that out 🎉” moment or a late catch-up to what other VPN clients have been doing for ages. Feedback (in comments or otherwise) would be most welcome!

Monitoring RoCEv2 with sFlow

The talk Seeing Through the RDMA Fog: Monitoring RoCEv2 with sFlow at the recent North American Network Operator's Group (NANOG) conference describes how leveraging industry standard sFlow telemetry from data center switches provides visibility into RDMA activity in AI / ML networks.

Note: Slides are available from the talk link.

The live SDSC Expanse cluster live AI/ML metrics dashboard described in the talk can be accesses by clicking on the dashboard link. The San Diego Supercomputer Center (SDSC) Expanse cluster specifications: 5 Pflop/s peak; 93,184 CPU cores; 208 NVIDIA GPUs; 220 TB total DRAM; 810 TB total NVMe.

Note: AI Metrics with Prometheus and Grafana shows how to set up the monitoring stack.

More recently, Expanse heatmap provides a publicly accessible real-time visualization live traffic flowing between nodes in the Expanse cluster, see Real-time visualization of AI / ML traffic matrix for more information.

How AI Agents Communicate: Understanding the A2A Protocol for Kubernetes

Since the rise of Large Language Models (LLMs) like GPT-3 and GPT-4, organizations have been rapidly adopting Agentic AI to automate and enhance their workflows. Agentic AI refers to AI systems that act autonomously, perceiving their environment, making decisions, and taking actions based on that information rather than just reacting to direct human input. In many ways, this makes AI agents similar to intelligent digital assistants, but they are capable of performing much more complex tasks over time without needing constant human oversight.

What is an AI Agent

An AI Agent is best thought of as a long-lived, thinking microservice that owns a set of perception, decision-making, and action capabilities rather than simply exposing a single API endpoint. These agents operate continuously, handling tasks over long periods rather than responding to one-time requests.

AI Agents in Kubernetes Environments

In Kubernetes environments, each agent typically runs as a pod or deployment and relies on the cluster network, DNS and possibly a service mesh to talk to tools and other agents.

Frameworks like Kagent help DevOps and platform engineers define and manage AI agents as first-class Kubernetes workloads. This means that instead of using custom, ad-hoc scripts to manually manage AI agents, Continue reading

Investigating multi-vector attacks in Log Explorer

In the world of cybersecurity, a single data point is rarely the whole story. Modern attackers don’t just knock on the front door; they probe your APIs, flood your network with "noise" to distract your team, and attempt to slide through applications and servers using stolen credentials.

To stop these multi-vector attacks, you need the full picture. By using Cloudflare Log Explorer to conduct security forensics, you get 360-degree visibility through the integration of 14 new datasets, covering the full surface of Cloudflare’s Application Services and Cloudflare One product portfolios. By correlating telemetry from application-layer HTTP requests, network-layer DDoS and Firewall logs, and Zero Trust Access events, security analysts can significantly reduce Mean Time to Detect (MTTD) and effectively unmask sophisticated, multi-layered attacks.

Read on to learn more about how Log Explorer gives security teams the ultimate landscape for rapid, deep-dive forensics.

The flight recorder for your entire stack

The contemporary digital landscape requires deep, correlated telemetry to defend against adversaries using multiple attack vectors. Raw logs serve as the "flight recorder" for an application, capturing every single interaction, attack attempt, and performance bottleneck. And because Cloudflare sits at the edge, between your users and your servers, all of these Continue reading

Building a security overview dashboard for actionable insights

For years, the industry’s answer to threats was “more visibility.” But more visibility without context is just more noise. For the modern security team, the biggest challenge is no longer a lack of data; it is the overwhelming surplus of it. Most security professionals start their day navigating a sea of dashboards, hunting through disparate logs to answer a single, deceptively simple question: "What now?"

When you are forced to pivot between different tools just to identify a single misconfiguration, you’re losing the window of opportunity to prevent an incident. That’s why we built a revamped Security Overview dashboard: a single interface designed to empower defenders, by moving from reactive monitoring to proactive control.

The new Security Overview dashboard.

From noise to action: rethinking the security overview 

Historically, dashboards focused on showing you everything that was happening. But for a busy security analyst, the more important question is, "What do I need to fix right now?"

To solve this, we are introducing Security Action Items. This feature acts as a functional bridge between detection and investigation, surfacing vulnerabilities, so you no longer have to hunt for them. To help you triage effectively, items are ranked by criticality:

netlab 26.03: EVPN/MPLS, IOS XR Features

netlab release 26.03 is out. Here are the highlights:

For even more details, check the release notes.

Translating risk insights into actionable protection: leveling up security posture with Cloudflare and Mastercard

Every new domain, application, website, or API endpoint increases an organization's attack surface. For many teams, the speed of innovation and deployment outpaces their ability to catalog and protect these assets, often resulting in a "target-rich, resource-poor" environment where unmanaged infrastructure becomes an easy entry point for attackers.

Replacing manual, point-in-time audits with automated security posture visibility is critical to growing your Internet presence safely. That’s why we are happy to announce a planned integration that will enable the continuous discovery, monitoring and remediation of Internet-facing blind spots directly in the Cloudflare dashboard: Mastercard’s RiskRecon attack surface intelligence capabilities.

Information Security practitioners in pay-as-you-go and Enterprise accounts will be able to preview the integration in the third quarter of 2026.

Attack surface intelligence can spot security gaps before attackers do

Mastercard’s RiskRecon attack surface intelligence identifies and prioritizes external vulnerabilities by mapping an organization's entire internet footprint using only publicly accessible data. As an outside-in scanner, the solution can be deployed instantly to uncover "shadow IT," forgotten subdomains, and unauthorized cloud servers that internal, credentialed scans often miss. By seeing what an attacker sees in real time, security teams can proactively close security gaps before they can be exploited.

But Continue reading

1 2 3 3,853