Today the following network operating systems include support for the drop notification extension in their sFlow agent implementations:
Two additional sFlow dropped packet notification implementations are in the pipeline and should be available later this year:
Has your browsing experience ever been disrupted by this error page? Sometimes Cloudflare returns "Error 500" when our servers cannot respond to your web request. This inability to respond could have several potential causes, including problems caused by a bug in one of the services that make up Cloudflare's software stack.
We know that our testing platform will inevitably miss some software bugs, so we built guardrails to gradually and safely release new code before a feature reaches all users. Health Mediated Deployments (HMD) is Cloudflare’s data-driven solution to automating software updates across our global network. HMD works by querying Thanos, a system for storing and scaling Prometheus metrics. Prometheus collects detailed data about the performance of our services, and Thanos makes that data accessible across our distributed network. HMD uses these metrics to determine whether new code should continue to roll out, pause for further evaluation, or be automatically reverted to prevent widespread issues.
Cloudflare engineers configure signals from their service, such as alerting rules or Service Level Objectives (SLOs). For example, the following Service Level Indicator (SLI) checks the rate of HTTP 500 errors returned by a service in our software stack over a 10-minute window.
```promql
sum(rate(http_request_count{code="500"}[10m]))
```
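To make the SLI concrete, here is a minimal Python sketch of how a `rate()`-style check could feed the continue/pause/revert decision described above. It is not HMD's implementation. The sample data, the simplified single-series rate calculation (no extrapolation to the window edges, naive counter-reset handling), and the `PAUSE_RATE` and `REVERT_RATE` thresholds are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical thresholds in errors per second; real SLOs are tuned per service.
PAUSE_RATE = 0.5
REVERT_RATE = 2.0

def simple_rate(samples: List[Tuple[float, float]]) -> float:
    """Per-second increase of one monotonic counter series over the window.

    Simplified relative to Prometheus rate(): no extrapolation to the window
    edges. A drop in value is treated as a counter reset.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so cur is the increase.
        increase += cur - prev if cur >= prev else cur
    duration = samples[-1][0] - samples[0][0]
    return increase / duration if duration > 0 else 0.0

def rollout_decision(error_rate: float) -> str:
    """Map an error rate onto the three rollout outcomes."""
    if error_rate >= REVERT_RATE:
        return "revert"
    if error_rate >= PAUSE_RATE:
        return "pause"
    return "continue"

# Hypothetical 10-minute window for one series: (unix seconds, counter value).
window = [(0, 100), (300, 190), (600, 280)]
rate = simple_rate(window)  # 180 errors over 600 s
print(rate, rollout_decision(rate))
```

The PromQL query sums `rate()` across every matching series; the sketch handles a single series to keep the arithmetic visible.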
When building a scalable, resilient GPU network fabric, the design of the rail layer, the portion of the topology that interconnects GPU servers via Top-of-Rack (ToR) switches, plays a critical role. This section explores three different models: Multi-rail-per-switch, Dual-rail-per-switch, and Single-rail-per-switch. All three support dual-NIC-per-GPU designs, allowing each GPU to connect redundantly to two separate switches, thereby removing the rail switch as a single point of failure.
In this model, multiple small subnets and VLANs are configured per switch, with each logical rail mapped to a subset of physical interfaces. For example, a single 48-port switch might host four or eight logical rails using distinct Layer 2 and Layer 3 domains. Because all logical rails share the same physical device, isolation is purely logical. As a result, a hardware or software failure in the switch can impact all rails and their associated GPUs, creating a large failure domain.
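The following minimal Python sketch shows one way a multi-rail-per-switch design could carve a single 48-port switch into eight logical rails, each with its own VLAN and subnet. The port names (`Ethernet1` to `Ethernet48`), VLAN IDs starting at 101, and the `10.10.0.0/24` address block are hypothetical; the point is only that every rail lands on the same device.

```python
import ipaddress
from dataclasses import dataclass
from typing import List

@dataclass
class LogicalRail:
    rail: int
    vlan: int
    subnet: ipaddress.IPv4Network
    ports: List[str]

def carve_rails(num_ports: int = 48, num_rails: int = 8,
                base: str = "10.10.0.0/24", first_vlan: int = 101) -> List[LogicalRail]:
    """Split one switch's ports evenly into logical rails (hypothetical naming)."""
    if num_ports % num_rails:
        raise ValueError("ports must divide evenly across rails")
    ports_per_rail = num_ports // num_rails
    # A /28 has 14 usable hosts: room for a switch SVI plus one NIC per port.
    subnets = list(ipaddress.ip_network(base).subnets(new_prefix=28))
    if len(subnets) < num_rails:
        raise ValueError("address block too small for the requested rails")
    rails = []
    for r in range(num_rails):
        start = r * ports_per_rail + 1
        ports = [f"Ethernet{p}" for p in range(start, start + ports_per_rail)]
        rails.append(LogicalRail(r, first_vlan + r, subnets[r], ports))
    return rails

for rail in carve_rails():
    print(rail.rail, rail.vlan, rail.subnet, rail.ports[0], "-", rail.ports[-1])
```

Every `LogicalRail` shares one physical switch, so losing that switch takes down all eight subnets at once, which is the large failure domain described above.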
This model is not part of NVIDIA’s validated Scalable Unit (SU) architecture but may suit test environments, development clusters, or small-scale GPU fabrics where hardware cost efficiency is a higher priority than strict fault isolation. From a CapEx perspective, multi-rail-per-switch is the most economical option because it requires the fewest switches.
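A rough way to compare the three models is to count rail switches and the size of the failure domain per switch. The Python sketch below assumes, hypothetically, eight rails (one per GPU in an eight-GPU server) and dual-NIC-per-GPU cabling to two separate switch planes. It ignores switch port counts, so it understates how many switches a larger cluster needs.

```python
import math
from typing import Dict

def rail_layer(rails: int, rails_per_switch: int, planes: int = 2) -> Dict[str, int]:
    """Switch count and blast radius for one rail-layer model (simplified)."""
    switches = planes * math.ceil(rails / rails_per_switch)
    return {
        "switches": switches,
        # One switch failure removes one of the two paths for every rail it hosts.
        "rails_degraded_per_switch_failure": min(rails_per_switch, rails),
    }

MODELS = {"multi-rail": 8, "dual-rail": 2, "single-rail": 1}
for name, per_switch in MODELS.items():
    print(name, rail_layer(rails=8, rails_per_switch=per_switch))
```

Under these assumptions, multi-rail-per-switch needs the fewest switches, but a single switch failure degrades every rail, while single-rail-per-switch needs the most switches and limits each failure to one rail.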
Figure 13-10 illustrates the…
Hello, my friend,
Today’s topic is critical to completing the full picture of software development for network automation. Today’s topic is what allows you to complete your tasks within a reasonable time frame, especially when you have a lot of network devices, servers, and virtual machines to manage. Today’s topic is concurrency of code execution in Python and Golang.
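As a taste of what concurrency buys you, here is a minimal Python sketch that uses the standard library's `concurrent.futures.ThreadPoolExecutor` to run an I/O-bound task against many devices at once. `collect_facts()` and the device names are hypothetical stand-ins, and `time.sleep()` simulates the network round trip to each device.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

def collect_facts(device: str) -> Dict[str, str]:
    """Hypothetical per-device task; sleep() stands in for SSH/API latency."""
    time.sleep(0.2)
    return {"device": device, "status": "ok"}

def run_sequential(devices: List[str]) -> List[Dict[str, str]]:
    # One device at a time: total time grows linearly with the device count.
    return [collect_facts(d) for d in devices]

def run_concurrent(devices: List[str], workers: int = 10) -> List[Dict[str, str]]:
    # map() returns results in input order even though tasks finish in any order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(collect_facts, devices))

if __name__ == "__main__":
    devices = [f"leaf{i:02d}" for i in range(1, 11)]
    for runner in (run_sequential, run_concurrent):
        start = time.perf_counter()
        results = runner(devices)
        elapsed = time.perf_counter() - start
        print(f"{runner.__name__}: {len(results)} devices in {elapsed:.2f}s")
```

Threads help here because the work is I/O-bound and the interpreter lock is released while waiting; CPU-bound work in Python would need `ProcessPoolExecutor` instead.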
There are more than 100 programming languages out there. Some of them are quite universal and allow development of almost any kind of application; others are more specialized. From what I’ve worked with or heard of, Python is probably the most universal programming language. It is used for infrastructure management at scale (e.g., OpenStack is written in Python), web applications, data science, and much more. Golang is much lower-level than Python and, therefore, considerably more performant: various benchmarks available online suggest that the same business tasks can run 3 to 30 times faster in Golang than in Python, which makes Golang suitable for systems programming (e.g., Kubernetes and Docker are written in Go). Those two are what we cover in our blogs. Apart from them, there are many others: C/C++ if you…
One of the “great fears” that advancing AI unlocks is that most of our jobs can, and will, be replaced by various forms of AI. Join us on this episode of the Hedge as Jonathan Mast of White Beard Strategies, Tom Ammon, and Russ White discuss whether we are likely to see a net loss, gain, or wash in jobs as companies deploy LLMs, along with other potential upsides and downsides.
You can now connect to Cloudflare's first publicly available remote Model Context Protocol (MCP) servers from Claude.ai (now supporting remote MCP connections!) and other MCP clients like Cursor, Windsurf, or our own AI Playground. Unlock Cloudflare tools, resources, and real-time information through our new suite of MCP servers, including:
Server | Description |
---|---|
Cloudflare Documentation server | Get up-to-date reference information from Cloudflare Developer Documentation |
Workers Bindings server | Build Workers applications with storage, AI, and compute primitives |
Workers Observability server | Debug and get insight into your Workers application’s logs and analytics |
Container server | Spin up a sandbox development environment |
Browser rendering server | Fetch web pages, convert them to markdown and take screenshots |
Radar server | Get global Internet traffic insights, trends, URL scans, and other utilities |
Logpush server | Get quick summaries for Logpush job health |
AI Gateway server | Search your logs, get details about the prompts and responses |
AutoRAG server | List and search documents on your AutoRAGs |
Audit Logs server | Query audit logs and generate reports for review |
DNS Analytics server | Optimize DNS performance and debug issues based on your current setup |
Digital Experience Monitoring server | Get quick insight into critical applications for your organization |
Cloudflare One CASB … |
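Whichever server you connect to, MCP clients talk to it using JSON-RPC 2.0 messages. The minimal Python sketch below builds the two requests a client typically sends to discover and invoke tools, `tools/list` and `tools/call`. The tool name `search_docs` and its arguments are hypothetical placeholders (each server's `tools/list` response tells you the real ones), and the `initialize` handshake that opens a real session is omitted.

```python
import itertools
import json
from typing import Any, Dict, Optional

_ids = itertools.count(1)

def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request with a unique, increasing id."""
    msg: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg

# Discover which tools the server offers.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool; "search_docs" and its arguments are hypothetical.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "search_docs", "arguments": {"query": "Workers KV bindings"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool))
```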
Today, we're excited to collaborate with Anthropic, Asana, Atlassian, Block, Intercom, Linear, PayPal, Sentry, Stripe, and Webflow to bring a whole new set of remote MCP servers, all built on Cloudflare, to enable Claude users to manage projects, generate invoices, query databases, and even deploy full stack applications — without ever leaving the chat interface.
Since Anthropic’s introduction of the Model Context Protocol (MCP) in November, there’s been more and more excitement about it, and it seems like a new MCP server is being released nearly every day. And for good reason! MCP has been the missing piece for making AI agents a reality, and it has helped define how AI agents interact with tools to take actions and get additional context.
But to date, end-users have had to install MCP servers on their local machine to use them. Today, with Anthropic’s announcement of Integrations, you can access an MCP server the same way you would a website: type a URL and go.
At Cloudflare, we’ve been focused on building out the tooling that simplifies the development of remote MCP servers, so that our customers’ engineering teams can focus their time on building out the MCP tools for their…
We’re continuing to make it easier for developers to bring their services into the AI ecosystem with the Model Context Protocol (MCP). Today, we’re announcing two new capabilities:
Streamable HTTP Transport: The Agents SDK now supports the new Streamable HTTP transport, allowing you to future-proof your MCP server. Our implementation allows your MCP server to simultaneously handle both the new Streamable HTTP transport and the existing SSE transport, maintaining backward compatibility with all remote MCP clients.
Deploy MCP servers written in Python: In 2024, we introduced first-class Python language support in Cloudflare Workers, and now you can build MCP servers on Cloudflare that are entirely written in Python.
Click “Deploy to Cloudflare” to get started with a remote MCP server that supports the new Streamable HTTP transport while remaining backward compatible with the SSE transport.
The MCP spec was updated on March 26 to introduce a new transport mechanism for remote MCP, called Streamable HTTP. The new transport simplifies how AI agents interact with services by using a single HTTP endpoint for sending and receiving messages between the client and the…
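The Python sketch below shows roughly how a server could handle a Streamable HTTP POST on its single endpoint while still routing legacy SSE clients. It follows a reading of the 2025-03-26 spec: the client's `Accept` header lists both `application/json` and `text/event-stream`, a request gets either a JSON body or an SSE stream, and notifications or responses get `202 Accepted`. Sessions, GET-initiated streams, and batching are omitted. The `/mcp`, `/sse`, and `/sse/message` paths, the `dispatch()` handler, and the 406 status for a bad `Accept` header are assumptions, not the Agents SDK's API.

```python
import json
from typing import Any, Dict, Tuple

Response = Tuple[int, str, str]  # (status, content type, body)

def dispatch(request: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical handler: answers every request with an empty result."""
    return {"jsonrpc": "2.0", "id": request["id"], "result": {}}

def sse_event(message: Dict[str, Any]) -> str:
    # One Server-Sent Events frame: event and data lines, then a blank line.
    return f"event: message\ndata: {json.dumps(message)}\n\n"

def handle_streamable_post(accept: str, body: str, stream: bool = False) -> Response:
    """Sketch of a Streamable HTTP POST to the single MCP endpoint."""
    accepted = {part.split(";")[0].strip() for part in accept.split(",")}
    if not {"application/json", "text/event-stream"} <= accepted:
        return 406, "text/plain", "client must accept JSON and SSE"
    message = json.loads(body)  # single messages only; batches are not handled
    if "method" not in message or "id" not in message:
        # Notifications and responses are acknowledged with no body.
        return 202, "", ""
    reply = dispatch(message)
    if stream:
        # The server may choose to answer over an SSE stream instead.
        return 200, "text/event-stream", sse_event(reply)
    return 200, "application/json", json.dumps(reply)

def route(method: str, path: str) -> str:
    """Serve both transports side by side; the paths are hypothetical."""
    if path == "/mcp" and method in ("POST", "GET"):
        return "streamable-http"
    if path == "/sse" and method == "GET":
        return "legacy-sse"
    if path == "/sse/message" and method == "POST":
        return "legacy-sse-message"
    return "not-found"
```

Keeping both routes on one server is what lets older SSE-only clients and newer Streamable HTTP clients connect to the same deployment.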
A networking-focused entity known only as humblegrumble sent me the following question after reading my When OSPF Becomes a Distance Vector Protocol article:
How do A1 and A2 know not to advertise a Type-3 summary LSA generated from area 1 prefixes back into area 1?
He’s right. There is no “originating area” information in the type-3 LSA, so how does an ABR know not to reinsert the type-3 LSA generated by another ABR back into the area?
TL;DR: The OSPF route selection process takes care of that.
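A simplified Python model of that logic, loosely following RFC 2328, is below. It encodes two rules: intra-area routes beat inter-area routes (which beat external routes) regardless of cost, and an ABR never originates a Type-3 summary into the area a route is associated with. The prefix, costs, area labels, and data structures are hypothetical, and everything else OSPF does (ranges, stub areas, virtual links) is left out.

```python
from dataclasses import dataclass
from typing import List, Optional

# Lower rank wins: intra-area beats inter-area beats external, regardless of cost.
PATH_TYPE_RANK = {"intra-area": 0, "inter-area": 1, "external": 2}

@dataclass(frozen=True)
class Candidate:
    prefix: str
    path_type: str
    area: str  # area the route is associated with
    cost: int

def best_route(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the preferred path type first, then the lowest cost."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: (PATH_TYPE_RANK[c.path_type], c.cost))

def summaries_to_originate(route: Candidate, attached_areas: List[str]) -> List[str]:
    """Areas into which an ABR would flood a Type-3 LSA for this route."""
    # Never advertise a route back into the area it is associated with.
    return [a for a in attached_areas if a != route.area]

# A2 attaches to the backbone ("0") and area 1. It sees the area 1 prefix twice:
# natively via area 1 router/network LSAs, and as A1's Type-3 LSA in the backbone.
from_area1 = Candidate("10.1.0.0/24", "intra-area", area="1", cost=20)
from_a1_summary = Candidate("10.1.0.0/24", "inter-area", area="0", cost=10)

best = best_route([from_area1, from_a1_summary])
print(best.path_type, summaries_to_originate(best, ["0", "1"]))
```

Even though the inter-area path is cheaper, A2 selects the intra-area route, whose associated area is area 1, so it summarizes the prefix only into the backbone. RFC 2328 adds a second safeguard: an ABR attached to multiple areas examines only backbone summary-LSAs when computing inter-area routes.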