IBM And AMD Tag Team On Hybrid Classical-Quantum Supercomputers

As we talked about a decade ago in the wake of launching The Next Platform, quantum computers – at least the fault tolerant ones being built by IBM, Google, Rigetti, and a few others – need a massive amount of traditional von Neumann compute to maintain their state, perform qubit error correction, and assist with their computations.

IBM And AMD Tag Team On Hybrid Classical-Quantum Supercomputers was written by Timothy Prickett Morgan at The Next Platform.

AI Gateway now gives you access to your favorite AI models, dynamic routing and more — through just one endpoint

Getting the observability you need is challenging enough when the code is deterministic, but AI presents a new challenge — a core part of your user’s experience now relies on a non-deterministic engine that produces unpredictable outputs. Many factors can influence the results: the model, the system prompt, and more. On top of that, you still have to worry about performance, reliability, and costs.

Solving performance, reliability, and observability challenges is exactly what Cloudflare was built for, and two years ago, with the introduction of AI Gateway, we set out to give our users the same level of control in the age of AI.

Today, we’re excited to announce several features that make building AI applications easier and more manageable: unified billing, secure key storage, dynamic routing, and security controls with Data Loss Prevention (DLP). This means that AI Gateway becomes your go-to place to control costs and API keys, route between different models and providers, and manage your AI traffic. Check out our new AI Gateway landing page for more information at a glance.
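Under the hood, the single endpoint is a URL that embeds your account, gateway, and target provider. As a minimal sketch of how one gateway fronts several providers (the account ID and gateway name below are placeholders, and the path pattern follows Cloudflare's documented gateway URL format):

```python
# Route requests to different providers through one AI Gateway endpoint.
# ACCOUNT_ID and GATEWAY are hypothetical placeholders.
ACCOUNT_ID = "your-account-id"
GATEWAY = "my-gateway"

def gateway_url(provider: str, path: str) -> str:
    """Build the per-provider URL behind a single gateway endpoint."""
    base = f"https://gateway.ai.cloudflare.com/v1/{ACCOUNT_ID}/{GATEWAY}"
    return f"{base}/{provider}/{path}"

# The same gateway fronts multiple providers:
print(gateway_url("openai", "chat/completions"))
print(gateway_url("workers-ai", "@cf/meta/llama-3.1-8b-instruct"))
```

Because every request funnels through the one gateway hostname, billing, key management, and routing policy can all be applied in a single place.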

Connect to all your favorite AI providers

When using an AI provider, you typically have to sign up for Continue reading

State-of-the-art image generation Leonardo models and text-to-speech Deepgram models now available in Workers AI

When we first launched Workers AI, we made a bet that AI models would get faster and smaller. We built our infrastructure around this hypothesis, adding specialized GPUs to our datacenters around the world that can serve inference to users as fast as possible. We created our platform to be as general as possible, but we also identified niche use cases that fit our infrastructure well, such as low-latency image generation or real-time audio voice agents. To lean in on those use cases, we’re bringing on some new models that will help make it easier to develop for these applications.

Today, we’re excited to announce that we are expanding our model catalog to include closed-source partner models that fit this use case. We’ve partnered with Leonardo.Ai and Deepgram to bring their latest and greatest models to Workers AI, hosted on Cloudflare’s infrastructure. Leonardo and Deepgram both have models with a great speed-to-performance ratio that suit the infrastructure of Workers AI. We’re starting off with these great partners — but expect to expand our catalog to other partner models as well.

The benefit of using these models on Workers AI is that we don’t only have a standalone inference Continue reading

How Cloudflare runs more AI models on fewer GPUs: A technical deep-dive

As the demand for AI products grows, developers are creating and tuning a wider variety of models. While adding new models to our growing catalog on Workers AI, we noticed that not all of them are used equally – leaving infrequently used models occupying valuable GPU space. Efficiency is a core value at Cloudflare, and with GPUs being the scarce commodity they are, we realized that we needed to build something to fully maximize our GPU usage.

Omni is an internal platform we’ve built for running and managing AI models on Cloudflare’s edge nodes. It does so by spawning and managing multiple models on a single machine and GPU using lightweight isolation. Omni makes it easy and efficient to run many small and/or low-volume models, combining multiple capabilities by:  

  • Spawning multiple models from a single control plane,

  • Implementing lightweight process isolation, allowing models to spin up and down quickly,

  • Isolating the file system between models to easily manage per-model dependencies, and

  • Over-committing GPU memory to run more models on a single GPU.
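Omni is internal and its implementation isn't public, so the following is only a toy sketch of the over-commit idea in the last bullet: admit models whose combined declared memory exceeds physical GPU memory, on the bet that small, low-volume models are rarely all active at once. All names and numbers are illustrative.

```python
class GpuScheduler:
    """Toy placement logic: admit models whose *combined* declared memory
    may exceed physical GPU memory, betting that low-volume models are
    rarely active at the same time. Illustrative only."""

    def __init__(self, physical_gb: float, overcommit: float = 1.5):
        self.capacity = physical_gb * overcommit  # virtual memory budget
        self.models = {}                          # name -> declared GB

    def admit(self, name: str, declared_gb: float) -> bool:
        if sum(self.models.values()) + declared_gb > self.capacity:
            return False                          # spill to another GPU
        self.models[name] = declared_gb
        return True

sched = GpuScheduler(physical_gb=80)       # e.g. one 80 GB GPU
assert sched.admit("small-embedder", 30)
assert sched.admit("low-volume-llm", 60)   # 90 GB declared > 80 GB physical
assert not sched.admit("big-model", 40)    # exceeds the 120 GB virtual budget
```

A real scheduler would also evict or swap idle models when active memory approaches the physical limit; the sketch only shows the admission side.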

Cloudflare aims to place GPUs as close as we possibly can to people and applications that are using them. With Omni in place, we’re now able to run Continue reading

How we built the most efficient inference engine for Cloudflare’s network

Inference powers some of today’s most powerful AI products: chatbot replies, AI agents, autonomous vehicle decisions, and fraud detection. The problem is, if you’re building one of these products on top of a hyperscaler, you’ll likely need to rent expensive GPUs from large centralized data centers to run your inference tasks. That model doesn’t work for Cloudflare — there’s a mismatch between Cloudflare’s globally-distributed network and a typical centralized AI deployment using large multi-GPU nodes. As a company that operates our own compute on a lean, fast, and widely distributed network within 50ms of 95% of the world’s Internet-connected population, we need to be running inference tasks more efficiently than anywhere else.

This is further compounded by the fact that AI models are getting larger and more complex. As we started to support these models, like the Llama 4 herd and gpt-oss, we realized that we couldn’t just throw money at the scaling problems by buying more GPUs. We needed to utilize every bit of idle capacity and be agile with where each model is deployed. 

After running most of our models on the widely used open source inference and serving engine vLLM, we figured out it Continue reading

SwiNOG 40: Application-Based Source Routing with SRv6

The “we should give different applications different paths across the network” idea never dies (even though in many places the residential Internet gives you enough bandwidth to watch 4K videos), and Leveraging Intent-Based Networking and SRv6 for Dynamic End-to-End Traffic Steering (video) by Severin Dellsperger was an interesting new riff on that ancient grail hunt.

Their solution uses SRv6 for traffic steering, an Intent-Based System that figures out paths across the network, and eBPF on client hosts to add per-application SRv6 headers to outgoing traffic.
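The header the eBPF program prepends is an SRv6 Segment Routing Header (SRH, RFC 8754) listing the waypoints the intent-based system chose. As a rough illustration of what that header contains (independent of the eBPF attachment mechanism, which isn't sketched here), a minimal SRH builder; the segment addresses are made up:

```python
import ipaddress
import struct

def build_srh(segments: list[str]) -> bytes:
    """Build an SRv6 Segment Routing Header (RFC 8754, routing type 4).
    Segments appear on the wire in reverse order; Segments Left starts
    at the index of the first waypoint to visit."""
    seg_bytes = b"".join(
        ipaddress.IPv6Address(s).packed for s in reversed(segments)
    )
    n = len(segments)
    return struct.pack(
        "!BBBBBBH",
        0,       # Next Header (filled in when the header is attached)
        2 * n,   # Hdr Ext Len: octets beyond the first 8, in 8-octet units
        4,       # Routing Type 4 = Segment Routing
        n - 1,   # Segments Left
        n - 1,   # Last Entry
        0, 0,    # Flags, Tag
    ) + seg_bytes

# Two made-up waypoints; total length is 8 + 16 * 2 = 40 bytes.
srh = build_srh(["2001:db8::a", "2001:db8::b"])
print(len(srh))   # 40
```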

Securing the AI Revolution: Introducing Cloudflare MCP Server Portals

Large Language Models (LLMs) are rapidly evolving from impressive information retrieval tools into active, intelligent agents. The key to unlocking this transformation is the Model Context Protocol (MCP), an open-source standard that allows LLMs to securely connect to and interact with any application — from Slack to Canva, to your own internal databases.

This is a massive leap forward. With MCP, an LLM client like Gemini, Claude, or ChatGPT can answer more than just "tell me about Slack." You can ask it: "What were the most critical engineering P0s in Jira from last week, and what is the current sentiment in the #engineering-support Slack channel regarding them? Then propose updates and bug fixes to merge."

This is the power of MCP: turning models into teammates.
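Concretely, MCP messages are JSON-RPC 2.0, and a client invokes a server-side tool with the `tools/call` method. A minimal sketch of such a request (the tool name and its arguments are invented for illustration):

```python
import json

# What an MCP tool invocation looks like on the wire: MCP is JSON-RPC 2.0,
# and a client asks a server to run a tool via the "tools/call" method.
# The tool name and arguments below are hypothetical.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_issues",   # hypothetical Jira-like tool
        "arguments": {"project": "ENG", "priority": "P0"},
    },
}
wire = json.dumps(request)
print(wire)
```

Every such call crossing the organization's boundary is exactly the traffic an MCP portal would centralize and inspect.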

But this great power comes with proportional risk. Connecting LLMs to your most critical applications creates a new, complex, and largely unprotected attack surface. Today, we change that. We’re excited to announce that Cloudflare MCP Server Portals are now available in Open Beta. MCP Server Portals are a new capability that enables you to centralize, secure, and observe every MCP connection in your organization. Continue reading

Introducing Cloudflare Application Confidence Score For AI Applications

Introduction

The availability of SaaS and Gen AI applications is transforming how businesses operate, boosting collaboration and productivity across teams. However, with increased productivity comes increased risk, as employees turn to unapproved SaaS and Gen AI applications, often dumping sensitive data into them for quick productivity wins. 

The prevalence of “Shadow IT” and “Shadow AI” creates multiple problems for security, IT, GRC and legal teams. For example:

In spite of these problems, blanket bans of Gen AI don't work. They stifle innovation and push employee usage underground. Instead, organizations need smarter controls.

Security, IT, legal and GRC teams therefore face a difficult challenge: how can you appropriately assess each third-party application, without Continue reading

ChatGPT, Claude, & Gemini security scanning with Cloudflare CASB

Starting today, all users of Cloudflare One, our secure access service edge (SASE) platform, can use our API-based Cloud Access Security Broker (CASB) to assess the security posture of their generative AI (GenAI) tools: specifically, OpenAI’s ChatGPT, Claude by Anthropic, and Google’s Gemini. Organizations can connect their GenAI accounts and within minutes, start detecting misconfigurations, Data Loss Prevention (DLP) matches, data exposure and sharing, compliance risks, and more — all without having to install cumbersome software onto user devices.

As Generative AI adoption has exploded in the enterprise, IT and Security teams have had to hustle to keep abreast of the newly emerging security and compliance challenges that come alongside these powerful tools. In this rapidly changing landscape, they need tools that enable AI adoption while still protecting the security and privacy of their enterprise networks and data.

Cloudflare’s API CASB and inline CASB work together to help organizations safely adopt AI tools. The API CASB integrations provide out-of-band visibility into data at rest and security posture inside popular AI tools like ChatGPT, Claude, and Gemini. At the same time, Cloudflare Gateway provides in-line prompt controls and Shadow AI identification. It applies policies and Continue reading

Block unsafe prompts targeting your LLM endpoints with Firewall for AI

Security teams are racing to secure a new attack surface: AI-powered applications. From chatbots to search assistants, LLMs are already shaping customer experience, but they also open the door to new risks. A single malicious prompt can exfiltrate sensitive data, poison a model, or inject toxic content into customer-facing interactions, undermining user trust. Without guardrails, even the best-trained model can be turned against the business.

Today, as part of AI Week, we’re expanding our AI security offerings by introducing unsafe content moderation, now integrated directly into Cloudflare Firewall for AI. Built with Llama, this new feature allows customers to leverage their existing Firewall for AI engine for unified detection, analytics, and topic enforcement, providing real-time protection for Large Language Models (LLMs) at the network level. Now with just a few clicks, security and application teams can detect and block harmful prompts or topics at the edge — eliminating the need to modify application code or infrastructure. This feature is immediately available to current Firewall for AI users. Those not yet onboarded can contact their account team to participate in the beta program.

AI protection in application security

Cloudflare's Firewall for AI protects user-facing LLM applications from abuse and Continue reading

Best Practices for Securing Generative AI with SASE

As Generative AI revolutionizes businesses everywhere, security and IT leaders find themselves in a tough spot. Executives are mandating speedy adoption of Generative AI tools to drive efficiency and stay abreast of competitors. Meanwhile, IT and Security teams must rapidly develop an AI Security Strategy, even before the organization really understands exactly how it plans to adopt and deploy Generative AI. 

IT and Security teams are no strangers to “building the airplane while it is in flight”. But this moment comes with new and complex security challenges. There is an explosion in new AI capabilities adopted by employees across all business functions — both sanctioned and unsanctioned. AI Agents are ingesting authentication credentials and autonomously interacting with sensitive corporate resources. Sensitive data is being shared with AI tools, even as security and compliance frameworks struggle to keep up.

While it demands strategic thinking from Security and IT leaders, the problem of governing the use of AI internally is far from insurmountable. SASE (Secure Access Service Edge) is a popular cloud-based network architecture that combines networking and security functions into a single, integrated service that provides employees with secure and efficient access to the Internet and to corporate resources, regardless Continue reading

What’s New in Calico – Summer 2025

As Kubernetes adoption scales across enterprise architectures, platform architects face mounting pressure to implement consistent security guardrails across distributed, multi-cluster environments while maintaining operational velocity. Modern infrastructure demands a security architecture that can adapt without introducing complexity or performance penalties. Traditional approaches force architects to cobble together separate solutions for ingress protection, network policies, and application-layer security, creating operational friction and increasing attack surface.

Today, we’re announcing significant enhancements to Calico that eliminate this architectural complexity. This release introduces native Web Application Firewall (WAF) capabilities integrated directly into Calico’s Ingress Gateway, enabling platform architects to deploy a single technology stack for both ingress management and HTTP-layer threat protection. Combined with enhanced Role-Based Access Control (RBAC) and centralized observability across heterogeneous workloads, platform architects can now design and implement comprehensive security within a unified platform.

The new features in this release can be grouped under two main categories:

  1. Security at Scale with a Unified Platform: This release introduces critical security features that make it easier to secure and scale Kubernetes workloads.
  2. Simplified Operations for Kubernetes, VM, and bare metal workloads: Reducing complexity is key to scaling Kubernetes, VM, and bare metal workloads, and this release introduces features that make Continue reading

NeuReality Wants Its NR2 To Be Your Arm CPU For AI

Just because the center of gravity for GenAI compute and other kinds of machine learning and data analytics has shifted from the CPU to the XPU accelerator – generally a GPU these days, but not universally – does not mean that the choice of the CPU for the system hosting those XPUs doesn’t matter.

NeuReality Wants Its NR2 To Be Your Arm CPU For AI was written by Timothy Prickett Morgan at The Next Platform.

Unmasking the Unseen: Your Guide to Taming Shadow AI with Cloudflare One

The digital landscape of corporate environments has always been a battleground between efficiency and security. For years, this played out in the form of "Shadow IT" — employees using unsanctioned laptops or cloud services to get their jobs done faster. Security teams became masters at hunting these rogue systems, setting up firewalls and policies to bring order to the chaos.

But the new frontier is different, and arguably far more subtle and dangerous.

Imagine a team of engineers, deep into the development of a groundbreaking new product. They're on a tight deadline, and a junior engineer, trying to optimize their workflow, pastes a snippet of a proprietary algorithm into a popular public AI chatbot, asking it to refactor the code for better performance. The tool quickly returns the revised code, and the engineer, pleased with the result, checks it in. What they don't realize is that their query, and the snippet of code, is now part of the AI service’s training data, or perhaps logged and stored by the provider. Without anyone noticing, a critical piece of the company's intellectual property has just been sent outside the organization's control: a silent and unmonitored data leak.

This isn't a Continue reading

Cloudflare Launching AI Miniseries for Developers (and Everyone Else They Know)

If you’re here on the Cloudflare blog, chances are you already understand AI pretty well. But step outside our circle, and you’ll find a surprising number of people who still don’t know what it really is — or why it matters.

We wanted to come up with a way to make AI intuitive, something you can actually see and touch to get what’s going on. Hands on, not just hand-wavy.

The idea we landed on is simple: nothing comes into the world fully formed. Like us, and like the Internet, AI didn’t show up fully formed. So we asked ourselves: what if we told the story of AI as it learns and grows?

Episode by episode, we’d give it new capabilities, explain how those capabilities work, and explore how they change the way AI interacts with the world. Giving it a voice. Letting it see. Helping it learn. And maybe even letting it imagine the future.

So we made AI Avenue, a show where I (Craig) explore the fun, human, and sometimes surprising sides of AI… with a little help from my co-host Yorick, a robot hand with a knack for comic timing and the occasional eye-roll. Together, we travel, Continue reading

Beyond the ban: A better way to secure generative AI applications

The revolution is already inside your organization, and it's happening at the speed of a keystroke. Every day, employees turn to generative artificial intelligence (GenAI) for help with everything from drafting emails to debugging code. And while using GenAI boosts productivity—a win for the organization—this also creates a significant data security risk: employees may potentially share sensitive information with a third party.

Despite this risk, the data is clear: employees already treat these AI tools like a trusted colleague. In fact, one study found that nearly half of all employees surveyed admitted to entering confidential company information into publicly available GenAI tools. Unfortunately, the risk of human error doesn’t stop there. Earlier this year, a new feature in a leading LLM meant to make conversations shareable had a serious unintended consequence: thousands of private chats — including work-related ones — were indexed by Google and other search engines. Neither incident was malicious. Instead, both were miscalculations about how these tools would be used, and it certainly did not help that organizations did not have the right tools to protect their data.

While the instinct for many may be to deploy Continue reading

When Switches Flood LLDP Traffic

A networking engineer (let’s call him Joe) sent me an interesting challenge: they built a data center network with Cisco switches, and the switches flood LLDP packets between servers.

That would be interesting by itself (the whole network would appear as a single hub), but they’re also using DCBX (which is riding in LLDP TLVs), and the DCBX parameters are negotiated between servers (not between servers and adjacent switches), sometimes resulting in NIC resets.
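Since DCBX rides inside LLDP TLVs, it helps to recall the TLV format: each IEEE 802.1AB TLV packs a 7-bit type and a 9-bit length into a two-byte header (DCBX uses organizationally specific TLVs, type 127). A minimal parser sketch, fed a made-up Chassis ID TLV:

```python
import struct

def parse_lldp_tlvs(payload: bytes):
    """Split an LLDPDU into (type, value) TLVs. Each TLV header packs a
    7-bit type and 9-bit length into 16 bits (IEEE 802.1AB); DCBX rides
    in organizationally specific TLVs (type 127)."""
    tlvs, off = [], 0
    while off + 2 <= len(payload):
        (hdr,) = struct.unpack_from("!H", payload, off)
        tlv_type, tlv_len = hdr >> 9, hdr & 0x1FF
        off += 2
        if tlv_type == 0:          # End of LLDPDU
            break
        tlvs.append((tlv_type, payload[off:off + tlv_len]))
        off += tlv_len
    return tlvs

# A Chassis ID TLV (type 1, subtype 4 = MAC address), then the End TLV:
pdu = bytes([0x02, 0x07, 0x04]) + bytes.fromhex("aabbccddeeff") + b"\x00\x00"
print(parse_lldp_tlvs(pdu))   # [(1, b'\x04\xaa\xbb\xcc\xdd\xee\xff')]
```

The flooding problem isn't in the TLVs themselves: LLDP frames are sent to a link-local multicast destination that switches are supposed to consume rather than forward, which is why servers negotiating DCBX directly with each other signals misbehaving switches.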

AI Browsers Are Here — My Experience with Perplexity’s Comet

I have been using Perplexity’s Comet browser for the past two weeks, and it has completely changed the way I use browsers 🌐. I’ve been a Chrome user for as long as I can remember, but after trying out Comet, I finally made it my default browser ✅. Comet functions not just … Continue reading