Ming Lu, Author at NetworkingNexus.net

Ming Lu

Author Archives: Ming Lu

Your AI bill is out of control. Cloudflare can fix it now.

There isn't a CIO on the planet not worried about AI spend right now. CFOs are increasingly nervous, too.

For fear of falling behind, many companies have pushed their employees to use AI as aggressively as possible. The edict was clear: "Move fast, we'll figure out the bill later." And for the most part, it worked: AI has been genuinely transformational for the teams that leaned in.

But the costs are real: we’ve heard countless horror stories of huge bills and painful overages on token spend.

Today, we're announcing spend controls in Cloudflare AI Gateway, and a closed beta for identity-driven budgets and routing using Cloudflare Access and your existing identity provider.

As we’ve spoken with hundreds of companies about their AI strategy, we’ve seen a common story: The company gives every engineer access to frontier models through a shared API key. Usage takes off. At the end of the month, finance pulls the invoice and nobody can explain where the money went. Was it the machine learning team training a new pipeline? Was it an intern running Claude Opus on email triage? Was it a runaway continuous integration job that burned through 50 million tokens in a weekend? Continue reading

Building the agentic cloud: everything we launched during Agents Week 2026

Today marks the end of our first Agents Week, an innovation week dedicated entirely to the age of agents. It couldn’t have been more timely: over the past year, agents have swiftly changed how people work. Coding agents are helping developers ship faster than ever. Support agents resolve tickets end-to-end. Research agents validate hypotheses across hundreds of sources in minutes. And people aren't just running one agent: they're running several in parallel and around the clock.

As Cloudflare's CTO Dane Knecht and VP of Product Rita Kozlov noted in our welcome to Agents Week post, the potential scale of agents is staggering: If even a fraction of the world's knowledge workers each run a few agents in parallel, you need compute capacity for tens of millions of simultaneous sessions. The one-app-serves-many-users model the cloud was built on doesn't work for that. But that's exactly what developers and businesses want to do: build agents, deploy them to users, and run them at scale.

Getting there means solving problems across the entire stack. Agents need compute that scales from full operating systems to lightweight isolates. They need security and identity built into how they run. They need an agent toolbox Continue reading

Cloudflare’s AI Platform: an inference layer designed for agents

AI models are changing quickly: the best model to use for agentic coding today might in three months be a completely different model from a different provider. On top of this, real-world use cases often require calling more than one model. Your customer support agent might use a fast, cheap model to classify a user's message; a large, reasoning model to plan its actions; and a lightweight model to execute individual tasks.

This means you need access to all the models, without tying yourself financially and operationally to a single provider. You also need the right systems in place to monitor costs across providers, ensure reliability when one of them has an outage, and manage latency no matter where your users are.

These challenges are present whenever you’re building with AI, but they get even more pressing when you’re building agents. A simple chatbot might make one inference call per user prompt. An agent might chain ten calls together to complete a single task and suddenly, a single slow provider doesn't add 50ms, it adds 500ms. One failed request isn't a retry, but suddenly a cascade of downstream failures.

Since launching AI Gateway and Workers AI, we’ve seen incredible Continue reading