You can now add a Deploy to Cloudflare button to the README of your Git repository containing a Workers application — making it simple for other developers to quickly set up and deploy your project!
The Deploy to Cloudflare button:
Creates a new Git repository on your GitHub/GitLab account: Cloudflare will automatically clone the project and create a new repository on your account, so you can continue developing.
Automatically provisions resources the app needs: If your repository requires Cloudflare primitives like a Workers KV namespace, a D1 database, or an R2 bucket, Cloudflare will automatically provision them on your account and bind them to your Worker upon deployment.
Configures Workers Builds (CI/CD): Every new push to your production branch on your newly created repository will automatically build and deploy courtesy of Workers Builds.
Adds preview URLs to each pull request: If you’d like to test your changes before deploying, you can push changes to a non-production branch and preview URLs will be generated and posted back to GitHub as a comment.
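If you want to add the button to your own project, the README markdown looks something like this (a sketch only: the repository URL is a placeholder, and you should confirm the exact button and deploy URL format in the docs):

```md
[![Deploy to Cloudflare](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/<your-org>/<your-repo>)
```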
There is nothing more frustrating than struggling to kick the tires on a new project because you don’t know where to start. Continue reading
Did you know you could use the neighbor local-as BGP functionality to fake an iBGP session between different autonomous systems? I knew Cisco IOS supported that monstrosity for ages (supposedly “to merge two ISPs that have different AS numbers”) and added the appropriate tweaks into netlab when I added BGP local-as support in release 1.3.1. Someone couldn’t resist pushing us down that slippery slope, and we ended up with IBGP local-as implemented on 18 platforms (almost a dozen network operating systems).
I even wrote a related integration test, and all our implementations passed it. Then I asked myself a simple question: “But does it work?” and the number of implementations that passed the test without warnings dropped to zero.
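For context, the trick looks roughly like this on Cisco IOS (a minimal sketch with made-up addresses and AS numbers): the local-as value matches the neighbor’s real AS, so the session is treated as IBGP even though the routers belong to different autonomous systems.

```
router bgp 65001
 neighbor 10.0.0.2 remote-as 65000
 neighbor 10.0.0.2 local-as 65000
```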
Today we’re excited to announce AutoRAG in open beta, a fully managed Retrieval-Augmented Generation (RAG) pipeline powered by Cloudflare, designed to simplify how developers integrate context-aware AI into their applications. RAG is a method that improves the accuracy of AI responses by retrieving information from your own data, and providing it to the large language model (LLM) to generate more grounded responses.
Building a RAG pipeline is a patchwork of moving parts. You have to stitch together multiple tools and services — your data storage, a vector database, an embedding model, LLMs, and custom indexing, retrieval, and generation logic — all just to get started. Maintaining it is even harder. As your data changes, you have to manually reindex and regenerate embeddings to keep the system relevant and performant. What should be a simple “ask a question, get a smart answer” experience becomes a brittle pipeline of glue code, fragile integrations, and constant upkeep.
AutoRAG removes that complexity. With just a few clicks, it delivers a fully managed RAG pipeline end-to-end: from ingesting your data and automatically chunking and embedding it, to storing vectors in Cloudflare’s Vectorize database, performing semantic retrieval, and generating high-quality responses using Workers AI. AutoRAG continuously monitors Continue reading
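To give a flavor of what that looks like from a Worker, here is a minimal sketch (assuming an AI binding and an AutoRAG instance named my-rag; method names follow the open-beta docs and may evolve):

```ts
export default {
  async fetch(request: Request, env: { AI: any }): Promise<Response> {
    const { searchParams } = new URL(request.url);
    const query = searchParams.get("q") ?? "What is AutoRAG?";

    // aiSearch() runs retrieval over your indexed data and generates
    // a grounded answer with the configured Workers AI model
    const answer = await env.AI.autorag("my-rag").aiSearch({ query });

    return Response.json(answer);
  },
};
```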
Betas are useful for feedback and iteration, but at the end of the day, not everyone is willing to be a guinea pig or can tolerate the occasional sharp edge that comes along with beta software. Sometimes you need that big, shiny “Generally Available” label (or blog post), and now it’s Workflows’ turn.
Workflows, our serverless durable execution engine that allows you to build long-running, multi-step applications (some call them “step functions”) on Workers, is now GA.
In short, that means it’s production ready — but it also doesn’t mean Workflows is going to ossify. We’re continuing to scale Workflows (including more concurrent instances), bring new capabilities (like the new waitForEvent API), and make it easier to build AI agents with our Agents SDK and Workflows.
If you prefer code to prose, you can quickly install the Workflows starter project and start exploring the code and the API with a single command:
```sh
npm create cloudflare@latest workflows-starter -- --template="cloudflare/workflows-starter"
```
How does Workflows work? What can I build with it? How do I think about building AI agents with Workflows and the Agents SDK? Well, read on.
Workflows is a durable execution engine built on Cloudflare Workers that Continue reading
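To give you a feel for the API before you run the starter, here is a minimal sketch of a Workflow (the binding, URL, and event names are made up for illustration; check the docs for exact signatures):

```ts
import { WorkflowEntrypoint, WorkflowStep, WorkflowEvent } from "cloudflare:workers";

type Env = Record<string, unknown>; // your bindings go here
type Params = { userId: string };   // payload passed when the instance is created

export class OnboardingWorkflow extends WorkflowEntrypoint<Env, Params> {
  async run(event: WorkflowEvent<Params>, step: WorkflowStep) {
    // Each step's result is persisted; failed steps are retried independently
    const profile = await step.do("fetch profile", async () => {
      const res = await fetch(`https://api.example.com/users/${event.payload.userId}`);
      return res.json();
    });

    // Durable sleep: the instance hibernates instead of burning compute
    await step.sleep("wait a day", "1 day");

    // New at GA: pause until an external event is delivered (or time out)
    const approval = await step.waitForEvent("wait for approval", {
      type: "approval",
      timeout: "24 hours",
    });

    return { profile, approval };
  }
}
```

The matching event is sent from a regular Worker through the Workflow binding (something like instance.sendEvent({ type: "approval", payload }), if we’re reading the GA API right).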
I’m thrilled to share that Cloudflare has acquired Outerbase. This is such an amazing opportunity for us, and I want to explain how we got here, what we’ve built so far, and why we are so excited about becoming part of the Cloudflare team.
Databases are key to building almost any production application: you need to persist state for your users (or agents), be able to query it from a number of different clients, and you want it to be fast. But databases aren’t always easy to use: designing a good schema, writing performant queries, creating indexes, and optimizing your access patterns tends to require a lot of experience. Add to that exposing your data through easy-to-grok APIs that make the ‘right’ way to do things obvious, plus a great developer experience (from dashboard to CLI), and well… there’s a lot of work involved.
The Outerbase team is already getting to work on some big changes to how databases (and your data) are viewed, edited, and visualized from within Workers, and we’re excited to give you a few sneak peeks into what we’ll be landing as we get to work.
When we first started Outerbase, we saw how Continue reading
It’s not a secret that at Cloudflare we are bullish on the future of agents. We’re excited about a future where AI can not only co-pilot alongside us, but where we can actually start to delegate entire tasks to AI.
While it hasn’t been too long since we first announced our Agents SDK to make it easier for developers to build agents, building towards an agentic future requires continuous delivery. Today, we’re making several announcements to help accelerate agentic development, including:
New Agents SDK capabilities: Build remote MCP clients, with transport and authentication built-in, to allow AI agents to connect to external services.
BYO Auth provider for MCP: Integrations with Stytch, Auth0, and WorkOS to add authentication and authorization to your remote MCP server.
Hibernation for McpAgent: Automatically sleep stateful, remote MCP servers when inactive and wake them when needed. This allows you to maintain connections for long-running sessions while ensuring you’re not paying for idle time.
Durable Objects free tier: We view Durable Objects as a key component for building agents, and if you’re using our Agents SDK, you need access to it. Until today, Durable Objects Continue reading
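To make the McpAgent pattern concrete, here is a minimal sketch of a stateful remote MCP server (modeled on the Agents SDK examples; the tool and mount path are illustrative, and the exact routing helpers may have evolved):

```ts
import { McpAgent } from "agents/mcp";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

// A stateful, remote MCP server; with hibernation it sleeps while idle
// and wakes when a client calls it, so you don't pay for dead air.
export class MyMCP extends McpAgent {
  server = new McpServer({ name: "demo", version: "1.0.0" });

  async init() {
    // Expose a single tool to connected MCP clients
    this.server.tool("add", { a: z.number(), b: z.number() }, async ({ a, b }) => ({
      content: [{ type: "text", text: String(a + b) }],
    }));
  }
}

// Serve the agent over SSE (the pattern used in the SDK examples)
export default MyMCP.mount("/sse");
```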
The EVPN Fundamentals videos (part of the EVPN Technical Deep Dive webinar) are now public; you can watch them without an ipSpace.net account.
Want to spend more time watching free ipSpace.net videos? The complete list is here.
We’re kicking off Cloudflare’s 2025 Developer Week — our innovation week dedicated to announcements for developers.
It’s an exciting time to be a developer. In fact, as a developer, the past two years might have felt a bit like every week is Developer Week. Starting with the release of ChatGPT, it has felt like each day has brought a new, disruptive announcement, whether it’s new models, hardware, agents, or other tools. Between late 2024 and the first few months of 2025 alone, we’ve seen the DeepSeek model challenge assumptions about what it takes to train a new state-of-the-art model, MCP introduce a new standard for how LLMs interface with the world, and OpenAI’s GPT-4o model Ghiblify the world.
And while it’s exciting to witness a technological revolution unfold in front of your eyes, it’s even more exciting to partake in it.
One of the marvels of the recent AI revolution is the extent to which the cost of experimentation has gone down. Ideas that would have taken whole weekends, weeks, or months to build can now be turned into working code in a day. You can vibe-code your way through things you might Continue reading
Figure 10-1 illustrates a simple distributed GPU cluster consisting of three GPU hosts. Each host has two GPUs and a Network Interface Card (NIC) with two interfaces. Intra-host GPU communication uses high-speed NVLink interfaces, while inter-host communication takes place via NICs over slower PCIe buses.
GPU-0 on each host is connected to Rail Switch A through interface E1. GPU-1 uses interface E2 and connects to Rail Switch B. In this setup, inter-host communication between GPUs connected to the same rail passes through a single switch. Communication between GPUs on different rails, however, takes three hops, traversing Rail–Spine–Rail switches.
In Figure 10-1, we use a data parallelization strategy where the training dataset is split into six micro-batches, which are distributed across the GPUs. All GPUs share the same feedforward neural network model and compute local model outputs. Next, each GPU calculates the model error and begins the backward pass to compute neuron-based gradients. These gradients indicate how much, and in which direction, the weight parameters should be adjusted to improve the training result (see Chapter 2 for details).
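To make the synchronization step concrete, here is a toy sketch (illustrative only, in TypeScript rather than a real collective-communication library) of what the gradient all-reduce conceptually computes: the element-wise mean of every replica’s local gradients, which each GPU then applies as the same update.

```ts
// Toy model of data-parallel gradient averaging: each GPU computes local
// gradients on its own micro-batch; the all-reduce result is the
// element-wise mean across all replicas.
function averageGradients(localGrads: number[][]): number[] {
  const replicas = localGrads.length;
  const avg = new Array<number>(localGrads[0].length).fill(0);
  for (const grad of localGrads) {
    grad.forEach((g, i) => (avg[i] += g / replicas));
  }
  return avg;
}

// Local gradients from three hosts for the same three weights
const grads = [
  [0.2, -0.1, 0.05],
  [0.4, -0.3, 0.0],
  [0.0, -0.2, 0.1],
];
console.log(averageGradients(grads)); // ≈ [0.2, -0.2, 0.05]
```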
As one of Meta’s launch partners, we are excited to make Meta’s latest and most powerful model, Llama 4, available on the Cloudflare Workers AI platform starting today. Check out the Workers AI Developer Docs to begin using Llama 4 now.
Llama 4 is an industry-leading release that pushes forward the frontiers of open-source generative Artificial Intelligence (AI) models. Llama 4 relies on a novel design that combines a Mixture of Experts architecture with an early-fusion backbone that allows it to be natively multimodal.
The Llama 4 “herd” is made up of two models: Llama 4 Scout (109B total parameters, 17B active parameters) with 16 experts, and Llama 4 Maverick (400B total parameters, 17B active parameters) with 128 experts. The Llama 4 Scout model is available on Workers AI today.
Llama 4 Scout has a context window of up to 10 million (10,000,000) tokens, which makes it one of the first open-source models to support a window of that size. A larger context window makes it possible to hold longer conversations, deliver more personalized responses, and support better Retrieval Augmented Generation (RAG). For example, users can take advantage of that increase to summarize multiple documents or Continue reading
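Getting started is a couple of lines in a Worker. A minimal sketch (the model ID below is the one announced for Workers AI; confirm it against the model catalog):

```ts
export default {
  async fetch(request: Request, env: { AI: any }): Promise<Response> {
    const response = await env.AI.run("@cf/meta/llama-4-scout-17b-16e-instruct", {
      messages: [
        { role: "system", content: "You are a concise assistant." },
        { role: "user", content: "What does a 10M-token context window make possible?" },
      ],
    });
    return Response.json(response);
  },
};
```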
Around the turn of the century, we started to see a bigger need for high-capacity web servers. For example, there was the C10k problem paper.
At the time, one of the main tricks for reducing the work done per request was pre-forking the web server. This meant a request could be handled without an expensive process creation.
Because yes, creating a new process for every request was something perfectly normal.
Things did get better. People learned how to create threads, making things more lightweight. Then they switched to using poll()/select(), in order to spare not just the process/thread creation, but the whole context switch.
I remember a comment on Kuro5hin from anakata, the creator of both The Pirate Bay and the webserver that powered it, along the lines of “I am select() of borg, resistance is futile”, mocking someone for not understanding how to write a scalable webserver.
But select()/poll() also doesn’t scale. If you have ten thousand connections, that’s an array of ten thousand integers that need to be sent to the kernel for every single iteration of your request handling loop.
Enter epoll (kqueue on other operating systems, but I’m focusing Continue reading
Hello my friend,
In past blog posts we covered how to interact with network devices (in fact, many servers support this as well) using SSH/CLI and NETCONF/YANG. Those two protocols let you confidently cover almost all device management cases, whether you prefer a more human-like approach (templating CLI commands and pushing them via SSH) or sending structured XML data via NETCONF. However, there are more protocols, and today we are going to talk about the most modern one to date: gNMI (the gRPC Network Management Interface).
Talking to students at our trainings, and to customers and peers at various events, I often hear the concern that small and medium businesses don’t have the time and/or the need to invest in automation. There is no time because engineers are busy solving “real” problems, like outages or customer experience degradation. There is no need because, well, why bother: we have engineers to tackle the issues. In my experience, automation saves so much time on operational tasks, and frees it up for improving user experience and solving new problems, that its value is difficult to overestimate. It really is Continue reading
Infrahub provides multiple ways to interact with your infrastructure data, including the Web GUI, GraphQL queries, and the Python SDK. These can be used to query, modify, create, or delete data in Infrahub. In this post, we’ll focus on using the Python SDK to query data from Infrahub.
This post assumes you are familiar with basic Python and Infrahub. If you’re new to these topics, don’t worry, you can still follow along.
Throughout this post, we’ll be using the Always-On Infrahub demo instance, which is available for anyone to access. The demo instance already has some data in it, so if you’d like to follow along or try this yourself, you can use it without needing to set up anything.
Originally published at https://www.opsmill.com/querying-data-in-infrahub-via-the-python-sdk/
The Python SDK supports both synchronous and asynchronous Python. However, in this post, we’ll focus on using synchronous Python, which I hope most of us are comfortable with. We’ll cover async in a future blog post.
Interacting with Infrahub through the Python SDK is done using a client object, which defines the Infrahub instance you’ll be working with. This client acts as the connection point, allowing you to Continue reading
Out-of-band management networks were once more common than they are today. Should we go back to building them? Should they be virtual or physical? How can we sell out-of-band management to the folks paying the bills? Daryll Swer joins Tom Ammon and Russ White to discuss the importance of OOB management.
As cyber threats continue to exploit systemic vulnerabilities in widely used technologies, the United States Cybersecurity and Infrastructure Security Agency (CISA) produced best practices for the technology industry with their Secure-by-Design pledge. Cloudflare proudly signed this pledge on May 8, 2024, reinforcing our commitment to creating resilient systems where security is not just a feature, but a foundational principle.
We’re excited to share and provide transparency into how our security patching process meets one of CISA’s goals in the pledge: Demonstrating actions taken to increase installation of security patches for our customers.
Managing and deploying Linux kernel updates is one of Cloudflare’s most challenging security processes. In 2024, over 1000 CVEs were logged against the Linux kernel and patched. To keep our systems secure, it is vital to perform critical patch deployment across systems while maintaining the user experience.
A common technical support phrase is “Have you tried turning it off and then on again?” One may be surprised how often this tactic is used — it is also an essential part of how Cloudflare operates at scale when it comes to applying our most critical patches. Frequently restarting systems exercises the Continue reading