Workers AI gets a speed boost, batch workload support, more LoRAs, new models, and a refreshed dashboard
Since the launch of Workers AI in September 2023, our mission has been to make inference accessible to everyone.
Over the last few quarters, our Workers AI team has been heads down improving the quality of our platform, with work spanning request routing, GPU optimization, and capacity management. Managing a distributed inference platform is not a simple task, but distributed systems are also what we do best. You’ll notice a recurring theme across these announcements that has always been part of the core Cloudflare ethos — we try to solve problems through clever engineering so that we can do more with less.
Today, we’re excited to introduce speculative decoding to bring you faster inference, an asynchronous batch API for large workloads, and expanded LoRA support for more customized responses. Lastly, we’ll recap some of our newly added models and updated pricing, and unveil a new dashboard to round out the usability of the platform.
We’re excited to roll out speed improvements to models in our catalog, starting with the Llama 3.3 70b model. These improvements include speculative decoding, prefix caching, and an updated inference backend.
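To give a feel for why speculative decoding speeds up generation without changing outputs, here is a minimal, self-contained sketch of the greedy variant. The two "models" below are toy deterministic functions standing in for a small draft model and the large target model — everything here is illustrative, not Workers AI internals. The draft model cheaply proposes a run of tokens; the target model then verifies them all at once and keeps the longest correct prefix, so each expensive target step can yield several tokens instead of one.

```python
def target_model(ctx):
    """Toy stand-in for the large model's greedy next-token choice."""
    return (ctx[-1] + 3) % 10

def draft_model(ctx):
    """Toy stand-in for the small, fast draft model: usually agrees with
    the target, but errs whenever the last token is 7."""
    nxt = (ctx[-1] + 3) % 10
    return (nxt + 1) % 10 if ctx[-1] == 7 else nxt

def speculative_decode(prompt, n_tokens, k=4):
    """Greedy speculative decoding: draft proposes k tokens, target verifies."""
    ctx = list(prompt)
    out = []
    while len(out) < n_tokens:
        # 1) Draft model autoregressively proposes k tokens (cheap).
        proposals, dctx = [], list(ctx)
        for _ in range(k):
            t = draft_model(dctx)
            proposals.append(t)
            dctx.append(t)
        # 2) Target model checks every proposed position. In a real system
        #    this is a single batched forward pass over all k positions,
        #    which is where the wall-clock speedup comes from.
        accepted = []
        for t in proposals:
            correct = target_model(ctx + accepted)
            if t == correct:
                accepted.append(t)
            else:
                # First wrong token: take the target's token, discard the rest.
                accepted.append(correct)
                break
        ctx.extend(accepted)
        out.extend(accepted)
    return out[:n_tokens]
```

Because every accepted token is checked against the target model, the output is token-for-token identical to running the target model alone — the draft model only changes how fast those tokens arrive.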