As explained in the preceding chapter, "Egress Interface Congestions," both the links from Rail switches to GPU servers and the inter-switch links can become congested during gradient synchronization. Congestion control mechanisms designed specifically for RDMA workloads are essential in AI fabric back-end networks: congestion slows down the learning process, and even a single packet loss can force a restart of the whole training process.
This section begins by introducing Explicit Congestion Notification (ECN) and Priority-based Flow Control (PFC), two foundational technologies used in modern lossless Ethernet networks. ECN allows switches to mark packets, rather than dropping them, when congestion is detected, enabling endpoints to react proactively. PFC, on the other hand, offers per-priority flow control, which can pause selected traffic classes while allowing others to continue flowing.
Finally, we describe how Datacenter Quantized Congestion Notification (DCQCN) combines ECN and PFC to deliver a scalable and lossless transport mechanism for RoCEv2 traffic in AI clusters.
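To make the ECN-to-endpoint feedback loop concrete, the following Python toy sketches DCQCN's sender-side rate control as described in the original DCQCN paper (Zhu et al., SIGCOMM 2015). The class, constants, and units are illustrative only, not taken from any NIC or switch implementation:

```python
class DcqcnSender:
    """Toy sketch of DCQCN sender-side rate control (after Zhu et al.,
    SIGCOMM 2015). Constants and units are illustrative only."""

    def __init__(self, line_rate_gbps: float, g: float = 1 / 256):
        self.rc = line_rate_gbps   # current sending rate
        self.rt = line_rate_gbps   # target rate remembered for recovery
        self.alpha = 1.0           # running estimate of congestion severity
        self.g = g                 # EWMA gain for alpha updates

    def on_cnp(self) -> None:
        # The receiver saw ECN-marked packets and returned a Congestion
        # Notification Packet: remember the current rate, then cut it.
        self.alpha = (1 - self.g) * self.alpha + self.g
        self.rt = self.rc
        self.rc *= 1 - self.alpha / 2

    def on_timer_no_cnp(self) -> None:
        # No CNP within the update period: decay alpha and climb back
        # toward the remembered target rate (fast recovery).
        self.alpha *= 1 - self.g
        self.rc = (self.rt + self.rc) / 2


sender = DcqcnSender(line_rate_gbps=100.0)
sender.on_cnp()             # first CNP halves the rate (alpha starts at 1)
sender.on_timer_no_cnp()    # quiet period: recover halfway back
print(round(sender.rc, 1))  # → 75.0
```

The key property DCQCN borrows from ECN is that the rate cut is proportional to how much marking the receiver has recently seen (alpha), while PFC remains a last-resort backstop against loss rather than the primary control loop.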
I am pleased to introduce the work of one of my students, who developed a […]
The post Professional Corporate Network Simulation in Packet Tracer first appeared on Brezular's Blog.
It is almost the end of Developer Week and we haven't talked about containers: until now. As some of you may know, we've been working on a container platform behind the scenes for some time.
In late June, we plan to release Containers in open beta, and today we'll give you a sneak peek at what makes it unique.
Workers are the simplest way to ship software around the world with little overhead. But sometimes you need to do more. You might want to:
Run user-generated code in any language
Execute a CLI tool that needs a full Linux environment
Use several gigabytes of memory or multiple CPU cores
Port an existing application from AWS, GCP, or Azure without a major rewrite
Cloudflare Containers let you do all of that while being simple, scalable, and global.
Through a deep integration with Workers and an architecture built on Durable Objects, Workers can be your:
API Gateway: Letting you control routing, authentication, caching, and rate-limiting before requests reach a container
Service Mesh: Creating private connections between containers with a programmable routing layer
Orchestrator: Allowing you to write custom scheduling, scaling, and health checking logic for your containers
Instead Continue reading
With quick access to flexible infrastructure and innovative AI tools, startups are able to deploy production-ready applications with speed and efficiency. Cloudflare plays a pivotal role for countless applications, empowering founders and engineering teams to build, scale, and accelerate their innovations with ease, and without the burden of technical overhead. And when applicable, initiatives like our Startup Program and Workers Launchpad offer the tooling and resources that further fuel these ambitious projects.
Cloudflare recently announced AI agents, allowing developers to leverage Cloudflare to deploy agents to complete autonomous tasks. We're already seeing some great examples of startups leveraging Cloudflare as their platform of choice to invest in building their agent infrastructure. Read on to see how a few up-and-coming startups are building their AI agent platforms, powered by Cloudflare.
Founded in 2023, Lamatic.ai empowers SaaS startups to seamlessly integrate intelligent AI agents into their products. Lamatic.ai simplifies the deployment of AI agents by offering a fully managed lifecycle with scalability and security in mind. SaaS providers have been leveraging Lamatic to replatform their AI workflows via a no-code visual builder to reduce technical debt Continue reading
Today, we're sharing a preview of a new feature that makes it easier to build cross-cloud apps: Workers VPC.
Workers VPC is our take on the traditional virtual private cloud (VPC), modernized for network and compute that isn't tied to a single cloud region. And we're complementing it with Workers VPC Private Links to make building across clouds easier. Together, they introduce two new capabilities to Workers:
A way to group your apps' resources on Cloudflare into isolated environments, where only resources within a Workers VPC can access one another, allowing you to secure and segment app-to-app traffic (a "Workers VPC").
A way to connect a Workers VPC to a legacy VPC in a public or private cloud, enabling your Cloudflare resources to access your resources in private networks and vice versa, as if they were in a single VPC (the "Workers VPC Private Link").
Workers VPC and Workers VPC Private Link enable bidirectional connectivity between Cloudflare and external clouds
When linked to an external VPC, Workers VPC makes the underlying resources directly addressable, so that application developers can think at the application layer, without dropping down to the network layer. Think of this like a Continue reading
During Cloudflare's Birthday Week in September 2024, we introduced a revamped Startup Program designed to make it easier for startups to adopt Cloudflare through a new credits system. This update focused on better aligning the program with how startups and developers actually consume Cloudflare, by providing them with clearer insight into their projected usage, especially as they approach graduation from the program.
Today, we're excited to announce an expansion to that program: new credit tiers that better match startups at every stage of their journey. But before we dive into what's new, let's take a quick look at what the Startup Program is and why it exists.
Cloudflare for Startups provides credits to help early-stage companies build the next big idea on our platform. Startups accepted into the program receive credits valid for one year or until they're fully used, whichever comes first.
Beyond credits, the program includes access to up to three domains with enterprise-level services, giving startups the same advanced tools we provide to large companies to protect and accelerate their most critical applications.
We know that building a startup is expensive, and Cloudflare is uniquely positioned to support the full-stack Continue reading
Cloudflare plays a significant role in supporting the Internet's infrastructure. As a reverse proxy used by approximately 20% of all websites, we sit directly in the request path between users and the origin, helping to improve performance, security, and reliability at scale. Beyond that, our global network powers services like content delivery, Workers, and R2, making Cloudflare not just a passive intermediary, but an active platform for delivering and hosting content across the Internet.
Since Cloudflare's launch in 2010, we have collaborated with the National Center for Missing and Exploited Children (NCMEC), a US-based clearinghouse for reporting child sexual abuse material (CSAM), and are committed to doing what we can to support the identification and removal of CSAM content.
Members of the public, customers, and trusted organizations can submit reports of abuse observed on Cloudflare's network. A minority of these reports relate to CSAM; those are triaged with the highest priority by Cloudflare's Trust & Safety team. We also forward details of the report, along with relevant files (where applicable) and supplemental information, to NCMEC.
The process to generate and submit reports to NCMEC involves multiple steps, dependencies, and error handling, which quickly became complex under Continue reading
When most people think of segment routing (SR), they think of SRv6: using IPv6 addresses as segment IDs, and carving up the least significant /64 to create micro-SIDs for service differentiation. This is not, however, the only way to implement and deploy SR. The alternative is SR using MPLS labels, or SR/MPLS. Hemant Sharma joins Tom Ammon and Russ White to discuss SR/MPLS, why operators might choose MPLS labels over IPv6 SIDs, and other topics related to SR/MPLS.
When it comes to artificial intelligence, context is everything. The same thing holds true for human intelligence, so it stands to reason that it translates to AI since we created it in our own image. …
The AI Factory: 12,000 Years In The Making, And Absolutely Inevitable was written by Timothy Prickett Morgan at The Next Platform.
Any public certification authority (CA) can issue a certificate for any website on the Internet to allow a webserver to authenticate itself to connecting clients. Take a moment to scroll through the list of trusted CAs for your web browser (e.g., Chrome). You may recognize (and even trust) some of the names on that list, but it should make you uncomfortable that any CA on that list could issue a certificate for any website, and your browser would trust it. It's a castle with 150 doors.
Certificate Transparency (CT) plays a vital role in the Web Public Key Infrastructure (WebPKI), the set of systems, policies, and procedures that help to establish trust on the Internet. CT ensures that all website certificates are publicly visible and auditable, helping to protect website operators from certificate mis-issuance by dishonest CAs, and helping honest CAs to detect key compromise and other failures.
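The auditability CT provides rests on an append-only Merkle tree. As a rough illustration, here is the RFC 6962 hashing scheme in Python, simplified to power-of-two tree sizes (the RFC also defines the split rule for arbitrary sizes):

```python
import hashlib

def leaf_hash(entry: bytes) -> bytes:
    # RFC 6962 domain-separates leaves with a 0x00 prefix...
    return hashlib.sha256(b"\x00" + entry).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    # ...and interior nodes with 0x01, so a leaf hash can never be
    # passed off as an internal node (or vice versa).
    return hashlib.sha256(b"\x01" + left + right).digest()

def tree_head(entries: list[bytes]) -> bytes:
    """Merkle Tree Hash, simplified to power-of-two input sizes; RFC
    6962 additionally defines how to split trees of arbitrary size."""
    level = [leaf_hash(e) for e in entries]
    while len(level) > 1:
        level = [node_hash(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

Because any change to a logged certificate changes the tree head, a log that has signed one head cannot silently rewrite history, which is what lets monitors detect mis-issuance after the fact.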
In this post, we'll discuss the history, evolution, and future of the CT ecosystem. We'll cover some of the challenges we and others have faced in operating CT logs, and how the new static CT API log design lowers the bar for operators, helping to ensure that Continue reading
Since the launch of Workers AI in September 2023, our mission has been to make inference accessible to everyone.
Over the last few quarters, our Workers AI team has been heads down on improving the quality of our platform, working on various routing improvements, GPU optimizations, and capacity management improvements. Managing a distributed inference platform is not a simple task, but distributed systems are also what we do best. You'll notice a recurring theme from all these announcements that has always been part of the core Cloudflare ethos: we try to solve problems through clever engineering so that we are able to do more with less.
Today, we're excited to introduce speculative decoding to bring you faster inference, an asynchronous batch API for large workloads, and expanded LoRA support for more customized responses. Lastly, we'll be recapping some of our newly added models, updated pricing, and unveiling a new dashboard to round out the usability of the platform.
We're excited to roll out speed improvements to models in our catalog, starting with the Llama 3.3 70b model. These improvements include speculative decoding, prefix caching, an updated inference backend, Continue reading
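To make the idea behind speculative decoding concrete, here is a toy Python sketch of the greedy-verification variant: a cheap draft model proposes a few tokens and the large target model keeps only the prefix it agrees with. The lambda "models" are stand-ins, not Workers AI internals, and a real implementation verifies all k draft tokens in a single batched forward pass rather than one at a time:

```python
def speculative_step(prefix, draft, target, k=4):
    """One round: the cheap draft model proposes k tokens; the target
    model keeps the prefix it agrees with, plus one verified token."""
    proposed, p = [], list(prefix)
    for _ in range(k):
        t = draft(p)
        proposed.append(t)
        p.append(t)
    accepted, p = [], list(prefix)
    for t in proposed:
        if target(p) != t:      # first disagreement ends the free ride
            break
        accepted.append(t)      # target agrees: this token came cheap
        p.append(t)
    accepted.append(target(p))  # always gain one token from the target
    return list(prefix) + accepted


target = lambda p: (p[-1] + 1) % 10                      # toy model: counts up
draft = lambda p: (p[-1] + 1) % 10 if p[-1] != 3 else 9  # goes wrong after 3
print(speculative_step([0], draft, target))  # → [0, 1, 2, 3, 4]
```

The speedup comes from the fact that output is unchanged (the target model still decides every token) while several tokens per target-model pass are produced whenever the draft model guesses well.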
The HPC centers of the world like fast networks and compute, but they are also always working under budget constraints unlike their AI peers out there in the enterprise, where money seems to be unlimited to what sometimes looks like an irrationally exuberant extent. …
Google Woos HPC Centers With Fast CPUs And Networks was written by Timothy Prickett Morgan at The Next Platform.
The availability of Cisco IOS XR Release 25.1.1 brings sFlow dropped-packet notification support to Cisco 8000 series routers, making it easy to capture and analyze packets dropped at router ingress. This helps operators understand which traffic is being blocked, identify potential security threats, and optimize network performance.
sFlow Configuration for Traffic Monitoring and Analysis describes the steps to enable sFlow and configure packet sampling and interface counter export from a Cisco 8000 Series router to a remote sFlow analyzer.
Note: Devices using NetFlow or IPFIX must transition to sFlow for regular sampling before utilizing the dropped packet feature, ensuring compatibility and consistency in data analysis.
Router(config)#monitor-session monitor1
Router(config)#destination sflow EXP-MAP
Router(config)#forward-drops rx
Configure a monitor-session with the new destination sflow option to export dropped packet notifications (which include ingress interface, drop reason, and header of dropped packet) to the configured sFlow analyzer.
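On the collector side, the analyzer receives these notifications as sFlow version 5 datagrams over UDP. As a rough sketch of the first step a collector performs, here is a minimal parser for the v5 datagram header in Python; the field layout follows the sFlow.org version 5 specification, and a real collector would go on to decode the flow and counter sample records that follow:

```python
import struct

def parse_sflow_header(data: bytes) -> dict:
    """Parse the fixed sFlow v5 datagram header (per sFlow.org v5 spec)."""
    version, addr_type = struct.unpack_from("!II", data, 0)
    offset = 8
    if addr_type == 1:                       # IPv4 agent address
        agent = ".".join(str(b) for b in data[offset:offset + 4])
        offset += 4
    else:                                    # IPv6 agent address (type 2)
        agent = data[offset:offset + 16].hex()
        offset += 16
    sub_agent, seq, uptime_ms, nsamples = struct.unpack_from("!IIII", data, offset)
    return {"version": version, "agent": agent, "sub_agent": sub_agent,
            "sequence": seq, "uptime_ms": uptime_ms, "samples": nsamples}


# A synthetic datagram: version 5, IPv4 agent 10.0.0.1, 2 samples.
pkt = (struct.pack("!II", 5, 1) + bytes([10, 0, 0, 1])
       + struct.pack("!IIII", 0, 42, 1000, 2))
print(parse_sflow_header(pkt)["agent"])  # → 10.0.0.1
```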
Cisco lists several benefits of streaming dropped packets in the configuration guide.
Super Slurper is Cloudflare's data migration tool, designed to make large-scale data transfers between cloud object storage providers and Cloudflare R2 easy. Since its launch, thousands of developers have used Super Slurper to move petabytes of data from AWS S3, Google Cloud Storage, and other S3-compatible services to R2.
But we saw an opportunity to make it even faster. We rearchitected Super Slurper from the ground up using our Developer Platform (building on Cloudflare Workers, Durable Objects, and Queues) and improved transfer speeds by up to 5x. In this post, we'll dive into the original architecture, the performance bottlenecks we identified, how we solved them, and the real-world impact of these improvements.
Super Slurper originally shared its architecture with SourcingKit, a tool built to bulk import images from AWS S3 into Cloudflare Images. SourcingKit was deployed on Kubernetes and ran alongside the Images service. When we started building Super Slurper, we split it into its own Kubernetes namespace and introduced a few new APIs to make it easier to use for the object storage use case. This setup worked well and helped thousands of developers move data to Continue reading
Today, we're launching the open beta of Pipelines, our streaming ingestion product. Pipelines allows you to ingest high volumes of structured, real-time data and load it into our object storage service, R2. You don't have to manage any of the underlying infrastructure, worry about scaling shards or metadata services, and you pay for the data processed (not by the hour). Anyone on a Workers paid plan can start using it to ingest and batch data, at tens of thousands of requests per second (RPS), directly into R2.
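Batching is the core pattern behind this kind of ingestion service: records accumulate until a size or age threshold is hit, then flush as one object write. The Python sketch below shows the generic pattern only; the thresholds, the flush callback, and the class itself are illustrative, not Pipelines internals (a real service also flushes on a background timer, not only when a write arrives):

```python
import time

class Batcher:
    """Illustrative size/age batching ahead of object-storage writes."""

    def __init__(self, flush, max_records=1000, max_age_s=5.0):
        self.flush = flush              # callback that persists one batch
        self.max_records = max_records
        self.max_age_s = max_age_s
        self.buf = []
        self.opened = 0.0

    def add(self, record):
        if not self.buf:
            self.opened = time.monotonic()   # batch starts aging now
        self.buf.append(record)
        if (len(self.buf) >= self.max_records
                or time.monotonic() - self.opened >= self.max_age_s):
            batch, self.buf = self.buf, []
            self.flush(batch)


batches = []
b = Batcher(batches.append, max_records=3)
for r in range(7):
    b.add(r)
print(batches)  # → [[0, 1, 2], [3, 4, 5]]  (record 6 is still buffered)
```

The size threshold keeps objects large enough to query efficiently, while the age threshold bounds how stale buffered data can get; tuning the two against each other is the main design trade-off.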
But this is just the tip of the iceberg: you often want to transform the data you're ingesting, hydrate it on the fly from other sources, and write it to an open table format (such as Apache Iceberg), so that you can efficiently query that data once you've landed it in object storage.
The good news is that we've thought about that too, and we're excited to announce that we've acquired Arroyo, a cloud-native, distributed stream processing engine, to make that happen.
With Arroyo and our just-announced R2 Data Catalog, we're getting increasingly serious about building a data platform that allows you to ingest data across the planet, store Continue reading