Archive

Category Archives for "Networking"

Why I joined Cloudflare: to build world-class partnerships in EMEA

Cloudflare is not just another technology company. It’s a mission-driven force, committed to helping build a better Internet; one that is faster, safer, and more resilient. That mission is more critical than ever as organizations worldwide navigate an increasingly complex digital landscape, rife with cyber threats, regulatory challenges, and the need for scalable, cost-effective solutions.

In EMEA, that mission has special significance. The region is a patchwork of diverse markets, industries, and regulatory environments. It demands a partner-centric approach, one that empowers businesses of all sizes to harness Cloudflare’s comprehensive connectivity cloud platform to protect, connect, and accelerate their operations. That’s why I joined Cloudflare as VP of EMEA Partnerships.

A moment of inflection

Every great company has an inflection point, a moment when the market, the strategy, and the execution align to create unstoppable momentum. Cloudflare is at that moment now.

With record revenue growth, increasing traction among large customers, and an expanding suite of Zero Trust, AI, and network security solutions, Cloudflare is emerging as the partner of choice for enterprises and service providers across EMEA .

But what excites me most is the people, the opportunity to build a team in EMEA that is world-class in its expertise, Continue reading

How to get started with Calico Observability features

Kubernetes, by default, adopts a permissive networking model where all pods can freely communicate unless explicitly restricted using network policies. While this simplifies application deployment, it introduces significant security risks. Unrestricted network traffic allows workloads to interact with unauthorized destinations, increasing the potential for cyberattacks such as Remote Code Execution (RCE), DNS spoofing, and privilege escalation.

To better understand these problems, let’s examine a sample Kubernetes application: ANP Demo App.

This application comprises a deployment that spawns pods and a service that exposes them to external users in a similar situation like any real word workload which you will encounter in your environment.

If you open the application service before implementing any policies, the application reports the following messages:

  1. Container can reach the Internet – Without network policies, an attacker can use our container as an entry point by exploiting it with a vulnerability. This could allow them to exfiltrate data or establish remote control over the workload by leveraging its Internet access.
  2. Container can reach CoreDNS Pods – Kubernetes relies heavily on DNS, with records served using CoreDNS Pods. While communication between your Pods and CoreDNS is essential and not inherently a vulnerability, pairing it with unrestricted access to Continue reading

Congestion Avoidance in AI Fabric – Part III: Data Center Quantized Congestion Notification (DCQCN)

Data Center Quantized Congestion Notification (DCQCN) is a hybrid congestion control method. DCQCN brings together both Priority Flow Control (PFC) and Explicit Congestion Notification (ECN) so that we can get high throughput, low latency, and lossless delivery across our AI fabric. In this approach, each mechanism plays a specific role in addressing different aspects of congestion, and together they create a robust flow-control system for RDMA traffic.


DCQCN tackles two main issues in large-scale RDMA networks:

1. Head-of-Line Blocking and Congestion Spreading: This is caused by PFC’s pause frames, which stop traffic across switches.

2. Throughput Reduction with ECN Alone: When the ECN feedback is too slow, packet loss may occur despite the rate adjustments.

DCQCN uses a two-tiered approach. It applies ECN early on to gently reduce the sending rate at the GPU NICs, and it uses PFC as a backup to quickly stop traffic on upstream switches (hop-by-hop) when congestion becomes severe.


How DCQCN Combines ECN and PFC

DCQCN carefully combines Explicit Congestion Notification (ECN) and Priority Flow Control (PFC) in the right sequence:


Early Action with ECN: When congestion begins to build up, the switch uses WRED thresholds (minimum and maximum) to mark packets. This signals the Continue reading

Congestion Avoidance in AI Fabric – Part II: Priority Flow Control (PFC)

Priority Flow Control (PFC) is a mechanism designed to prevent packet loss during network congestion by pausing traffic selectively based on priority levels. While the original IEEE 802.1Qbb standard operates at Layer 2, using the Priority Code Point (PCP) field in Ethernet headers, AI Fabrics rely on Layer 3 forwarding, where traditional Layer 2-based PFC is no longer applicable. To extend lossless behavior across routed (Layer 3) networks, DSCP-based PFC is used.

In DSCP-based PFC, the Differentiated Services Code Point (DSCP) field in the IP header identifies the traffic class or priority. Switches map specific DSCP values to internal traffic classes and queues. If congestion occurs on an ingress interface and a particular priority queue fills beyond a threshold, the switch can send a PFC pause frame back to the sender switch, instructing it to temporarily stop sending traffic of that class—just as in Layer 2 PFC, but now triggered based on Layer 3 classifications.

This behavior differs from Explicit Congestion Notification (ECN), which operates at Layer 3 as well but signals congestion by marking packets instead of stopping traffic. ECN acts on the egress port, informing the receiver to notify the sender to reduce the transmission rate over Continue reading

Why I joined Cloudflare as Chief People Officer — Kelly Russell

After years navigating the exhilarating world of high-growth tech, from Amazon to Twilio’s scaling journey and most recently Wiz’s rapid ascent, I’ve learned to recognize a truly special opportunity when I see one. That’s exactly what I have found at Cloudflare and why I’m thrilled to join. 

What drew me to Cloudflare was the unique combination of a powerful mission, a transparent and results-oriented culture, and the sheer scale of impact. Cloudflare isn’t just a technology company —  it’s a force for good, building a better Internet for everyone. This really resonates with my own values.  

Success starts with people

My career has been defined by building, scaling, and developing teams in dynamic environments.  I’ve witnessed the transformative power of a strong culture in driving hypergrowth. I’ve experienced the intensity and agility required to disrupt a market. These experiences have reinforced my belief that people are the heart of any successful company, and that a people-first strategy is critical for long-term impact. During my interview process at Cloudflare, this belief was clearly evident in every conversation. Cloudflare is a place where people can do their best work and be proud of the impact they are making. Powered Continue reading

Tech Bytes: Get Data Center Automation as-a-Service with Nokia Event-Driven Automation (Sponsored)

Nokia Event-Driven Automation (EDA) is a modern infrastructure automation platform that combines speed with reliability and simplicity. It makes data center network automation more trustable and easier to use, from small edge clouds to the largest AI fabrics. Today on Tech Bytes, we talk with Sam Arora from Nokia for more details about some of... Read more »

Developer Week 2025 wrap-up

As we conclude Developer Week 2025, we’re proud to reflect upon the capabilities we’ve added to our developer platform. It’s so rewarding to deliver products, features and tools that help developers build smarter and ship faster, and even more so hearing your responses throughout the week!

Our VP of Product, Rita Kozlov, kicked off Developer Week 2025 discussing the ever-evolving landscape of development, particularly in the age of AI. AI is no longer just a buzzword or a trope for a science-fiction future — in the realm of modern development, it’s a core tenet (and utility) of how we build, innovate, and solve problems. It’s influencing how and how frequently we ship code, as well as enabling anyone to write it.

It’s exciting to not only witness this technical revolution, but also to be building a platform that enables developers to be part of it. We want to hear your feedback and see what you build with the new capabilities — reach out to us on Discord or X.

Here’s a recap of our Developer Week 2025 announcements:

Monday, April 7

Announcement

Summary

Piecing together the Agent puzzle: MCP, authentication & authorization, and Durable Objects free tier 

Toolkit for AI Continue reading

🚀 One Month with Vibe Coding: Building Real Apps with AI Assistants

Over the past few months, Vibe coding has been gaining serious traction—and I couldn’t resist diving in myself. I’ve been using AI coding assistants for a while, but I wanted to go deeper and really test what these tools can do in a realistic, end-to-end software development project. So, I spent the last month building … Continue reading 🚀 One Month with Vibe Coding: Building Real Apps with AI Assistants

Congestion Avoidance in AI Fabric – Part I: Explicit Congestion Notification (ECN)

As explained in the preceding chapter, “Egress Interface Congestions,” both the Rail switch links to GPU servers and the inter-switch links can become congested during gradient synchronization. It is essential to implement congestion control mechanisms specifically designed for RDMA workloads in AI fabric back-end networks because congestion slows down the learning process and even a single packet loss may restart the whole training process.

This section begins by introducing Explicit Congestion Notification (ECN) and Priority-based Flow Control (PFC), two foundational technologies used in modern lossless Ethernet networks. ECN allows switches to mark packets, rather than dropping them, when congestion is detected, enabling endpoints to react proactively. PFC, on the other hand, offers per-priority flow control, which can pause selected traffic classes while allowing others to continue flowing.

Finally, we describe how Datacenter Quantized Congestion Notification (DCQCN) combines ECN and PFC to deliver a scalable and lossless transport mechanism for RoCEv2 traffic in AI clusters.

GPU-to-GPU RDMA Write Without Congestion


The figure 11-1 illustrates a standard Remote Direct Memory Access (RDMA) Write operation between two GPUs. This example demonstrates how GPU-0 on Host-1 transfers local gradients (∇₁ and ∇₂) from memory to GPU-0 on Host-2. Both GPUs use RDMA-capable NICs connected Continue reading

HN776: Security Platforms: Balancing Efficacy, Ops, and Emerging Threats (Sponsored)

Network security has evolved from stateful perimeter firewalls with maybe some IDS/IPS to a complex stack delivered as numerous unique tools, which often don’t talk to one another and may need to be operated by specialists. In this environment it’s hard to unify a security policy, troubleshoot problems, manage and operate tools, and respond effectively... Read more »

TNO024: Networks for AI and AI for Networks — A Dual Perspective with Aviz Networks (Sponsored)

On today’s show, we introduce Aviz Networks with Vishal Shukla, Co-Founder & CEO. Vishal and Aviz are making Networks for AI, and AI for Networks. Vishal explains how Aviz does this by offering AI Networking Unpacked. Designed for open-source and vendor-agnostic networking, AI Networking Unpacked works with existing network infrastructures. It also integrates with existing... Read more »

How we simplified NCMEC reporting with Cloudflare Workflows

Cloudflare plays a significant role in supporting the Internet’s infrastructure. As a reverse proxy by approximately 20% of all websites, we sit directly in the request path between users and the origin, helping to improve performance, security, and reliability at scale. Beyond that, our global network powers services like delivery, Workers, and R2 — making Cloudflare not just a passive intermediary, but an active platform for delivering and hosting content across the Internet.

Since Cloudflare’s launch in 2010, we have collaborated with the National Center for Missing and Exploited Children (NCMEC), a US-based clearinghouse for reporting child sexual abuse material (CSAM), and are committed to doing what we can to support identification and removal of CSAM content.

Members of the public, customers, and trusted organizations can submit reports of abuse observed on Cloudflare’s network. A minority of these reports relate to CSAM, which are triaged with the highest priority by Cloudflare’s Trust & Safety team. We will also forward details of the report, along with relevant files (where applicable) and supplemental information to NCMEC.

The process to generate and submit reports to NCMEC involves multiple steps, dependencies, and error handling, which quickly became complex under Continue reading

1 2 3 3,431