Archive

Category Archives for "Networking"

HN777: Overlay All the Things?

Today’s Heavy Networking is all about overlay technologies, their history, development, and current state, both from engineer and vendor perspectives. We discuss why the industry turns to overlays to solve problems, and look at overlay and segmentation approaches including VXLAN, SRv6, and EVPN. We also drill into the idea that EVPN could become the standard... Read more »

NVIDIA GTC 2025 Wrap-Up: 18 New Products to Watch

If you follow the tech news, you have read a lot about NVIDIA and its graphics processing units (GPUs). However, it would be incorrect to conclude that NVIDIA is solely focused on GPUs. My biggest revelation from NVIDIA’s GTC 2025 conference last month was that NVIDIA innovates across compute, networking and storage. Most of these innovations are all about AI, but gamers should not be concerned; there is a new RTX chip for you. The new announcements and key technologies that were the spotlight of CEO GeForce RTX 5090 will be the new high-end desktop GPU for gamers and creative professionals. (Did you know that RTX stands for Ray Tracing Texel Extreme? Continue reading

From Python to Go 019. Interaction With Applications Via REST API.

Hello my friend,

So far we’ve covered all the means of interacting with network devices that are meaningful in our opinion: SSH, NETCONF/YANG, and gNMI/YANG. There is one more protocol for managing network devices, RESTCONF, which is the application of REST API to network devices. From our experience, its support across network vendors is very limited; therefore, we don’t cover it. However, REST API itself is immensely important, as it is still the most widely used way for applications to talk to each other. And this is the focus of today’s blog.
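To make the idea of applications talking over a REST API a bit more concrete, here is a minimal Python sketch. It is not taken from the original post: the `/devices` endpoint, the `DEVICES` inventory, and the helper names are all hypothetical, and the tiny server exists only so the client calls have something local to talk to. It uses nothing but the standard library.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory inventory exposed by the demo application.
DEVICES = {"leaf-01": {"role": "leaf"}}


class DemoHandler(BaseHTTPRequestHandler):
    """Toy REST endpoint: GET /devices lists devices, POST /devices adds one."""

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/devices":
            self._send_json(200, DEVICES)
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/devices":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        device = json.loads(self.rfile.read(length))
        DEVICES[device["name"]] = {"role": device["role"]}
        self._send_json(201, device)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Ignore any proxy settings from the environment; the demo only talks to localhost.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def call_api(method, url, payload=None):
    """Send a JSON request and return (HTTP status, decoded JSON body)."""
    data = json.dumps(payload).encode() if payload is not None else None
    request = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with _OPENER.open(request, timeout=5) as response:
        return response.status, json.loads(response.read())


def demo():
    """Start the toy server on a free local port, create a device, list all."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        created = call_api("POST", f"{base}/devices",
                           {"name": "spine-01", "role": "spine"})
        listed = call_api("GET", f"{base}/devices")
    finally:
        server.shutdown()
        server.server_close()
    return created, listed
```

Calling `demo()` returns the status and body of the POST and of the following GET, so you can see the usual REST pattern: the HTTP method expresses the action, the URL names the resource, and JSON carries the data.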

I See Everywhere Stop Learning Code, Why Do You Teach It?

Generative AI, agentic AI, and all other kinds of AI are absolutely useful things. The advancements there are very quick, and we ourselves are using them in our projects. At the same time, if you don’t know how to code or how to solve algorithmic tasks, how can you reason whether the solution provided by AI is correct? Whether it is optimal? And moreover, when it breaks, because every software breaks sooner or later, how can you fix it? That’s why we believe it is absolutely important to learn software development, tools and algorithms. Perhaps, more Continue reading

How To Read a Traceroute for Network Troubleshooting

The traceroute tool is one of the most valuable yet straightforward diagnostic utilities available for network troubleshooting. Built into virtually every operating system, traceroute runs a connection test from one computer to another device, showing each “hop” the data takes between network devices. This comprehensive guide will help you understand how traceroute works, interpret its results and recognize common network problems it can reveal.

Traceroute: Understanding What It Does

To see traceroute in action, we can begin with a simple example of running a traceroute from your computer to Catchpoint’s servers. The specific results will be different for each person. However, in most cases, the results will show you around four to 20 “hops” that packets take to get from your computer to Catchpoint’s servers and back. The first one would likely be your local router, and from there, the data will take multiple “hops” through your internal network and out through your internet service provider (ISP) and over the internet, before finally reaching Catchpoint’s servers. Figure 1 shows an example of what you might see on the command prompt of a Windows computer.

Figure 1: Image of a traceroute command and the results generated.

Understanding how to run this Continue reading

N4N022: SNMP Fundamentals

Following last week’s introduction to network monitoring, we discuss the Simple Network Management Protocol (SNMP), one of the most implemented types of network monitoring. We discuss how it is organized, operations that SNMP can perform, and versions of SNMP. This week’s bonus conversation is a discussion on the future for SNMP. Episode Links: MIB tree... Read more »

Rant: You Should Have Written a Book

I apologize for the rant; I have to vent my frustration with people whose quantity of opinions seems to be exceeding their experience (or maybe they’re coming from an alternate universe with different laws of physics, which would be way cool but also unlikely). You’ve been warned; please feel free to move on or skip the rant part of the blog post.

Rant mode: ON

This is the (unedited) gem I received after making some of my EVPN videos public:

Why I joined Cloudflare: to build world-class partnerships in EMEA

Cloudflare is not just another technology company. It’s a mission-driven force, committed to helping build a better Internet; one that is faster, safer, and more resilient. That mission is more critical than ever as organizations worldwide navigate an increasingly complex digital landscape, rife with cyber threats, regulatory challenges, and the need for scalable, cost-effective solutions.

In EMEA, that mission has special significance. The region is a patchwork of diverse markets, industries, and regulatory environments. It demands a partner-centric approach, one that empowers businesses of all sizes to harness Cloudflare’s comprehensive connectivity cloud platform to protect, connect, and accelerate their operations. That’s why I joined Cloudflare as VP of EMEA Partnerships.

A moment of inflection

Every great company has an inflection point, a moment when the market, the strategy, and the execution align to create unstoppable momentum. Cloudflare is at that moment now.

With record revenue growth, increasing traction among large customers, and an expanding suite of Zero Trust, AI, and network security solutions, Cloudflare is emerging as the partner of choice for enterprises and service providers across EMEA.

But what excites me most is the people, the opportunity to build a team in EMEA that is world-class in its expertise, Continue reading

Internet Governance – The End of Multi-Stakeholderism?

The recent erratic moves by the US President to initiate a trade war on a global scale will have far-reaching implications beyond stock markets and will inevitably include the digital world and what we refer to as Internet Governance. The US moves on the unilateral imposition of tariffs can be interpreted as a vote of no confidence in global trade and open markets by the US, and a resurgence of a theme of strategic national self-reliance in all areas of economic activity, including the digital realm. The tenets of Multi-Stakeholderism, the foundation of Internet Governance, are crumbling.

How to get started with Calico Observability features

Kubernetes, by default, adopts a permissive networking model where all pods can freely communicate unless explicitly restricted using network policies. While this simplifies application deployment, it introduces significant security risks. Unrestricted network traffic allows workloads to interact with unauthorized destinations, increasing the potential for cyberattacks such as Remote Code Execution (RCE), DNS spoofing, and privilege escalation.

To better understand these problems, let’s examine a sample Kubernetes application: ANP Demo App.

This application comprises a deployment that spawns pods and a service that exposes them to external users, much like any real-world workload you will encounter in your environment.

If you open the application service before implementing any policies, the application reports the following messages:

  1. Container can reach the Internet – Without network policies, an attacker who exploits a vulnerability in our container can use it as an entry point. This could allow them to exfiltrate data or establish remote control over the workload by leveraging its Internet access.
  2. Container can reach CoreDNS Pods – Kubernetes relies heavily on DNS, with records served using CoreDNS Pods. While communication between your Pods and CoreDNS is essential and not inherently a vulnerability, pairing it with unrestricted access to Continue reading

Congestion Avoidance in AI Fabric – Part III: Data Center Quantized Congestion Notification (DCQCN)

Data Center Quantized Congestion Notification (DCQCN) is a hybrid congestion control method. DCQCN brings together both Priority Flow Control (PFC) and Explicit Congestion Notification (ECN) so that we can get high throughput, low latency, and lossless delivery across our AI fabric. In this approach, each mechanism plays a specific role in addressing different aspects of congestion, and together they create a robust flow-control system for RDMA traffic.


DCQCN tackles two main issues in large-scale RDMA networks:

1. Head-of-Line Blocking and Congestion Spreading: This is caused by PFC’s pause frames, which stop traffic across switches.

2. Throughput Reduction with ECN Alone: When the ECN feedback is too slow, packet loss may occur despite the rate adjustments.

DCQCN uses a two-tiered approach. It applies ECN early on to gently reduce the sending rate at the GPU NICs, and it uses PFC as a backup to quickly stop traffic on upstream switches (hop-by-hop) when congestion becomes severe.
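The two-tiered behavior can be illustrated with a small Python sketch. This is only a simplified model under stated assumptions, not a vendor implementation: the threshold values, the linear marking ramp between a minimum and a maximum marking threshold, and the names `QueueThresholds`, `ecn_mark_probability`, and `congestion_action` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueThresholds:
    # Hypothetical queue depths in kilobytes; real values are platform-specific.
    wred_min_kb: int = 150    # ECN marking starts above this depth
    wred_max_kb: int = 1500   # every packet is marked at or above this depth
    pfc_xoff_kb: int = 3000   # PFC pause is sent at or above this depth


def ecn_mark_probability(depth_kb: float, t: QueueThresholds) -> float:
    """Assumed linear ramp: 0 at or below the minimum, 1 at or above the maximum."""
    if depth_kb <= t.wred_min_kb:
        return 0.0
    if depth_kb >= t.wred_max_kb:
        return 1.0
    return (depth_kb - t.wred_min_kb) / (t.wred_max_kb - t.wred_min_kb)


def congestion_action(depth_kb: float, t: QueueThresholds) -> str:
    """ECN acts first; PFC is the backup when the queue keeps growing."""
    if depth_kb >= t.pfc_xoff_kb:
        return "pfc-pause"   # hop-by-hop pause toward the upstream switch
    if depth_kb > t.wred_min_kb:
        return "ecn-mark"    # mark packets so the sender reduces its rate
    return "forward"         # no congestion signal needed
```

With these assumed thresholds, a queue at 825 KB sits halfway along the ramp, so `ecn_mark_probability(825, QueueThresholds())` is 0.5, while a queue at 3200 KB makes `congestion_action` return `"pfc-pause"`. The key design point is the ordering: the PFC threshold sits well above the ECN range, so pausing only happens when marking alone has not slowed the senders in time.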


How DCQCN Combines ECN and PFC

DCQCN carefully combines Explicit Congestion Notification (ECN) and Priority Flow Control (PFC) in the right sequence:


Early Action with ECN: When congestion begins to build up, the switch uses WRED thresholds (minimum and maximum) to mark packets. This signals the Continue reading

Congestion Avoidance in AI Fabric – Part II: Priority Flow Control (PFC)

Priority Flow Control (PFC) is a mechanism designed to prevent packet loss during network congestion by pausing traffic selectively based on priority levels. While the original IEEE 802.1Qbb standard operates at Layer 2, using the Priority Code Point (PCP) field in Ethernet headers, AI Fabrics rely on Layer 3 forwarding, where traditional Layer 2-based PFC is no longer applicable. To extend lossless behavior across routed (Layer 3) networks, DSCP-based PFC is used.

In DSCP-based PFC, the Differentiated Services Code Point (DSCP) field in the IP header identifies the traffic class or priority. Switches map specific DSCP values to internal traffic classes and queues. If congestion occurs on an ingress interface and a particular priority queue fills beyond a threshold, the switch can send a PFC pause frame back to the sender switch, instructing it to temporarily stop sending traffic of that class—just as in Layer 2 PFC, but now triggered based on Layer 3 classifications.
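As a rough illustration of this classification step, the Python sketch below extracts the DSCP value from the IP ToS/Traffic Class byte (DSCP occupies its upper six bits), maps it to an internal priority, and decides which priorities should be paused. The DSCP-to-priority map, the pause thresholds, and the function names are hypothetical examples, not values from any particular switch.

```python
from typing import Dict, List

# Hypothetical mapping of DSCP values to internal priorities (traffic classes).
DSCP_TO_PRIORITY: Dict[int, int] = {24: 3, 48: 6}
DEFAULT_PRIORITY = 0

# Hypothetical pause thresholds in bytes; only priority 3 is treated as lossless.
XOFF_THRESHOLD_BYTES: Dict[int, int] = {3: 200_000}


def dscp_from_tos(tos: int) -> int:
    """DSCP is the upper six bits of the ToS byte; the lower two bits are ECN."""
    return (tos & 0xFF) >> 2


def classify(tos: int) -> int:
    """Map a packet's ToS byte to an internal priority."""
    return DSCP_TO_PRIORITY.get(dscp_from_tos(tos), DEFAULT_PRIORITY)


def priorities_to_pause(queue_depth_bytes: Dict[int, int]) -> List[int]:
    """Return the lossless priorities whose ingress queue reached its threshold."""
    return sorted(
        priority
        for priority, depth in queue_depth_bytes.items()
        if priority in XOFF_THRESHOLD_BYTES
        and depth >= XOFF_THRESHOLD_BYTES[priority]
    )
```

For example, a ToS byte of `0x62` carries DSCP 24, so `classify(0x62)` returns priority 3 under the assumed map, and `priorities_to_pause({3: 250_000, 0: 900_000})` returns `[3]`: only the lossless class is paused, while the deep best-effort queue is left to drop or mark as usual.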

This behavior differs from Explicit Congestion Notification (ECN), which operates at Layer 3 as well but signals congestion by marking packets instead of stopping traffic. ECN acts on the egress port, informing the receiver to notify the sender to reduce the transmission rate over Continue reading

Why I joined Cloudflare as Chief People Officer — Kelly Russell

After years navigating the exhilarating world of high-growth tech, from Amazon to Twilio’s scaling journey and most recently Wiz’s rapid ascent, I’ve learned to recognize a truly special opportunity when I see one. That’s exactly what I have found at Cloudflare and why I’m thrilled to join. 

What drew me to Cloudflare was the unique combination of a powerful mission, a transparent and results-oriented culture, and the sheer scale of impact. Cloudflare isn’t just a technology company —  it’s a force for good, building a better Internet for everyone. This really resonates with my own values.  

Success starts with people

My career has been defined by building, scaling, and developing teams in dynamic environments.  I’ve witnessed the transformative power of a strong culture in driving hypergrowth. I’ve experienced the intensity and agility required to disrupt a market. These experiences have reinforced my belief that people are the heart of any successful company, and that a people-first strategy is critical for long-term impact. During my interview process at Cloudflare, this belief was clearly evident in every conversation. Cloudflare is a place where people can do their best work and be proud of the impact they are making. Powered Continue reading

Tech Bytes: Get Data Center Automation as-a-Service with Nokia Event-Driven Automation (Sponsored)

Nokia Event-Driven Automation (EDA) is a modern infrastructure automation platform that combines speed with reliability and simplicity. It makes data center network automation more trustable and easier to use, from small edge clouds to the largest AI fabrics. Today on Tech Bytes, we talk with Sam Arora from Nokia for more details about some of... Read more »