Archive

Category Archives for "Networking"

BGP Updates in 2025

The first part of this annual report on BGP for the year 2025 looked at the size of the routing table and some projections of table growth for both IPv4 and IPv6. However, the scalability of BGP as the Internet’s routing protocol is not just dependent on the number of prefixes carried in the routing table. BGP protocol behaviour in the form of dynamic routing updates is also part of this story. This second part of the report looks at the profile of BGP updates across 2025 to assess whether the stability of the routing system, as measured by the level of BGP update activity, is changing.

I just passed the AWS Advanced Networking Specialty

It’s been almost exactly 11 years since I passed the R&S CCIE lab in Brussels, and now it was time to go with something more cloudy 🙂 I just passed the AWS Advanced Networking Specialty, and unlike the CCIE, I did this one on my first attempt. I have stayed mostly in networking over the last few years, but shifted away from Cisco and much more toward cloud architecture – mostly AWS. Since networking is the heart of every cloud architecture, and after 4 years of hands-on work with complex AWS networking projects, I decided it was time to validate

The post I just passed the AWS Advanced Networking Specialty appeared first on How Does Internet Work.

Human Cognition Can’t Keep Up With Modern Networks. What’s Next?

Sanil Nambiar, client engagement lead for AI for networks at IBM, is focused on assembling the infrastructure organizations will need for AI. “The strategy, obviously, is hybrid cloud, data and AI and automation working together as an architecture,” Nambiar told me in this episode of The New Stack Makers. IBM has invested in what he calls “three foundational platforms” because each offers capabilities essential to AI infrastructure. Red Hat, a hybrid cloud platform, is needed “for that consistent runtime across on-prem and cloud,” he said. HashiCorp offers “life cycle control and policy-driven automation.” And Confluent is for “real-time, contextual, trustworthy data access for AI.” All of these platforms are needed, Nambiar said, because “AI does not sit on top of chaos and magically fix it. You really need environments which are consistent, infrastructure that is programmable, data that moves in real time.”

The Core Challenges of Modern Network Operations

The new complexity AI introduces has added to the challenges networking Continue reading

Sidecarless mTLS in Kubernetes: How Istio Ambient Mesh and ztunnel Enable Zero Trust

Encrypting internal traffic and enforcing mutual TLS (mTLS), a form of TLS in which both the client and server authenticate each other using X.509 certificates, has transitioned from a “nice-to-have” to a hard requirement, especially in Kubernetes environments where everything can talk to everything else by default. Whether the objective is regulatory compliance or simply alignment with the principles of Zero Trust, the goal is the same: to ensure every connection is encrypted, authenticated, and authorized.
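
For reference, outside of any mesh, this is roughly what mutual TLS looks like at the application level in Go: the server presents its own certificate and also requires a client certificate signed by a CA it trusts. This is a minimal illustrative sketch (the file names are placeholders); in practice a mesh such as Istio issues, rotates, and enforces these certificates so application code does not have to.

package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net/http"
	"os"
)

func main() {
	// Trust store for client certificates (placeholder path).
	caPEM, err := os.ReadFile("ca.pem")
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	server := &http.Server{
		Addr: ":8443",
		TLSConfig: &tls.Config{
			ClientCAs:  pool,
			ClientAuth: tls.RequireAndVerifyClientCert, // requiring a client cert is what makes it *mutual* TLS
		},
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.Write([]byte("hello, authenticated client\n"))
		}),
	}
	// Server certificate and key (placeholder paths).
	log.Fatal(server.ListenAndServeTLS("server.pem", "server-key.pem"))
}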

Delivering Cluster-Wide mTLS Without Sidecars

The term ‘service mesh’ is bandied about as the ideal solution for implementing zero-trust security, but it often comes at a price too high for organizations to accept. In addition to a steep learning curve, deploying a service mesh with a sidecar proxy in every pod scales poorly, driving up CPU and memory consumption and creating ongoing maintenance challenges for cluster operators.

Istio Ambient Mode addresses these pain points by decoupling the mesh from the application and splitting the service mesh into two distinct layers: mTLS and L7 traffic management, neither of which needs to run as a sidecar on a pod. By separating these domains, Istio allows platform engineers to implement mTLS cluster-wide without the complexity of Continue reading

UET Congestion Management: CCC Base RTT

Calculating Base RTT

[Edit: January 7 2026, RTT role in CWND adjustment process]

As described in the previous section, the Bandwidth-Delay Product (BDP) is a baseline value used when setting the maximum size (MaxWnd) of the Congestion Window (CWND). The BDP is calculated by multiplying the lowest link speed among the source and destination nodes by the Base Round-Trip Time (Base_RTT).

In addition to its role in BDP calculation, Base_RTT plays a key role in the CWND adjustment process. During operation, the RTT measured for each packet is compared against the Base_RTT. If the measured RTT is significantly higher than the Base_RTT, the CWND is reduced. If the RTT is close to or lower than the Base_RTT, the CWND is allowed to increase.
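
To make this concrete, here is a minimal Go sketch of the rule described above. The 100 Gbps link speed, the 10 µs Base_RTT, the 1.5× "significantly higher" threshold, and the step sizes are illustrative assumptions, not values taken from the UET specification.

package main

import (
	"fmt"
	"time"
)

// maxWnd derives the congestion window ceiling from the BDP:
// the lowest link speed (bits/s) multiplied by Base_RTT, converted to bytes.
func maxWnd(linkSpeedBps float64, baseRTT time.Duration) float64 {
	return linkSpeedBps * baseRTT.Seconds() / 8
}

// adjustCWND applies the qualitative rule from the text: shrink the window
// when the measured RTT is significantly above Base_RTT, otherwise grow it.
// The 1.5x threshold and the step sizes are assumptions for illustration.
func adjustCWND(cwnd float64, measured, base time.Duration, max float64) float64 {
	switch {
	case measured > 3*base/2: // "significantly higher" than Base_RTT
		cwnd *= 0.8
	default: // close to or below Base_RTT
		cwnd += 1500 // grow by roughly one packet
	}
	if cwnd > max {
		cwnd = max
	}
	return cwnd
}

func main() {
	base := 10 * time.Microsecond
	max := maxWnd(100e9, base) // 100 Gbps link, Base_RTT of 10 us -> 125000 bytes
	cwnd := max / 2
	cwnd = adjustCWND(cwnd, 25*time.Microsecond, base, max) // a congested RTT sample
	fmt.Printf("MaxWnd=%.0f bytes, CWND after update=%.0f bytes\n", max, cwnd)
}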

This adjustment process is described in more detail in the upcoming sections.

The config_base_rtt parameter represents the RTT of the longest path between sender and receiver when no other packets are in flight. In other words, it reflects the minimum RTT under uncongested conditions. Figure 6-7 illustrates the individual delay components that together form the RTT.

Serialization Delay: The network shown in Figure 6-7 supports jumbo frames with an MTU of 9216 bytes. Serialization delay is measured Continue reading
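
The excerpt is cut off here, but the serialization-delay arithmetic itself is simple: frame size in bits divided by the link rate. A small Go sketch, assuming a 9216-byte jumbo frame on one of the 100 Gbps links used in the example topology:

package main

import (
	"fmt"
	"time"
)

// serializationDelay returns the time needed to clock a frame of the given
// size onto a link of the given speed: bits on the wire divided by bits/s.
func serializationDelay(frameBytes int, linkSpeedBps float64) time.Duration {
	seconds := float64(frameBytes*8) / linkSpeedBps
	return time.Duration(seconds * float64(time.Second))
}

func main() {
	// 9216-byte jumbo frame on a 100 Gbps link (values assumed for illustration).
	fmt.Println(serializationDelay(9216, 100e9)) // ~737ns
}

At 100 Gbps, a 9216-byte frame therefore occupies the link for roughly 0.74 µs.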

A closer look at a BGP anomaly in Venezuela

As news unfolds surrounding the U.S. capture and arrest of Venezuelan leader Nicolás Maduro, a cybersecurity newsletter examined Cloudflare Radar data and took note of a routing leak in Venezuela on January 2.

We dug into the data. Since the beginning of December there have been eleven route leak events, impacting multiple prefixes, where AS8048 is the leaker. Although it is impossible to determine definitively what happened on the day of the event, this pattern of route leaks suggests that the CANTV (AS8048) network, a popular Internet Service Provider (ISP) in Venezuela, has insufficient routing export and import policies. In other words, the BGP anomalies observed by the researcher could be tied to poor technical practices by the ISP rather than malfeasance.

In this post, we’ll briefly discuss Border Gateway Protocol (BGP) and BGP route leaks, and then dig into the anomaly observed and what may have happened to cause it. 

Background: BGP route leaks

First, let’s revisit what a BGP route leak is. BGP route leaks cause behavior similar to taking the wrong exit off of a highway. While you may still make it to your destination, the path may be slower and come with delays you Continue reading

BGP in 2025

At the start of each year, it’s been my practice to report on the behaviour of the Internet’s inter-domain routing system over the previous 12 months, looking in some detail at some metrics from the routing system that can show the essential shape and behaviour of the underlying interconnection fabric of the Internet.

Using eBPF to load-balance traffic across UDP sockets with Go

Akvorado collects sFlow and IPFIX flows over UDP. Because UDP does not retransmit lost packets, it needs to process them quickly. Akvorado runs several workers listening to the same port. The kernel should load-balance received packets fairly between these workers. However, this does not work as expected. A couple of workers exhibit high packet loss:

$ curl -s 127.0.0.1:8080/api/v0/inlet/metrics \
> | sed -n 's/akvorado_inlet_flow_input_udp_in_dropped//p'
packets_total{listener="0.0.0.0:2055",worker="0"} 0
packets_total{listener="0.0.0.0:2055",worker="1"} 0
packets_total{listener="0.0.0.0:2055",worker="2"} 0
packets_total{listener="0.0.0.0:2055",worker="3"} 1.614933572278264e+15
packets_total{listener="0.0.0.0:2055",worker="4"} 0
packets_total{listener="0.0.0.0:2055",worker="5"} 0
packets_total{listener="0.0.0.0:2055",worker="6"} 9.59964121598348e+14
packets_total{listener="0.0.0.0:2055",worker="7"} 0

eBPF can help by implementing an alternate balancing algorithm. 🐝

Options for load-balancing

There are three methods to load-balance UDP packets across workers:

  1. One worker receives the packets and dispatches them to the other workers.
  2. All workers share the same socket.
  3. Each worker has its own socket, listening to the same port, with the SO_REUSEPORT socket option.

SO_REUSEPORT option

Tom Herbert added the SO_REUSEPORT socket Continue reading
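
The excerpt stops here, but as context for option 3 above, this is roughly how each worker can open its own UDP socket with SO_REUSEPORT in Go. It is a minimal sketch using golang.org/x/sys/unix, not Akvorado's actual code.

package main

import (
	"context"
	"fmt"
	"net"
	"syscall"

	"golang.org/x/sys/unix"
)

// listenReusePort opens a UDP socket with SO_REUSEPORT so that several
// workers can bind the same address and let the kernel spread packets
// between them. Minimal sketch; not Akvorado's actual implementation.
func listenReusePort(addr string) (net.PacketConn, error) {
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var serr error
			err := c.Control(func(fd uintptr) {
				serr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
			})
			if err != nil {
				return err
			}
			return serr
		},
	}
	return lc.ListenPacket(context.Background(), "udp", addr)
}

func main() {
	const workers = 4
	for i := 0; i < workers; i++ {
		conn, err := listenReusePort("0.0.0.0:2055")
		if err != nil {
			panic(err)
		}
		go func(id int, c net.PacketConn) {
			buf := make([]byte, 9000)
			for {
				n, src, err := c.ReadFrom(buf)
				if err != nil {
					return
				}
				fmt.Printf("worker %d: %d bytes from %s\n", id, n, src)
			}
		}(i, conn)
	}
	select {} // block forever while the workers receive packets
}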

UET Congestion Management: Congestion Control Context

Congestion Control Context

Updated 5.1.2026: Added CWND computation example into figure. Added CWND computation into text.
Updated 13.1.2026: Deprecated by: Ultra Ethernet: Congestion Control Context

Ultra Ethernet Transport (UET) uses a vendor-neutral, sender-specific congestion window–based congestion control mechanism together with flow-based, adjustable entropy-value (EV) load balancing to manage incast, outcast, local, link, and network congestion events. Congestion control in UET is implemented through coordinated sender-side and receiver-side functions to enforce end-to-end congestion control behavior.

On the sender side, UET relies on the Network-Signaled Congestion Control (NSCC) algorithm. Its main purpose is to regulate how quickly packets are transmitted by a Packet Delivery Context (PDC). The sender adapts its transmission window based on round-trip time (RTT) measurements and Explicit Congestion Notification (ECN) Congestion Experienced (CE) feedback conveyed through acknowledgments from the receiver.

On the receiver side, Receiver Credit-based Congestion Control (RCCC) limits incast pressure by issuing credits to senders. These credits define how much data a sender is permitted to transmit toward the receiver. The receiver also observes ECN-CE markings in incoming packets to detect path congestion. When congestion is detected, the receiver can instruct the sender to change the entropy value, allowing traffic to be Continue reading
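
The excerpt ends here. To make the receiver-credit idea more concrete, the following conceptual Go sketch shows only the general pattern: the receiver grants a byte budget, and the sender may not exceed its outstanding grant. The actual RCCC message formats and rules are defined by the UET specification and are not reproduced in this excerpt.

package main

import "fmt"

// creditSender tracks how many bytes the receiver has authorized but the
// sender has not yet used. Conceptual sketch of credit-based control only;
// not the UET RCCC wire protocol.
type creditSender struct {
	credits int // bytes the receiver has granted and we have not yet consumed
}

// grant is called when a credit update arrives from the receiver.
func (s *creditSender) grant(bytes int) { s.credits += bytes }

// trySend consumes credit if enough is available; otherwise the packet waits.
func (s *creditSender) trySend(size int) bool {
	if size > s.credits {
		return false // not enough credit: hold the packet to limit incast pressure
	}
	s.credits -= size
	return true
}

func main() {
	s := &creditSender{}
	s.grant(16 * 1024)                      // receiver authorizes 16 KiB
	fmt.Println(s.trySend(9000), s.credits) // true, 7384 bytes of credit left
	fmt.Println(s.trySend(9000), s.credits) // false, still 7384: wait for more credit
}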

Focus is In for 2026

Hey everyone. It’s January 1 again, which means it’s time for me to own up to the fact that I wrote five posts in 2025. Two of those were about AI. Not surprising given that everyone was talking about it. But that seemed to be all I was talking about. What else was I doing instead?

  • I upped my running amount drastically. I covered over 1,600 miles this year. I ran another half marathon distance for the first time in four years. I feel a lot better about my health and my consistency because now running is something I prioritize. I don’t think I’m going to run quite so much in 2026 but you never know.
  • I revitalized a podcast. We relaunched Security Boulevard with big help from my coworker Corey Dirrig. We’ve got a great group of hosts that discuss weekly security topics. You should totally check it out.
  • I’m also doing more with things like Techstrong Gang and other Futurum Group media. That’s in addition to the weekly Tech Field Day Rundown I host with Alastair Cooke. Lots of video!
  • For those that follow my Scouting journey, I was asked to be an Assistant District Commissioner with the Continue reading

Getting DNS Right: Principles for Effective Monitoring

This is the second of two parts. Read Part 1: How to Get DNS Right: A Guide to Common Failure Modes.

Monitoring DNS is not simply a matter of checking whether a record resolves. A comprehensive approach follows four key principles:

  • Test from multiple networks and regions to avoid blind spots.
  • Validate both correctness and speed, since slow answers can harm user flows even when technically valid.
  • Measure continuously, not periodically, because many issues manifest as short-lived or regionalized incidents.
  • Compare control plane changes to real-world propagation patterns to ensure updates are applied as intended.

DNS monitoring is most effective when it targets specific signals that reveal problems with record integrity, server behavior and real-world performance. The key groups of tests are DNS mapping, DNS record validation, and DNS performance measurements.

DNS Mapping

Mapping tests verify that users are directed to an appropriate DNS server based on location. This matters because the closest healthy server usually provides the fastest response. If a user’s request is sent across a country or to another continent, latency increases and resilience decreases. Different managed DNS providers use different methods to determine which server responds to a query. Many compare the geographic location of the querying IP Continue reading
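
The excerpt is cut off above, but as a rough illustration of the "validate both correctness and speed" principle, here is a minimal Go sketch that resolves a hostname against two specific resolvers and times each answer. The resolver addresses and the hostname are placeholders of my choosing, not recommendations from the article.

package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// resolverFor returns a net.Resolver that sends every query to one
// specific DNS server instead of the system default.
func resolverFor(server string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, network, server)
		},
	}
}

func main() {
	// Example resolvers and hostname; replace with the vantage points you care about.
	servers := []string{"8.8.8.8:53", "1.1.1.1:53"}
	host := "example.com"

	for _, s := range servers {
		r := resolverFor(s)
		start := time.Now()
		addrs, err := r.LookupHost(context.Background(), host)
		elapsed := time.Since(start)
		if err != nil {
			fmt.Printf("%s: error after %v: %v\n", s, elapsed, err)
			continue
		}
		// Check both correctness (the answer) and speed (the latency).
		fmt.Printf("%s: %v in %v\n", s, addrs, elapsed)
	}
}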

The Rise of AI Agents and the Reinvention of Kubernetes: Ratan Tipirneni’s 2026 Outlook

Prediction: The next evolution of Kubernetes is not about scale alone, but about intelligence, autonomy, and governance.

As part of the article ‘AI and Enterprise Technology Predictions from Industry Experts for 2026’, published by Solutions Review, Ratan Tipirneni, CEO of Tigera, shares his perspective on how AI and cloud-native technologies are shaping the future of Kubernetes.

His predictions describe how production clusters are evolving as AI becomes a core part of enterprise platforms, introducing new requirements for security, networking, and operational control.

Looking toward 2026, Tipirneni expects Kubernetes to move beyond its traditional role of running microservices and stateless applications. Clusters will increasingly support AI-driven components that operate with greater autonomy and interact directly with other services and systems. This shift places new demands on platform teams around workload identity, access control, traffic management, and policy enforcement. It also drives changes in how APIs are governed and how network infrastructure is designed inside the cluster.

Read on to explore Tipirneni’s predictions and what they mean for teams preparing Kubernetes platforms for an AI-driven future.

AI Agents Become First-Class Workloads

By 2026, Tipirneni predicts that Kubernetes environments will increasingly host agent-based workloads rather than only traditional cloud native applications. Continue reading

UET Congestion Management: Introduction

 Introduction


Figure 6-1 depicts a simple scale-out backend network for an AI data center. The topology follows a modular design, allowing the network to scale out or scale in as needed. The smallest building block in this example is a segment, which consists of two nodes, two rail switches, and one spine switch. Each node in the segment is equipped with a dual-port UET NIC and two GPUs.

Within a segment, GPUs are connected to the leaf switches using a rail-based topology. For example, in Segment 1A, the communication path between GPU 0 on Node A1 and GPU 0 on Node A2 uses Rail A0 (Leaf 1A-1). Similarly, GPU 1 on both nodes is connected to Rail A1 (Leaf 1A-2). In this example, we assume that intra-node GPU collective communication takes place over an internal, high-bandwidth scale-up network (such as NVLink). As a result, intra-segment GPU traffic never reaches the spine layer. Communication between segments is carried over the spine layer.

The example network is a best-effort (that is, PFC is not enabled) two-tier, three-stage non-blocking fat-tree topology, where each leaf and spine switch has four 100-Gbps links. Leaf switches have two host-facing links and two inter-switch links, while spine Continue reading
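
The excerpt stops mid-sentence, but the non-blocking claim can be sanity-checked from the per-leaf numbers already given: two 100 Gbps host-facing links against two 100 Gbps uplinks. A tiny Go sketch of that arithmetic:

package main

import "fmt"

func main() {
	// Per-leaf figures from the example topology: four 100 Gbps links,
	// two facing hosts and two facing the spine layer.
	const linkGbps = 100.0
	downstream := 2 * linkGbps // host-facing capacity
	upstream := 2 * linkGbps   // uplink capacity toward the spines

	// An oversubscription ratio of 1.0 means the fabric is non-blocking:
	// the leaf can forward everything its hosts can send toward the spines.
	fmt.Printf("oversubscription ratio: %.1f:1\n", downstream/upstream) // 1.0:1
}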

Great Wall Motor (GWM): A Chinese Automotive Giant Going Global

China's automotive industry continues to grow rapidly. One of its main players is Great Wall Motor, or GWM. The company is known as a reliable carmaker, with a focus on SUVs and pickups. Now GWM is taking the next step, becoming a global force that cannot be taken lightly. Let's dig deeper into their journey.

GWM's Journey to Market Leadership

GWM was founded in 1984. At first, the company produced only pickup trucks, and its products were very popular in the domestic market. It then saw a big opportunity in the SUV market and launched the Haval brand. The strategy proved very successful: Haval quickly became the best-selling SUV brand in China, and GWM established itself as the leader in the local market. The company built a very strong foundation before stepping onto the international stage.

A Smart Multi-Brand Strategy

To reach a wider market, GWM adopted a multi-brand strategy. Each brand targets a different market segment, which allows the company to compete across several segments and gives GWM a very complete portfolio. Here is a breakdown of its main brands.

Brand – Main Focus
Haval – Mainstream SUVs
WEY – Premium SUVs
Ora – Compact Electric Cars
Tank – Continue reading

What Is BGP Confederation?

By design, iBGP requires a full mesh of peerings between all routers so every router can learn routes from all other routers without loops. Prefixes learned from an iBGP peer are not advertised to another iBGP peer. This rule exists to prevent routing loops inside the autonomous system, and it is also the main reason why a full mesh is required. As the number of routers grows, maintaining this full mesh becomes complex and resource-heavy.

BGP confederations are one way to solve the scaling problems created by the BGP full mesh requirement. Another common approach is using Route Reflectors. BGP confederations break up a large autonomous system into smaller subautonomous systems (sub-ASs), reducing the number of iBGP peerings required.

Routers within the same sub-AS still need a full iBGP mesh, but the number of peerings is much smaller now. Connections to other confederations are made with standard eBGP, and peers outside the sub-AS are treated as external.
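
As a back-of-the-envelope illustration of why this helps (the router counts below are mine, not from the article): a full iBGP mesh needs n(n-1)/2 sessions, so splitting the AS into sub-ASes shrinks each mesh dramatically. A short Go sketch:

package main

import "fmt"

// fullMesh returns the number of iBGP sessions needed for n routers
// peering with every other router: n*(n-1)/2.
func fullMesh(n int) int {
	return n * (n - 1) / 2
}

func main() {
	const routers = 40
	const subASes = 4 // 10 routers per sub-AS (illustrative numbers)

	flat := fullMesh(routers)
	confederated := subASes * fullMesh(routers/subASes)

	fmt.Printf("single AS, full mesh: %d iBGP sessions\n", flat)          // 780
	fmt.Printf("confederation: %d intra-sub-AS sessions\n", confederated) // 4 * 45 = 180
	// Plus a handful of confederation-eBGP sessions between the sub-ASes.
}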

The confederation AS appears whole to other Continue reading