How Moving Away from Ansible Made netlab Faster

TL&DR: Of course, the title is clickbait. While the differences are amazing, you won’t notice them in small topologies or when using bloatware that takes minutes to boot.

Let’s start with the background story: due to the (now fixed) suboptimal behavior of bleeding-edge Ansible releases, I decided to generate the device configuration files within netlab (previously, netlab prepared the device data, and the configuration files were rendered in an Ansible playbook).

As we use bash scripts to configure Linux containers, it makes little sense (once the bash scripts are created) to use an Ansible playbook to execute docker exec script or ip netns container exec script. netlab release 26.01 runs the bash scripts to configure Linux, Bird, and dnsmasq containers directly within the netlab initial process.

Now for the juicy part.

RAID 5 with mixed-capacity disks on Linux

Standard RAID solutions waste space when disks have different sizes. Linux software RAID with LVM uses the full capacity of each disk and lets you grow storage by replacing one or two disks at a time.

We start with four disks of equal size:

$ lsblk -Mo NAME,TYPE,SIZE
NAME TYPE  SIZE
vda  disk  101M
vdb  disk  101M
vdc  disk  101M
vdd  disk  101M

We create one partition on each of them:

$ sgdisk --zap-all --new=0:0:0 -t 0:fd00 /dev/vda
$ sgdisk --zap-all --new=0:0:0 -t 0:fd00 /dev/vdb
$ sgdisk --zap-all --new=0:0:0 -t 0:fd00 /dev/vdc
$ sgdisk --zap-all --new=0:0:0 -t 0:fd00 /dev/vdd
$ lsblk -Mo NAME,TYPE,SIZE
NAME   TYPE  SIZE
vda    disk  101M
└─vda1 part  100M
vdb    disk  101M
└─vdb1 part  100M
vdc    disk  101M
└─vdc1 part  100M
vdd    disk  101M
└─vdd1 part  100M

We set up a RAID 5 device by assembling the four partitions:1

$ mdadm --create /dev/md0 --level=raid5 --bitmap=internal --raid-devices=4 \
>   /dev/vda1 /dev/vdb1 /dev/vdc1 /dev/vdd1
$ lsblk -Mo NAME,TYPE,SIZE
    NAME          TYPE    SIZE
    vda           disk    101M
┌┈▶ └─vda1        part    100M
┆   vdb           disk    101M
├┈▶ └─vdb1        part    100M
Continue reading

TNO053: Ethernet Is Everywhere

Ethernet is everywhere. Today we talk with one of the people responsible for this protocol’s ubiquity. Doug Boom is a veteran of the Ethernet development world. His code has helped landers reach Mars, submarines sail the deep seas, airplanes get to their gates, cars drive around town, and more. Doug walks us through the origins... Read more »

Astro is joining Cloudflare

The Astro Technology Company, creators of the Astro web framework, is joining Cloudflare.

Astro is the web framework for building fast, content-driven websites. Over the past few years, we’ve seen an incredibly diverse range of developers and companies use Astro to build for the web. This ranges from established brands like Porsche and IKEA, to fast-growing AI companies like Opencode and OpenAI. Platforms that are built on Cloudflare, like Webflow Cloud and Wix Vibe, have chosen Astro to power the websites their customers build and deploy to their own platforms. At Cloudflare, we use Astro, too — for our developer docs, website, landing pages, and more. Astro is used almost everywhere there is content on the Internet.

By joining forces with the Astro team, we are doubling down on making Astro the best framework for content-driven websites for many years to come. The best version of Astro — Astro 6 —  is just around the corner, bringing a redesigned development server powered by Vite. The first public beta release of Astro 6 is now available, with GA coming in the weeks ahead.

We are excited to share this news and even more thrilled for what Continue reading

Kubernetes Networking at Scale: From Tool Sprawl to a Unified Solution

As Kubernetes platforms scale, one part of the system consistently resists standardization and predictability: networking. While compute and storage have largely matured into predictable, operationally stable subsystems, networking remains a primary source of complexity and operational risk

This complexity is not the result of missing features or immature technology. Instead, it stems from how Kubernetes networking capabilities have evolved as a collection of independently delivered components rather than as a cohesive system. As organizations continue to scale Kubernetes across hybrid and multi-environment deployments, this fragmentation increasingly limits agility, reliability, and security.

This post explores how Kubernetes networking arrived at this point, why hybrid environments amplify its operational challenges, and why the industry is moving toward more integrated solutions that bring connectivity, security, and observability into a single operational experience.

Ready to Scale Kubernetes Without the Stress?
Scaling Kubernetes Networking Without the Complexity

The Components of Kubernetes Networking

Kubernetes networking was designed to be flexible and extensible. Rather than prescribing a single implementation, Kubernetes defined a set of primitives and left key responsibilities such as pod connectivity, IP allocation, and policy enforcement to the ecosystem. Over time, these responsibilities were addressed by a growing set of specialized components, each focused on a narrow slice of Continue reading

Human Native is joining Cloudflare

Today, we’re excited to share that Cloudflare has acquired Human Native, a UK-based AI data marketplace specializing in transforming multimedia content into searchable and useful data.

Human Native x Cloudflare

The Human Native team has spent the past few years focused on helping AI developers create better AI through licensed data. Their technology helps publishers and developers turn messy, unstructured content into something that can be understood, licensed and ultimately valued. They have approached data not as something to be scraped, but as an asset class that deserves structure, transparency and respect.

Access to high-quality data can lead to better technical performance. One of Human Native’s customers, a prominent UK video AI company, threw away their existing training data after achieving superior results with data sourced through Human Native. Going forward they are only training on fully licensed, reputably sourced, high-quality content.

This gives a preview of what the economic model of the Internet can be in the age of generative AI: better AI built on better data, with fair control, compensation and credit for creators.

The Internet needs new economic models

For the last 30 years, the open Internet has been based on a fundamental value exchange: creators Continue reading

Using netlab to Set Up Demos

David Gee was time-pressed to set up a demo network to showcase his network automation solution and found that a Ubuntu VM running netlab to orchestrate Arista cEOS containers on his Apple Silicon laptop was exactly what he needed.

I fixed a few blog posts based on his feedback (I can’t tell you how much I appreciate receiving a detailed “you should fix this stuff” message, and how rare it is, so thanks a million!), and David was kind enough to add a delightful cherry on top of that cake with this wonderful blurb:

Netlab has been a lifesaver. Ivan’s entire approach, from the software to collecting instructions and providing a meaningful information trail, enabled me to go from zero to having a functional lab in minutes. It has been an absolute lifesaver.

I can be lazy with the infrastructure side, because he’s done all of the hard work. Now I get to concentrate on the value-added functionality of my own systems and test with the full power of an automated and modern network lab. Game-changing.

NAN110: Bridging Industry, Academia, and the Future of Network Automation with Dr. Levi Perigo

Eric Chou is joined by Dr. Levi Perigo, Scholar in Residence and Professor of Network Engineering at the University of Colorado, Boulder. They discuss Levi’s non-traditional career path from being in the network automation industry for 20 years before shifting to academia and co-founding QuivAR. Levi also dives into the success of the CU Boulder... Read more »

TCG066: How Infrastructure Teams Can Scale Reasoning Without Losing Control with Chris Wade

The industry has pivoted from scripting to automation to orchestration – and now to systems that can reason. Today we explore what AI agents mean for infrastructure with Chris Wade, Co-Founder and CTO of Itential. We also dive into the brownfield reality, the potential for vendor-specific LLMs trained on proprietary knowledge, and advice for the... Read more »

Do You Need IS-IS Areas?

TL&DR: Most probably not, but if you do, you’d better not rely on random blogs for professional advice #justSaying 😜

Here’s an interesting question I got from a reader in the midst of an OSPF-to-IS-IS migration:

Why should one bother with different [IS-IS] areas when the routing hierarchy is induced by the two levels and the appropriate IS-IS circuit types on the links between the routers?

Well, if you think you need a routing hierarchy, you’re bound to use IS-IS areas because that’s how the routing hierarchy is implemented in IS-IS. However…

What came first: the CNAME or the A record?

On January 8, 2026, a routine update to 1.1.1.1 aimed at reducing memory usage accidentally triggered a wave of DNS resolution failures for users across the Internet. The root cause wasn't an attack or an outage, but a subtle shift in the order of records within our DNS responses.

While most modern software treats the order of records in DNS responses as irrelevant, we discovered that some implementations expect CNAME records to appear before everything else. When that order changed, resolution started failing. This post explores the code change that caused the shift, why it broke specific DNS clients, and the 40-year-old protocol ambiguity that makes the "correct" order of a DNS response difficult to define.

Timeline

All timestamps referenced are in Coordinated Universal Time (UTC).

Time

Description

2025-12-02

The record reordering is introduced to the 1.1.1.1 codebase

2025-12-10

The change is released to our testing environment

2026-01-07 23:48

A global release containing the change starts

2026-01-08 17:40

The release reaches 90% of servers

2026-01-08 18:19

Incident is declared

2026-01-08 18:27

The release is reverted

2026-01-08 19:55

Revert is completed. Impact ends

What happened?

While making some improvements to lower the memory usage of Continue reading

1 3 4 5 6 7 3,841