Ivan Pepelnjak

Author Archives: Ivan Pepelnjak

MLAG Deep Dive: Dealing with LAG Member Failures

Craig Weinhold pointed me to a complex topic I managed to ignore in my MLAG Deep Dive series: how does an MLAG cluster reroute around a failure of a LAG member link?

In this blog post, we’ll focus on traditional MLAG cluster implementations using a peer link; another blog post will explore the implications of using VXLAN and EVPN to implement MLAG clusters.

We’ll also ignore the interesting question of “how is the LAG member link failure detected?” and focus on “what happens next?” using the sample MLAG topology:

Worth Exploring: LibreQoS

Erik Auerswald pointed me to an interesting open-source project. LibreQoS implements decent QoS using software switching on many-core x86 platforms. It’s implemented as a bump-in-the-wire software solution, so you should be able to plug it into your network just before a major congestion point and let it handle the packet dropping and prioritization.

Obviously, the concept is nothing new. I wrote about a similar problem in xDSL networks in 2009.

Repost: State of LISP Implementations (2024)

You might remember Béla Várkonyi’s use of LISP to build resilient ground-to-airplane networks from last week’s repost. It seems he’s not exactly happy with the current level of LISP support, at least based on what he wrote as a response to Jeff McLaughlin’s claim that “I can tell you that our support for EVPN does not, in any way, indicate the retirement of LISP for SD-Access.”:


Nice to hear that Cisco intends to support LISP. However, it has already been removed from IOS XR. So it is not that clear…

If Cisco stops supporting LISP, we will be forced to create our own LISP routers, since we need it for extreme mobility environments.

Famous Last Words: I’m Too Stupid for That

Some networking vendors realized that one way to gain mindshare is to make their network operating systems available as free-to-download containers or virtual machines. That’s the right way to go; I love their efforts and point out who went down that path whenever possible (as well as those, like Cisco, who try to make our lives miserable).

However, those virtual machines had better work out of the box, or you’ll get frustrated engineers who will give up and never touch your warez again. As someone said in a LinkedIn comment to my blog post describing how Junos vPTX consistently rejects its DHCP-assigned IP address: “If I had encountered an issue like this before seeing Ivan’s post, I would have definitely concluded that I am doing it wrong.”

Repost: The Real LISP Mobility Use Case

Béla Várkonyi is working on an interesting challenge: building ground-to-airplane(s) networks providing multilink mobility. He claims LISP works much better than BGP in that environment because it’s simpler.


In some newer routers BGP would not be such a big bottleneck, but you need a lot of knob turning in BGP to get it right, while in LISP it is quite simple.

If you have many thousands of concurrent airplanes with multi-link and max. 16 subnets with different routing policies on each, and the radio links are going up and down, then you have a large number of mobility events.

netlab: Global and Node VRFs

When designing the netlab VRF configuration module, I tried to make it as flexible as possible while using the minimum number of awkward nerd knobs. As is often the case, the results can be hard to grasp, so let’s walk through the various scenarios of using global and node VRFs.

netlab allows you to define a VRF in the lab topology vrfs dictionary (global VRF) or in a node vrfs dictionary (node VRF). In most cases, you’d define a few global VRFs and move on.
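To make the two flavors more concrete, here’s a minimal sketch of a topology file written from a shell heredoc. It assumes the netlab vrf module syntax as I understand it (a topology-level vrfs dictionary, a node-level vrfs dictionary, and a vrf attribute on links or interfaces); the VRF names red and green, the node names, the frr device and the clab provider are all hypothetical choices, so double-check the netlab VRF documentation before relying on the details.

```shell
#!/bin/sh
# Hypothetical two-router lab illustrating global vs node VRFs.
# Assumes netlab vrf module syntax; red, green, r1, r2 are made-up names.
cat > topology.yml <<'EOF'
module: [ vrf ]
provider: clab
defaults.device: frr

# Global VRF: defined once in the topology, usable on any node
vrfs:
  red:

nodes:
  r1:
    # Node VRF: defined only on r1
    vrfs:
      green:
  r2:

links:
# r1-r2 link placed into the global red VRF
- r1:
  r2:
  vrf: red
# Stub link on r1 placed into the node-only green VRF
- r1:
    vrf: green
EOF
```

Running netlab up in the same directory should then build the lab; the global red VRF can be used on both routers, while green exists only on r1.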

Repost: Think About the 99% of the Users

Daniel left a very relevant comment on my Data Center Fabric Designs: Size Matters blog post, describing how everyone rushes to sell the newest gizmos and technologies to the unsuspecting (and sometimes too-awed) users:


Absolutely right. I’m working at an MSP, and we do a lot of project work for enterprises with between 500 and 2000 people. That means the IT department is not that big; it’s usually just a cost center for them.

Stop the Network-Based Application Recognition Nonsense

One of my readers sent me an interesting update on the post-QUIC round of NBAR whack-a-mole (TL&DR: everything is better with Bluetooth AI):

Cloudflare (and the other hyperscalers) are fully into QUIC, as it gives them lots of E2E control, taking a lot of choice away from the service providers on how they handle traffic and congestion. It is quite well outlined by Geoff Huston in an APNIC podcast.

So far, so good. However, whenever there’s a change, there’s an opportunity for marketing FUD, coming from the usual direction.

Repost: Campus-Wide Wireless Roaming with EVPN

As a response to my LISP vs EVPN: Mobility in Campus Networks blog post, Route Abel provided interesting real-life details of large-scale campus wireless testing using EVPN and VXLAN tunnels to a central aggregation point (slightly edited):


I was arguing for VxLAN EVPN with some of my peers, but I had no direct hands-on knowledge of how it would actually perform and very limited ability to lab it on hardware. My client was considering deploying Campus VxLAN, and they have one of the largest campuses in North America.

FRRouting Loopback Interfaces and OSPF Costs

TL&DR: FRRouting advertises the IP prefix on the lo loopback interface with zero cost.

Let’s start with the background story. When we added FRRouting container support to netlab, someone decided to use lo0 as the loopback interface name. That device doesn’t exist in a typical Linux container, but it’s not hard to add it:

$ ip link add lo0 type dummy
$ ip link set dev lo0 up
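If you want to repeat the experiment, here’s a sketch that wraps the two commands above in an idempotent shell function and adds a helper that prints the link IDs and metrics from the router’s self-originated router LSA, so you can compare what gets advertised for lo and lo0. The function names are hypothetical, creating the device requires root (or CAP_NET_ADMIN) inside the container, and the exact vtysh output format may differ between FRRouting releases.

```shell
#!/bin/sh
# Hypothetical helpers; run inside the FRRouting container as root.

# Create a dummy interface (default: lo0) only if it doesn't exist yet,
# then bring it up. Same effect as the two commands shown above.
ensure_dummy_loopback() {
  ifname="${1:-lo0}"
  if ! ip link show dev "$ifname" >/dev/null 2>&1; then
    ip link add "$ifname" type dummy || return 1
  fi
  ip link set dev "$ifname" up
}

# Show the self-originated router LSA, keeping only the lines that
# identify each advertised link (Link ID) and its cost (Metric).
show_stub_metrics() {
  vtysh -c 'show ip ospf database router self-originate' |
    grep -E 'Link ID|Metric'
}
```

Typical usage would be ensure_dummy_loopback lo0 followed by show_stub_metrics once OSPF is running on both loopback interfaces.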

Unintended Consequences of IPv6 SLAAC

One of my friends is running a large IPv6 network and has already experienced a shortage of IPv6 neighbor cache on some of his switches. Digging deeper into the root causes, he discovered:

In my larger environments, I see a significant number of neighbor cache entries, especially on network segments with hosts that make many long-term connections. These hosts have 10 to 20 addresses that maintain state over days or weeks to accomplish their processes.
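A quick back-of-the-envelope estimate shows why that hurts. Assuming every active address on a segment needs its own neighbor cache entry on the first-hop switch, the table grows with the number of hosts multiplied by the number of addresses per host. The 500-host segment and the one-IPv4-address-per-host baseline below are hypothetical; the 10 to 20 addresses per host come from the quote above.

```shell
#!/bin/sh
# Rough estimate of neighbor cache entries on a first-hop switch.
# Assumption: each active address on the segment needs one entry.
neighbor_entries() {
  hosts=$1           # number of hosts on the segment (hypothetical)
  addrs_per_host=$2  # active addresses per host
  echo $((hosts * addrs_per_host))
}

# 500 hosts is a made-up example; 10 and 20 addresses come from the quote.
echo "IPv4, one address per host:  $(neighbor_entries 500 1)"
echo "IPv6, 10 addresses per host: $(neighbor_entries 500 10)"
echo "IPv6, 20 addresses per host: $(neighbor_entries 500 20)"
```

On a Linux router, ip -6 neigh show | wc -l gives you the real number to compare against the estimate.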

What’s going on? A perfect storm of numerous unrelated annoyances:

Explore: Why No IPv6? (IPv6 SaaS)

Lasse Haugen had enough of the never-ending “we can’t possibly deploy IPv6” excuses and decided to start the IPv6 Shame-as-a-Service website, documenting top websites that still don’t offer IPv6 connectivity.

His list includes well-known entries like twitter.com, azure.com, and github.com plus a few unexpected ones. I find cloudflare.net not having an AAAA DNS record truly hilarious. Someone within the company that flawlessly provided my website with IPv6 connectivity for years obviously still has some reservations about their own dogfood ;)
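Checking a domain yourself takes a single dig query. The sketch below wraps it in two small functions; the names has_ipv6_answer and check_aaaa are hypothetical. The filter only accepts lines made of hex digits, dots and at least one colon, so CNAME targets and dig error messages (which start with ;;) aren’t mistaken for IPv6 addresses.

```shell
#!/bin/sh
# Hypothetical filter: succeeds if the dig +short output on stdin contains
# at least one IPv6-looking line (hex digits, dots, and at least one colon).
has_ipv6_answer() {
  grep -Eq '^[0-9A-Fa-f.]*:[0-9A-Fa-f:.]*$'
}

# Report whether a domain publishes an AAAA record.
# Requires dig and working DNS resolution.
check_aaaa() {
  if dig +short AAAA "$1" | has_ipv6_answer; then
    echo "$1: AAAA record found"
  else
    echo "$1: no AAAA record"
  fi
}
```

For example, check_aaaa github.com or check_aaaa cloudflare.net; the answers reflect the DNS contents at the time you run them.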
