Ivan Pepelnjak, Author at NetworkingNexus.net

SwiNOG 40: Trustworthy Network Automation

The SwiNOG 40 event started with an interesting presentation on Building Trustworthy Network Automation (video) by Damien Garros (now CEO @ OpsMill), who discussed the principles one can use to build a trustworthy network automation solution, including idempotency, dry runs, and transactional changes. He also covered the crucial roles of the declarative approach, version control, and testing.

If you have ever watched any of my network automation materials, you won’t be surprised by anything he said, but if you’re just starting your network automation journey, you MUST watch this presentation to get your bearings straight.
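
To make idempotency and dry runs a bit more concrete, here is a minimal Python sketch. It is not taken from the presentation: the `apply_change` function and the dictionary-based "device" are hypothetical stand-ins for a real automation tool and a real device.

```python
def apply_change(current: dict, desired: dict, dry_run: bool = False) -> dict:
    """Return the settings that differ; apply them unless dry_run is set.

    Hypothetical helper: 'current' stands in for device state.
    """
    diff = {key: value for key, value in desired.items()
            if current.get(key) != value}
    if not dry_run:
        current.update(diff)
    return diff


device = {"hostname": "r1", "mtu": 1500}
desired = {"hostname": "r1", "mtu": 9000}

planned = apply_change(device, desired, dry_run=True)  # reports, changes nothing
first = apply_change(device, desired)                  # applies the difference
second = apply_change(device, desired)                 # nothing left to change
```

The dry run returns the planned difference without touching the device, and running the same change twice is harmless because the second run finds nothing to do.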

Fun Reading: AI: Great Expectations

Rodney Brooks republished an article on great AI expectations that he wrote 37 years ago. Not surprisingly, apart from a few technical details triggered by four decades of exponential growth in silicon capabilities, the article could have been written yesterday.

Side note: I’m a bit younger than Rodney, but I also went through at least three waves of AI hype cycles, starting with Prolog and 4GL, then expert systems, and finally neural networks. Around that time, I stopped caring and focused on networking, but I have enough battle scars to remain skeptical.

BGP Community Propagation on Cisco IOS/XE: The 90’s Called

Just when I thought no vendor ~~stupidity~~ peculiarity could surprise me, Cisco IOS/XE proved me wrong.

I was improving a completely unrelated BGP functionality. I ran BGP integration tests on Cisco IOL (because it’s the fastest one to boot), and the BGP community propagation test failed. After verifying that I did not change the template and that the data structures had not changed, I checked the IOL release I was using.

Surprise 🎉🎉: the neighbor send-community configurations that have worked since (at least) the IOS Classic release 15.x stopped working in Cisco IOS/XE release 17.16.01a.

MUST READ: Storage Devices and Latency

PlanetScale published a great article describing the high-level principles of how storage devices work and covering everything from tape drives to SSDs and network-attached storage — a must-read for anyone even remotely interested in how their data is stored.

Fun Reading: Who is LLM?

Is an LLM a stubborn donkey, a genie, or a slot machine (and why)? Find out in the Who is LLM? article by Martin Fowler.

ArubaCX: When BGP Soft Reconfiguration Becomes a No-Op

Changing an existing BGP routing policy is always tricky on platforms that apply line-by-line changes to device configurations (Cisco IOS and most other platforms claiming to have industry-standard CLI, with the notable exception of Arista EOS). The safest approach seems to be:

- Do not panic when the user makes changes to route maps and underlying filters (prefix lists, AS-path access lists, or community lists).
- Let the user decide when they’re done and process the BGP table with the new routing policy at that time.

Ultra Ethernet: Reinventing X.25

One should never trust the technical details published by the industry press, but assuming the Tomahawk Ultra puff piece isn’t too far off the mark, the new Broadcom ASIC (supposedly loosely based on emerging Ultra Ethernet specs):

1. Uses Optimized Ethernet Header, replacing IP/UDP header with a 10-byte something (let’s call it session identifier)
2. Makes Ethernet lossless with hop-by-hop retransmission/error recovery
3. Uses credit-based flow control (the receiver continuously updates the sender about the amount of available space)

If you’re ancient enough, you might recognize #3 as part of Fibre Channel, #2 and #3 as part of IEEE 802.2 LLC2 (used by IBM to implement SNA over Token Ring and Ethernet), and all three as the fundamental ideas of X.25 that Broadcom obviously reinvented at 800 Gbps speeds, proving (yet again) RFC 1925 Rule 11.
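
For readers who have never met credit-based flow control, here is a toy Python model of the idea. It is only a sketch: the `Sender` and `Receiver` classes are hypothetical and have nothing to do with how Tomahawk Ultra, Fibre Channel, or X.25 implement it.

```python
from collections import deque


class Receiver:
    """Hypothetical receiver with a fixed number of buffer slots."""

    def __init__(self, slots: int) -> None:
        self.slots = slots
        self.buffer: deque = deque()

    def accept(self, frame) -> None:
        if len(self.buffer) >= self.slots:
            raise OverflowError("sender ignored its credit limit")
        self.buffer.append(frame)

    def drain(self, count: int) -> int:
        """Consume up to 'count' frames; return the credits to grant back."""
        freed = min(count, len(self.buffer))
        for _ in range(freed):
            self.buffer.popleft()
        return freed


class Sender:
    """Hypothetical sender that transmits only while it holds credits."""

    def __init__(self, credits: int) -> None:
        self.credits = credits

    def send(self, frame, receiver: Receiver) -> bool:
        if self.credits == 0:
            return False          # no credit: hold the frame, never drop it
        self.credits -= 1
        receiver.accept(frame)
        return True

    def grant(self, credits: int) -> None:
        self.credits += credits


rx = Receiver(slots=2)
tx = Sender(credits=rx.slots)                 # initial credits = buffer size
results = [tx.send(n, rx) for n in range(3)]  # third frame has to wait
tx.grant(rx.drain(1))                         # receiver frees a slot
resumed = tx.send(3, rx)
```

The sender never transmits more than the receiver can store, so there is nothing to drop on that hop; the price is that the sender stalls whenever credits run out.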

Always Check Your Tests Against Faulty Inputs

A while ago, I published a blog post proudly describing the netlab integration test that should check for incorrect OSPF network types in netlab-generated device configurations. Almost immediately, Erik Auerswald pointed out that my test wouldn’t detect that error (it might detect other errors, though) as the OSPF network adjacency is always established even when the adjacent routers have mismatching OSPF network types.

I made one of the oldest testing mistakes: I checked whether my test would work under the correct conditions but not whether it would detect an incorrect condition.
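
That mistake fits into a few lines of Python. This is an illustration with made-up data, not the actual netlab validation code; `weak_check`, `strict_check`, and `detects_fault` are hypothetical names.

```python
def weak_check(neighbor: dict) -> bool:
    """Passes whenever the adjacency is up (the flawed test)."""
    return neighbor["adjacency"] == "Full"


def strict_check(neighbor: dict, expected_type: str) -> bool:
    """Also verifies the network type the test was meant to check."""
    return (neighbor["adjacency"] == "Full"
            and neighbor["network_type"] == expected_type)


def detects_fault(check, good: dict, faulty: dict) -> bool:
    """A useful check passes on good input AND fails on faulty input."""
    return check(good) and not check(faulty)


# Made-up operational data: the adjacency is up in both cases
good = {"adjacency": "Full", "network_type": "point-to-point"}
faulty = {"adjacency": "Full", "network_type": "broadcast"}

weak_ok = detects_fault(weak_check, good, faulty)
strict_ok = detects_fault(
    lambda n: strict_check(n, "point-to-point"), good, faulty)
```

Here `weak_ok` ends up `False` because the weak check happily accepts the faulty input, while `strict_ok` is `True`.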

Cisco IOS/XE Hates Redistributed Static IPv6 Routes

Writing tests that check the correctness of network device configurations is hard (overview, more details). It’s also an interesting exercise in getting the timing just right:

- Routing protocols are an eventually-consistent distributed system, and things eventually appear in the right place (if you got the configurations right), but you never know when exactly that will happen.
- You can therefore set some reasonable upper bounds on when things should happen, and declare failure if the timeouts are exceeded. Even then, you’ll get false positives (as in: the test is telling you the configurations are incorrect, when it’s just a device having a bad hair day).
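
The "wait up to an upper bound, then declare failure" idea can be sketched as a generic polling helper. This is not how netlab implements it; `wait_for` and its parameters are hypothetical, and the lambdas stand in for a real check such as "is the route in the routing table?"

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool],
             timeout: float, interval: float = 0.5) -> bool:
    """Poll 'condition' until it is true or 'timeout' seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True           # the expected state finally appeared
        if time.monotonic() >= deadline:
            return False          # upper bound exceeded: declare failure
        time.sleep(interval)


# Simulated device: the expected route shows up on the third poll
polls = iter([False, False, True])
converged = wait_for(lambda: next(polls), timeout=1.0, interval=0.01)

# Simulated failure: the route never shows up
timed_out = wait_for(lambda: False, timeout=0.05, interval=0.01)
```

Choosing the timeout is the hard part: too short and a slow device triggers a false positive, too long and every genuine failure takes forever to report.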

And just when you think you nailed it, you encounter a device that blows your assumptions out of the water.

netlab 25.07: Summaries and Confederations

netlab release 25.07 was published yesterday. The major new features include:

- The ospf.areas plugin supports OSPFv2 and OSPFv3 stub areas, NSSA areas, and area ranges.
- The BGP routing policies plugin supports aggregate BGP routes.
- The BGP configuration module supports BGP confederations.

But wait, there’s much more:

Dual-Stack Common-Services VRF Confuses Aruba CX

As I was running the netlab pre-release integration tests, I noticed that ArubaCX failed the IPv6 Common Services test (it worked before). Here’s the gist of what that test does:

- It creates three VRFs (red, blue, and common)
- It imports routes from red and blue VRF into the common VRF and routes from the common VRF into the red and blue VRF (the schoolbook example of common services VRF)
- Just to be on the safe side, it imports red routes into the red VRF and so on.

Here’s the relevant part of the netlab lab topology:

Worth Reading: The Secret Rules of the Terminal

Did you ever wonder why pressing an up-arrow in a (Linux) terminal window sometimes recalls the previous command but other times creates ^[[A?

Julia Evans did, and spent months exploring the quirks of the Linux terminal (and writing blog posts describing what she found), finally resulting in The Secret Rules of the Terminal (including the various shells, terminal emulators, escape codes, and TTY driver). A must-read if you’re a newbie who wants to understand why things happen the way they do.

Expanding a Running Netlab Topology

One of the happy netlab users sent me an interesting challenge:

- He’s built a large lab and added tons of extra configuration to the lab devices.
- Afterwards, he realized he’d like to add a few more devices to the lab and was worried about losing all the changes he had made.

Unfortunately, you cannot add new devices to an already-running lab. You must shut down the lab, change the topology description, and start a new lab. However, there are things you can do to preserve the extra work you already did:

Worth Reading: Expert Generalists

Martin Fowler published an interesting article about Expert Generalists. Straight from the abstract:

As computer systems get more sophisticated we’ve seen a growing trend to value deep specialists. But we’ve found that our most effective colleagues have a skill in spanning many specialties.

Also:

There are two sides to real expertise. The first is the familiar depth: a detailed command of one domain’s inner workings. The second, crucial in our fast-moving field is the ability to learn quickly, spot the fundamentals that run beneath shifting tools and trends, and apply them wherever we land.

Remember how I told you to focus on the fundamentals? 😎

Molly-Guard: a Lifesaver on a Ubuntu Server

Have you ever managed to type reload in the wrong terminal window and brought down a core switch (I probably did)? I managed to do the Ubuntu equivalent of that stupidity: I told my main Ubuntu server to sudo poweroff instead of doing that to a Vagrant VM.

Fortunately, the open-source world doesn’t have to rely on the roadmaps created by networking vendors’ product managers; if there’s a big enough pain, someone will solve it.

IS-IS 3-Way Handshake and the Power of SHOULD

Yesterday, I mentioned that a Cisco router running pre-standard IS-IS 3-way handshake (this is why you need it) interoperates with multiple implementations of RFC 5303. How’s that possible, and does it matter whether you configure the ancient Cisco routers (release 15.x) to use IETF 3-way handshake instead of the “proprietary” one?

TL&DR: It SHOULD NOT matter, but the more I explore the RFCs, the more I’m amazed anything works at all.

I took a trip to the Wireshark land to figure out the details (you can download the capture file):

Start netlab Tools without Changing Topology File

Dan Partelly figured out that we have to configure the standard (IETF) 3-way IS-IS handshake on old IOSv images. On the other hand, all IS-IS integration tests pass for IOSv and IOSvL2. I wondered what was going on.

Fortunately, a few months ago, I spent some time installing the client-side Edgeshark components on my laptop. All I needed to do was enable the edgeshark tool in my lab topology and restart the lab.

SwiNOG 40: A Day of Awesomeness

A few days ago, I attended a SwiNOG meeting for the first time and realized what a mistake I was making — I should have been there years ago.

Not only was the event impeccably organized (what else would you expect in Switzerland) and at the best event location I have ever experienced (it’s hard to beat this view), it was also full of short, interesting, to-the-point presentations (you can already view the slide decks; YouTube videos should be available shortly). Plus, I met so many old friends I hadn’t seen in years, and people I had communicated with for years but never met before.

It’s not like the organizers would need any more publicity (the event was sold out), but if you happen to be near Switzerland in time for the next meeting, make sure to be there.

Thanks again to the wonderful SwiNOG core team for a fantastic experience! I hope we’ll meet again at the next SwiNOG meeting!

Testing OSPF Device Configurations

A year ago, I described how we use the netlab validate command to test device configuration templates for most platforms supported by netlab. That blog post included a simple “this is how you test interface address configuration” example; now, let’s move to something a bit more complex: baseline OSPF configuration.

Testing the correctness of OSPF configurations seems easy:

- Build a lab with a test device and a few other OSPF devices
- Configure the devices
- Log into the test device and inspect OSPF operational data

There’s just a tiny little fly in this ointment…

Quality of OSPFv2 NSSA Implementations

A few weeks ago, we added OSPF areas functionality to netlab. In the next release, you’ll be able to configure stub areas, NSSA areas, inter-area route summarization and filtering (OSPF ranges), and summarization of NSSA type-7 prefixes for OSPFv2 and OSPFv3.

OSPFv2 (defined in RFC 2328) is 27 years old, and NSSA functionality (RFC 3101) was last touched 22 years ago. One would hope the implementations in network devices are mature and feature-complete. Yeah, keep dreaming 🤦‍♂️.
