Jeroen Van Bemmel Archives - NetworkingNexus.net

Troubleshooting VXLAN MTU issues with SR Linux

Giving your network engineers a fighting chance with the industry’s most truly open NOS

VXLAN over UDP adds 50 bytes to every packet

A recent blog reminds us that VXLAN overlay networks can be tricky to troubleshoot, as a typical encapsulation header adds 50 bytes per packet and RFC7348 simply assumes that the network is configured to support such larger payloads. And unlike most other IP packets, typical VTEPs will simply discard frames that are too large — without informing the sender.

Tailored troubleshooting commands in Linux

A common VXLAN underlay network is an IP fabric with IPv4 VTEPs at the leaves, using multiple spines with ECMP for redundancy and capacity. In Netlab such a topology might look like this:

netlab up -d srlinux -p clab vxlan-bridging-leaf-spine.yml

To troubleshoot suspected MTU issues in this context — say when a ping from h1 to h2 fails — one would verify the MTU on all paths between the VXLAN VTEPs l1/l2, i.e. l1->s1->l2 and l1->s2->l2. To simplify this task on SR Linux, we can add a custom /tools command for traceroute:

/tools vxlan-traceroute mac-vrf vlan1000

/tools vxlan-traceroute output confirming MTU 1550

As can be seen from the above screenshot, this custom vxlan-traceroute command takes a Continue reading

Supporting next level IXP topologies

Netlab 1.4 sneak preview (unofficial)

Output of ‘fdp’ layout for ‘netlab create -o graph’

Imagine you are an IXP deploying technologies like RFC9161 EVPN with proxy ARP and MPLS over RSVP-TE, and you need to come up with a validated multi-vendor design. How would you go about that?

The Netlab team has got you covered. Check out this example — a sneak preview with upcoming Netlab 1.4 features (work in progress)

S1/S2 are data center routers doing EVPN/VXLAN with proxy ARP; iBGP control plane and ISIS IGP. Anycast gateways are available in the ‘red-hot’ vlan
C1/C2 are core nodes doing SRv6 over ISIS (one might call this BGP-Free :)
PE1/PE2 are MPLS core nodes doing MPLS EVPN over LDP, with OSPF

netlab up

…is all it takes to bring this topology to life!

Customer h1 can ping h4 across this ultimate feature fabric

Resulting interface configuration on s1 (SR Linux)

Sample ISIS/SRv6 configuration on c1 (SR OS)

Revisiting BGP EVPN VXLAN to the hosts with SR Linux 22.6.3

Featuring Netlab 1.3.1 and FRR 8.3.1

Containerlab topology for EVPN-VXLAN-to-the-host

At the beginning of this year I wrote about my SR Linux BGP EVPN adventures, with considerations around underlay and overlay design and the illustrious iBGP-over-eBGP approach. Some readers may have noticed a resemblance to the constellation of Ursa Major — the Big Bear: A reference to our friends at CloudBear, a recent SR Linux customer.

Fast forward to September 2022 and we now have SR Linux 22.6.3 with some features I have been waiting for, like (e)BGP Unnumbered. From my side, I have been working with the open source community to extend support for tools like Netlab (formerly Netsim-tools), Containerlab and FRR to enable sophisticated and advanced network topologies using truly open source tools and components.

New features and changes

The issue of running BGP to Linux hosts using FRR popped up in several discussions. Though technically possible, it can be challenging to configure, and there are many design variations with implications that aren’t always obvious. To enable simple experimentation and quick design iterations, I decided to help out by extending Netlab with VLAN, VXLAN and VRF support for FRR. I also made some changes to Continue reading

On findingballoons in data center networks micro-detection using adaptive ➰ feedback loops ♻

Using #SRLinux for flexible decentralized DDoS attack detection #SecDevOps

A distributed problem (“attack”) may require a distributed solution (source)

Distributed Denial of Service (DDoS) attacks continue to be a major problem for many network operators and their customers. As in most networking problems, the key issue is scale: Attackers are able to mount an amplified attack using many (N) sources to send large (M) payloads to a single (1) target server, causing link and CPU saturation and system overload

N*M >> 1 ~> overloaded system(s) and unhappy customers

Much like security in general, solving DDoS attacks is a continuous process, not a one-time product or solution deployment. While most operators have deployed DDoS mitigation solutions, there will — unfortunately — always come a time where the current solution falls short, and something else or more is needed.

#DevSecOps: Shortening feedback loops

Feedback loops (credit: Peter Phaal / Tim Cochran)

Back in 2011 Peter Phaal wrote a blog about “Delay and stability”. Even though more than a decade has passed since, one can easily see how triggers like AWS outages haven’t changed — these points remain relevant and valid today:
✅ Measurement(observability) plays a critical role in data centers; it is the
foundation for automation (more on this Continue reading

#NANOG84 Hackathon: No plan survives first contact with go-getter students ‍‍‍

Austin, 12–13 February 2022

So there I was, ready to do battle in my blue corner of the ring. Together with Anton Karneliuk from Karneliuk.com — author of pyGNMI among other things — we got started on a multi-vendor NAPALM driver using gNMI as a backend.

Working around the clock in our different time zones, in full blown follow-the-moon style, we managed to get a basic NAPALM get_facts working for both Arista cEOS and Nokia SR Linux, at some point late Saturday night.

NAPALM get_facts for cEOS, using “vendor” gnmi

Working together to build the Internet of tomorrow®

On Sunday a group of students from University of Colorado Boulder joined the party, and it quickly became apparent that our hacking plans would have to change. Because important as multi-vendor compatibility issues are, inter-generation transfer of knowledge is even more critical. We already barely understand the networks we have built today — how could we ever expect the next generation to keep things running if we don’t help them understand what we did?

Long story short: I teamed up with Dinesh Kumar Palanivelu and he ended up submitting his first Pull Request: A small step for a man, but a huge win for the NANOG community!

Thanks again Continue reading

All is fair in and #NANOG Hackathons — refurbishing NAPALM drivers to build a multi-vendor…

All is fair in 💔 and #NANOG Hackathons — refurbishing NAPALM 🔥 drivers to build a multi-vendor #gNMI plug-in 🔌

Multi-vendor NAPALM driver based on gNMI

We’re in Austin, Texas this week where the 84th North American Network Operators Group (NANOG) convention is taking place. Preceding that, during the Super Bowl ⅬⅤⅠ weekend (in which another blue team is about to win big), there is a Hackathon in which the teams are challenged to prepare for the networking equivalent of an epic halftime show.

Yours truly figures it would be a good idea to use this opportunity to kick-off the creation of a multi-vendor NAPALM driver. Most (if not all) NAPALM drivers to date are single vendor, see for example the Nokia SR OS NAPALM driver and the SR Linux variant. However, there is significant overlap in functionality and logic, and so I’d like to see if there is a possibility to “share the burden” by collaborating on some of the more basic (and — quite frankly — boring) parts of the drivers.

My (rough) plan is to clone the best current NAPALM driver code base — eos has been suggested — remove whatever logic it uses to talk to its vendor specific device APIs, and replace that with pyGNMI. To demonstrate this Continue reading

Making the network (even) more consumable: Salty minions for cloudlubbers ☁️

Building and integrating custom SR Linux agents — part Ⅱ

A ‘lubber’ is a “big, clumsy person”, a word more commonly used in the form of ‘landlubber’ meaning someone unfamiliar with the sea or sailing. The latter was first recorded back in 1690, combining the Middle English word ‘lubber’ (~1400) with the Germanic — or Dutch — ‘land’: “A definite portion of the earth’s surface owned by an individual or home of a nation”.

The process of combining two words to form a new one is called ‘nominal composition’ and it is used a lot in my native Dutch language. Technically this means that Dutch has exponentially more words than English for example, though we can’t be bothered to list them in dictionaries — we simply invent them as needed (against the advice of my browser’s spellchecker who argues “cloudlubber” should be written as two separate words, silly machine)

Found at the bottom of that dictionary list: Toki Pona — “simple language” — is a minimalistic constructed language consisting of only 120 words. It uses only 14 letters — a e i j k l m n o p s t u w — to form words

Insulting one’s audience from the start is generally considered bad practice, but we’re ok because I wasn’t talking Continue reading

To eBGP or not to iBGP — that is the query (please with me ;)

To eBGP or not to iBGP — that is the query (please 🐻 with me ;)

BGP EVPN with VXLAN to the multi-tenant hosts (using SR Linux and SR OS)

There are many ways to do things, and some ways are subjectively better than others. Sometimes, things that may have been a Bad Idea™ in the past become a Not_So_Bad_After_All(maybe, provided X and not Y) option in light of new developments or concerns. To tell which one is which, we’ll just have to give it a go and see where we end up. And whatever happens — 🤞 chances are we’ll learn some things in the process regardless!

The evolution of BGP-to-the-host (how the … did we end up here?)

From L2 (with STP), to L3 (with IGP), to BGP with SDN on the hosts

“Accelerating waves of change to open (2021)” paints a picture of how we can see things evolving over time. From hardware to software, from L2 to static L3 to dynamically signaled L3 with BGP to VXLAN (L2-over-L3) — it all follows a similar pattern of evolutionary change, building on technology from the past to create the future, following the path of the adjacent possible. The terminology and acronyms we humans Continue reading

Between 0x② nerd knobs — BGP unnumbered on SR Linux

Using a custom FRR agent 🕴️

Supporting BGP unnumbered on SR Linux using a custom embedded agent

As (also) explained in this vodcast by Jeff Doyle and Jeff Tantsura (April 2020), BGP remains a key protocol in networks of all sizes. As part of a global drive for simplification and automation, the engineers at Cumulus Networks have pioneered a feature called “BGP unnumbered” to simplify the configuration of large data center fabrics: RFC8950 (formerly RFC5549) describes how extended next-hop encoding can be negotiated and used to exchange IPv4 prefix routes using IPv6 next hops, such that the fabric interfaces can use auto-assigned IPv6 link-local addresses (only), with no IPv4 at all. In combination with AS number discovery, this greatly simplifies the configuration.

SR Linux inherits its BGP stack from Nokia SR OS, a robust mature hardened software product that runs the internet. It already supports the majority of features that one would expect in a data center context: Besides IPv4/v6 and EVPN address families, there is support for RFC8950 extended next-hop encoding, extensive BGP import/export policies, and much much more. However, in the case of large service provider networks and the internet at large, BGP is often and commonly used between Continue reading

The road to success is paved with…IPv4 over SRv6 over ISIS and BGP on SROS 21.10

🌁The road to success is paved with…IPv4 over SRv6 over ISIS and BGP on SR OS 21.10🌄

“One packet does not make SPRING” — yours truly

The IETF Source Packet Routing in Networking (SPRING) working group was created in 2013, shortly after my family and I emigrated from Europe to Canada. I can’t say I’ve ever been involved personally, I have heard about Segment Routing as a concept but hadn’t really done anything with it until now. Especially in the context of “SRv6” it regularly seems to get derided as a networking vendor ploy to sell more equipment — but while I can confirm nor deny the level of truth in that, I am technically intrigued.

For every problem there are countless solutions, and this is especially true in networking. Does the world really need Yet Another Method to send packets from A to B? The strive for simplicity would seem to imply a minimal amount of variations and protocols, but how can we know what “simple” looks like? This line of thinking inspired a challenge 🥶: Build the most complicated combination of protocols you can imagine, then take it from there.

SRv6 — “or some people would feel left out”

Never Continue reading

Catch 2022: Networking Ice Bucket Challenge

At the start of this exciting new year of 2022, I figured this might be a good time to introduce a new challenge:

Using Netsim-Tools, build the most complicated virtual network topology that still allows host A to ping host B

Anything goes — and if you have to extend the tooling to make things work, even better. Varying latency and occasional packet loss are acceptable, but there needs to be at least 1 ping reply being delivered to A.

For example, how about multi-vendor EVPN-VXLAN over SRv6 with MACsec encryption and Traffic Engineering?

Happy 2022 networking everyone! 🎆

Keeping up with the Pepelnjakᣵ — OSPF over unnumbered interfaces for SROS

🎥Keeping up with the Pepelnjakᣵ — OSPF over unnumbered interfaces for SROS

Another week, another Netsim-Tools release 😩 One would think that a “reduced scope of activities” and Irena leaving would slow things down, but…The “s” (exponential) in the title reflects my suspicion that Ivan has somehow managed to clone himself, or perhaps we are seeing the outputs of an as-yet unnamed team of collaborators publishing in his name. Either way, things are crazy hot and changing on a daily basis, the “bleeding edge” of network automation indeed.

OSPF over unnumbered interfaces

This morning Ivan published a blog post about OSPF over unnumbered interfaces. The topology for that article can be found here; observant readers will quickly notice a problem with it, but we can easily fix that:

netlab up -d sros -p clab

That is, bring up the same topology, but use Nokia SR-OS devices and Containerlab instead of the defaults.

Topology with removed unnumbered multi-access link

As Ivan explains in his article, OSPF doesn’t work over unnumbered multi-access links, and so Netsim-Tools complains and stops you from wasting more time.

After the nodes boot (and adding support for OVS bridges), we Continue reading