I wanted to cover fast failover (at least the basics and Prefix Independent Convergence – PIC) in another live session of the How Networks Really Work webinar in 2021, but unfortunately I ran out of time.
As a teaser, you might want to watch the recording of the Fast Failover: Marketing and Reality presentation I gave at the Seventh RSNOG Conference.
The Dynamic Negotiation of BGP Capabilities blog post generated almost no comments here, apart from the #facepalm realization that a certain network operating system resets IBGP sessions when the sole EBGP session goes down, but there were a few interesting ones on LinkedIn and Twitter.
While most engineers easily relate to the awkwardness of bringing down a BGP session to enable new functionality (tearing down a BGP session as a solution reminds me of rebooting a host as a solution), it’s not as easy as it looks. As Adam Chappell put it: “Dynamic capability renegotiation does tend to sound a bit like changing the tyres while still moving. Very neat if you can pull it off but so much to go wrong…”
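For readers who want to poke at what actually gets negotiated: capabilities travel as optional parameters (type 2) of the BGP OPEN message (RFC 5492). Here’s a minimal Python sketch that decodes that portion of an OPEN message; the capability-name table is a small illustrative subset of the IANA registry, and the sample bytes are made up for the example.

```python
# Tiny subset of IANA BGP capability codes, for illustration only
CAPABILITY_NAMES = {
    1: "Multiprotocol Extensions",
    2: "Route Refresh",
    64: "Graceful Restart",
    65: "4-octet AS numbers",
    69: "ADD-PATH",
}

def parse_open_capabilities(optional_params: bytes):
    """Walk the Optional Parameters field of a BGP OPEN message and
    return (code, name, value) for every advertised capability."""
    caps = []
    i = 0
    while i < len(optional_params):
        param_type, param_len = optional_params[i], optional_params[i + 1]
        value = optional_params[i + 2 : i + 2 + param_len]
        if param_type == 2:  # Capabilities optional parameter (RFC 5492)
            j = 0
            while j < len(value):
                cap_code, cap_len = value[j], value[j + 1]
                cap_value = value[j + 2 : j + 2 + cap_len]
                caps.append((cap_code,
                             CAPABILITY_NAMES.get(cap_code, "unknown"),
                             cap_value))
                j += 2 + cap_len
        i += 2 + param_len
    return caps

# Made-up sample: Multiprotocol (IPv4 unicast) + Route Refresh
sample = bytes([2, 6, 1, 4, 0, 1, 0, 1,   # MP-BGP capability: AFI 1, SAFI 1
                2, 2, 2, 0])              # Route Refresh, empty value
for code, name, value in parse_open_capabilities(sample):
    print(code, name, value.hex())
```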
Here’s a fun fact network automation pundits don’t want to hear: if you’re working with replaceable device configurations (as we’ve been doing for the past 20 years, at least those of us fortunate enough to buy Junos), you already meet the Infrastructure-as-Code requirements. Storing device configurations in a version control system and using reviews and merge requests to change them (aka GitOps) is just the cherry on the cake.
When I made a claim along these lines a few weeks ago during the Network Automation Concepts webinar, Vladimir Troitskiy sent me an interesting question:
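To make the claim a bit more tangible, here’s a hedged Python sketch of the simplest possible GitOps-style check: diff the running configuration pulled from a device against the intended configuration stored in a Git-managed directory. The inventory, file layout, and credentials are invented for the example, and netmiko is just one of several libraries you could use to fetch the configuration.

```python
import difflib
from pathlib import Path

from netmiko import ConnectHandler  # pip install netmiko

# Hypothetical inventory: intended configs live in a Git-managed directory
INTENDED_DIR = Path("configs")      # e.g. configs/leaf1.cfg, tracked in Git
DEVICES = [
    {"name": "leaf1", "host": "10.0.0.1"},
    {"name": "leaf2", "host": "10.0.0.2"},
]

def fetch_running_config(host: str) -> str:
    """Pull the running configuration from a device (Cisco IOS assumed)."""
    conn = ConnectHandler(device_type="cisco_ios", host=host,
                          username="admin", password="admin")
    try:
        return conn.send_command("show running-config")
    finally:
        conn.disconnect()

def drift_report(device: dict) -> str:
    """Return a unified diff between intended (Git) and running config."""
    intended = (INTENDED_DIR / f"{device['name']}.cfg").read_text().splitlines()
    running = fetch_running_config(device["host"]).splitlines()
    return "\n".join(difflib.unified_diff(
        intended, running,
        fromfile=f"git/{device['name']}.cfg",
        tofile=f"running/{device['name']}",
        lineterm=""))

if __name__ == "__main__":
    for dev in DEVICES:
        diff = drift_report(dev)
        print(f"=== {dev['name']}: {'in sync' if not diff else 'DRIFT'} ===")
        if diff:
            print(diff)
```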
Setting up a network automation development environment is an interesting task:
Now imagine having to do that for a dozen networking engineers and software developers working on all sorts of semi-managed laptops. Containers seem to be one of the sane solutions.
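As an illustration of what the container approach could look like, here’s a hedged Python sketch using the Docker SDK: every engineer runs the automation toolchain from the same prebuilt image, with the project directory mounted into the container, so nobody depends on whatever Python or Ansible happens to be installed on their laptop. The image name and mount paths are assumptions for the example.

```python
import os

import docker  # pip install docker

# Hypothetical image with the pinned automation toolchain baked in
IMAGE = "example.org/netauto-toolbox:1.0"

def run_in_toolbox(command: str) -> str:
    """Run a command inside the shared toolchain container, with the
    current project directory mounted read-write at /work."""
    client = docker.from_env()
    output = client.containers.run(
        IMAGE,
        command,
        volumes={os.getcwd(): {"bind": "/work", "mode": "rw"}},
        working_dir="/work",
        remove=True,          # throw the container away afterwards
    )
    return output.decode()

if __name__ == "__main__":
    # Everyone gets the same ansible-playbook version, no local installs needed
    print(run_in_toolbox("ansible-playbook --version"))
```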
In his latest blog post, Tom Hollingsworth compares network device disaggregation with cord cutting (replacing a cable TV subscription with Netflix and friends) and comes to the inevitable conclusion:
The idea is that you gain freedom and cheaper software. The hope is that you can build an enterprise network for half of what it would normally cost. The reality is that you’re going to gain less functionality and spend more time integrating things together on your own instead of just putting in a turnkey solution.
To rephrase: you’ll end up designing a snowflake network built from snowflake devices. Good job – just because it makes sense for the FAANG club (or LinkedIn) doesn’t mean you should be doing it.
After the (in)famous October 2021 Facebook outage, Corey Quinn invited me for another Screaming in the Cloud chat, this time focusing on what went wrong (hint: it wasn’t DNS or BGP).
We also touched on VAX/VMS history, how early CCIE lab exams worked, how BGP started, why there are only 13 root name servers (not really), and the transition from networking being pure magic to becoming a commodity. Hope you’ll enjoy our chat as much as I did.
Dmytro Shypovalov sent me his views on the hardware differences between routers and switches. Enjoy!
So, a long time ago routers were L3 devices with CPU-based forwarding and switches were L2 devices with ASICs. Then TCAM and L3 switches were invented, and since then ASICs have evolved to support more features (QoS, encapsulations, etc.) and store more routes, while CPU-based architectures have evolved into specialized NPUs with parallel processing (e.g., Cisco QFP) to handle more traffic while still supporting all the features of CPU forwarding.
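To make the CPU-versus-TCAM distinction a bit more concrete, here’s a toy Python sketch of what a software forwarding path has to do for every packet: find the longest matching prefix by examining the routing table (real implementations use tries or similar structures). A TCAM compares against all entries in parallel and returns the answer in a single hardware lookup, which is what made wire-speed L3 switching possible. The prefixes and next hops are made up.

```python
import ipaddress

# Toy routing table: prefix -> next hop
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "uplink",
    ipaddress.ip_network("10.0.0.0/8"): "core",
    ipaddress.ip_network("10.42.0.0/16"): "leaf1",
    ipaddress.ip_network("10.42.42.0/24"): "leaf2",
}

def software_lpm(dst: str) -> str:
    """Naive software longest-prefix match: examine every entry and keep
    the most specific one that covers the destination. A TCAM performs
    the equivalent comparison against all entries at once, in hardware."""
    addr = ipaddress.ip_address(dst)
    best = max((net for net in ROUTES if addr in net),
               key=lambda net: net.prefixlen)
    return ROUTES[best]

print(software_lpm("10.42.42.42"))  # -> leaf2
print(software_lpm("192.0.2.1"))    # -> uplink (default route)
```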
A network automation CI/CD pipeline seems to be the next hot thing, with vendors and bloggers describing in detail how you could get it done. How realistic is that idea for an average environment that’s barely starting its automation journey?
TL&DR: It will take a long time to get there, and the lack of tests is the first showstopper.
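To give a feel for what “tests” could mean at the start of that journey, here’s a hedged sketch of a pre-deployment check a pipeline might run with pytest before anything is pushed to a device: validate the intent data, not the network. The intent.yml file name and its structure are invented for the example.

```python
# test_intent.py -- run with: pytest test_intent.py
import ipaddress

import yaml  # pip install pyyaml

# Hypothetical intent file checked into the same repo as the configs
INTENT = yaml.safe_load(open("intent.yml"))

def test_bgp_as_numbers_are_valid():
    for node, data in INTENT["nodes"].items():
        assert 1 <= data["bgp_as"] <= 4294967295, f"{node}: invalid AS number"

def test_loopbacks_are_unique():
    loopbacks = [data["loopback"] for data in INTENT["nodes"].values()]
    assert len(loopbacks) == len(set(loopbacks)), "duplicate loopback addresses"

def test_loopbacks_are_host_routes():
    for node, data in INTENT["nodes"].items():
        net = ipaddress.ip_network(data["loopback"])
        assert net.prefixlen == 32, f"{node}: loopback must be a /32"
```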
The multi-threaded routing daemons blog post generated numerous in-depth comments here and on LinkedIn. As always, thanks a million for keeping me honest and providing more details or additional perspectives. Here are some of the best bits.
Jeff Tantsura provided the first dose of reality:
All modern routing protocol implementations are multi-threaded, with a minimum separation of adjacency handling, route calculation, and update generation. Note: writing multi-threaded code for complex tasks is a non-trivial exercise (search for thread safety and what happens when it’s not implemented correctly). Moving to multi-threaded code in the early 2010s resulted in a multi-release (multi-year) effort and hundreds of related bugs all around.
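To illustrate the thread-safety trap Jeff is alluding to, here’s a deliberately simplified Python sketch: one thread simulates adjacency flaps while another runs route calculations over the same neighbor table. The lock keeps the two from stepping on each other; remove it and the calculation thread can easily trip over a table being modified underneath it. Real routing daemons face far subtler interactions, but this is the category of problem.

```python
import threading
import time

# Shared state: neighbor -> prefixes learned from that neighbor
neighbors = {f"10.0.0.{i}": [f"10.{i}.0.0/16"] for i in range(1, 50)}
table_lock = threading.Lock()

def adjacency_handler():
    """Simulates session flaps: removes neighbors from the shared table."""
    for i in range(1, 50):
        with table_lock:                        # without the lock, the calculation
            neighbors.pop(f"10.0.0.{i}", None)  # thread may iterate a mutating dict
        time.sleep(0.001)

def route_calculation():
    """Simulates periodic best-path runs over the shared neighbor table."""
    for _ in range(100):
        with table_lock:
            total_prefixes = sum(len(p) for p in neighbors.values())
        time.sleep(0.001)

t1 = threading.Thread(target=adjacency_handler)
t2 = threading.Thread(target=route_calculation)
t1.start(); t2.start()
t1.join(); t2.join()
print("done without corrupting shared state")
```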
Dr. Tony Przygienda added his hands-on experience (he’s been developing routing protocol software for ages):
The Anycast Works Just Fine with MPLS/LDP blog post generated so much interest that I decided to check a few similar things, including running BGP-based anycast over a BGP-free core, and using BGP Labeled Unicast (BGP-LU).
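As a quick refresher before the lab details: what makes BGP-LU work over a BGP-free core is, in a nutshell, a two-label stack. The ingress leaf pushes the BGP-LU label advertised by the egress leaf for the anycast prefix plus the LDP transport label that gets the packet to that leaf; the spine only ever touches the transport label and never needs to know about the anycast prefix. Here’s a toy Python sketch of that forwarding behaviour; the label values and node roles are made up for illustration.

```python
# Toy two-label forwarding for BGP-LU over a BGP-free core (values invented)

bgp_lu_label = 30042         # label advertised by the egress leaf for 10.42.42.42/32
ldp_label_to_egress = 20001  # transport label toward the egress leaf's loopback

packet = {"dst": "10.42.42.42", "labels": []}

# Ingress leaf: push the service (BGP-LU) label, then the transport label
packet["labels"] = [ldp_label_to_egress, bgp_lu_label]

# Spine (BGP-free): handles only the top transport label and never needs
# a route for 10.42.42.42/32; here it performs penultimate-hop pop
assert packet["labels"][0] == ldp_label_to_egress
packet["labels"].pop(0)

# Egress leaf: the remaining BGP-LU label identifies the anycast prefix
assert packet["labels"] == [bgp_lu_label]
print("delivered to an anycast server behind the egress leaf")
```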
We’ll use the same physical topology we used in the OSPF+MPLS anycast example: a leaf-and-spine fabric (admittedly with a single spine) with three anycast servers advertising 10.42.42.42/32 attached to two of the leafs: