One of the responses to my Disaster Recovery Faking blog post focused on failure domains:
What is the difference between supporting L2 stretched between two pods in your DC (which everyone does for seamless vMotion), and having a 30ms link between these two pods because they happen to be in different buildings?
I hope you agree that a single broadcast domain is a single failure domain. If not, let agree to disagree and move on - my life is too short to argue about obvious stuff.
Read more ...Two weeks ago Nicola Modena explained how to design BGP routing to implement resilient high-availability network services architecture. The next step to tackle was obvious: how do you fine-tune convergence times, and how does BGP convergence compare to the more traditional FHRP-based design.
You’ve probably heard cloudy evangelists telling CIOs how they won’t need the infrastructure engineers once they move their workloads into a public cloud. As always, whatever sounds too good to be true usually is. Compute resources in public clouds still need to be managed, someone still needs to measure application performance, and backups won’t happen by themselves.
Even more important (for networking engineers), network requirements don’t change just because you decided to use someone else’s computers:
Read more ...I stumbled upon a great MIT Technology Review article (warning: regwall ahead) with a checklist you SHOULD use whenever considering a machine-learning-based product.
While the article focuses on machine learning at least some of the steps in that list apply to any new product that claims to use a brand new technology in a particular problem domain like overlay virtual networking with blockchain:
Read more ...Starting with my faking disaster recovery tests blog post Terry Slattery wrote a great article delving into the intricacies of DR testing, types of expected disasters, and resilience engineering. As always, a fantastic read from Terry.
No, we were not talking about IP fabrics in general - IP Fabric is a network management software (oops, network assurance platform) Gian Paolo discovered a while ago and thoroughly tested in the meantime.
He was kind enough to share what he found in Episode 107 of Software Gone Wild, and as Chris Young succinctly summarized: “it’s really sad what we still get excited about something 30 years after it was first promised”… but maybe this time it really works ;)
The registration is still open for the Using VXLAN to Build Active-Active Data Centers workshop on December 3rd, but if you can’t make it to Zurich you might enjoy these live sessions we’ll run in December 2019:
All webinars I mentioned above are accessible with Standard ipSpace.net Subscription, and you’ll need Expert Subscription to enjoy the automation course contents.
Someone sent me this observation after reading my You Cannot Have Public Cloud without Networking blog post:
As much as I sympathize with your view, scales matter. And if you make ATMs that deal with all the massive client population, the number of bank tellers needed will go down. A lot.
Based on what I read a while ago a really interesting thing happened in financial industry: while the number of tellers went down, number of front-end bank employees did not go down nearly as dramatically, they just turned into “consultants”.
Read more ...Got an interesting set of questions from a networking engineer who got stuck with the infamous “let’s push the **** down the stack” challenge:
So I am a rather green network engineer trying to solve the typical layer two stretch problem.
I could start the usual “friends don’t let friends stretch layer-2” or “your business doesn’t really need that” windmill fight, but let’s focus on how the vendors are trying to sell him the “perfect” solution:
Read more ...I’m running two workshops in Zurich in the next 10 days:
I published the slide deck for the NSX versus ACI workshop a few days ago (and you can already download it if you have a paid ipSpace.net subscription) and it’s full of new goodness like ACI vPod, multi-pod ACI, multi-site ACI, ACI-on-AWS, and multi-site NSX-V and NSX-T.
Steve Bellovin wrote a great series of articles describing the early history of Usenet. The most interesting part in the “security and authentication” part was probably this gem:
That left us with no good choices. The infrastructure for a cryptographic solution was lacking. The uux command rendered illusory any attempts at security via the Usenet programs themselves. We chose to do nothing. That is, we did not implement fake security that would give people the illusion of protection but not the reality.
A lot of other early implementers chose the same route, resulting in SMTP, BGP… which wouldn’t be a problem if someone kept track of that and implemented security a few years later. Unfortunately we considered those problems solved and moved on to chase other squirrels. We’re still paying the interest on that technical debt.
Another great advice from Charity Majors: does it make sense to go back to being an engineer after being a manager for a few years?
Personal note: finding a great replacement for my CTO role was probably the best professional decision I ever made ;)
Original TCP/IP and OSI network stacks had relatively clean layered architecture (forgetting the battle scars for the moment) and relied on end-to-end principle to keep the network core simple.
As always, no good deed goes unpunished - “creative” individuals trying to force-fit their mis-designed star-shaped pegs into round holes, and networking vendors looking for competitive advantage quickly destroyed the idea with tons of middlebox devices, ranging from firewalls and load balancers to NAT, WAN optimization, and DPI monstrosities.
You need free ipSpace.net subscription to watch the video, or a paid ipSpace.net subscriptions to watch the whole How Networks Really Work webinar.
We are proud to announce a great lineup of guest speakers for the first Networking in Public Cloud Deployments course that will run in Spring 2020:
Here’s another “let’s use network automation tools to create reports we couldn’t get in the past” (like IP multicast trees) solution coming from an attendee in our network automation course: Paddy Kelly created L3VPN graphs detailing PE-to-CE connectivity using Cisco’s pyATS to parse the Cisco IOS printouts.
You’ll find dozens of other interesting solutions on our Sample Network Automation Solutions page - all of them were created by networking engineers who knew almost nothing about network automation or open-source automation tools when they started our automation course.
Remember the “BGP as High Availability Protocol” article Nicola Modena wrote a few months ago? He finally found time to extend it with BGP design considerations and a description of a seamless-and-safe firewall software upgrade procedure.
Every now and then a smart person decides to walk away from their competence zone, and start spreading pointless clickbait opinions like BGP is a hot mess.
Like any other technology, BGP is just a tool with its advantages and limitations. And like any other tool, BGP can be used sloppily… and that’s what’s causing the various problems and shenanigans everyone is talking about.
Just in case you might be interested in facts instead of easy-to-digest fiction:
Read more ...Russ White wrote a great blog post explaining why you have to understand the problem you’re solving instead of blindly believing the $vendor slide deck… or as I said a long time ago, think about how you’ll troubleshoot your network in because you won’t be able to reformat it once it crashes.
Read more ...I’ve seen successful public (infrastructure) cloud deployments… but also spectacular failures. The difference between the two usually comes down to whether the team deploying into a public cloud environment realizes they’re dealing with an unfamiliar environment and acts accordingly.
Please note that I’m not talking about organizations migrating their email to Office 365. While that counts as public cloud deployment when an industry analyst tries to paint a rosy picture of public cloud acceptance, I’m more interested in organizations using compute, storage, security and networking public cloud infrastructure.
Read more ...One of my readers sent me a question along these lines…
VXLAN Network Identifier is 24 bit long, giving 16 us million separate segments. However, we have to map VNI into VLANs on most switches. How can we scale up to 16 million segments when we have run out of VLAN IDs? Can we create a separate VTEP on the same switch?
VXLAN is just an encapsulation format and does not imply any particular switch architecture. What really matters in this particular case is the implementation of the MAC forwarding table in switching ASIC.
Read more ...