If you’re an ipSpace.net subscriber, you might have noticed how busy the last month has been (more about that later). February won’t be much better:
Finally, I’ll run a day-long workshop in Zurich on March 10th describing containers and Docker.
Unless you’re working for a cloud-only startup, you’ll always have to connect applications running in a public cloud with existing systems or databases running in a more traditional environment, or connect your users to public cloud workloads.
Public cloud providers love stable and robust solutions, and they took the same approach when implementing their legacy connectivity solutions: you could use routed Ethernet connections or IPsec VPN, and run BGP across them, turning the problem into a well-understood routing problem.
Read more ...In January 2020 Doug Heckaman documented his experience with VeloCloud SD-WAN. He tried to be positive, but for whatever reason this particular bit caught my interest:
Edge Gateways have a limited number of tunnels they can support […]
WTF? Wasn’t x86-based software packet forwarding supposed to bring infinite resources and nirvana? How badly written must your solution be to have a limited number of IPsec tunnels on a decent x86 CPU?
Read more ...Enterprise environments usually implement “mission-critical” applications by pushing high-availability requirements down the stack until they hit networking… and then blame the networking team when the whole house of cards collapses.
Most public cloud providers are not willing to play the same stupid blame-shifting game - they live or die by their reputation, and maintaining a stable service is their highest priority. They will do their best to implement a robust and resilient infrastructure, but will not do anything that could impact its stability or scalability… including the snake oil the virtualization and networking vendors love to sell to their gullible customers. When you deploy your application workloads into a public cloud, you become responsible for the resiliency of your own application, and there’s no magic button that could allow you to push the problems down the stack.
Read more ...Some network devices return structured data in either text- or XML format (but cannot spell JSON). Ansible prefers getting JSON-formatted data, and has a number of filters to process text printouts… but what could you do if you want to work with XML documents within Ansible? I described a few solutions in Transforming XML Data in Ansible.
Doing the same thing and hoping for a different result is supposedly a definition of insanity… and managing public cloud deployments with an unrepeatable sequence of GUI clicks comes pretty close to it.
Engineers who mastered the art of public cloud deployments realized decades ago that the only way forward is to treat infrastructure in the same way as any other source code:
Read more ...It’s amazing how quickly you get “must have feature Y or it should not be called X” comments coming from vendor engineers the moment you mention something vaguely-defined like SD-WAN.
Here are just two of the claims I got as a response to “BGP with IP-SLA is SD-WAN” trolling I started on LinkedIn based on this blog post:
Key missing features [of your solution]:
- real time circuit failover (100ms is not real-time)
- traffic steering (again, 100ms is not real-time)
Let’s get the facts straight: it seems Cisco IOS evaluates route-map statements using track objects in periodic BGP table scan process, so the failover time is on order of 30 seconds plus however long it takes IP SLA to detect the decreased link quality.
Read more ...I hope you're familiar with Clarke's third law (and leave it to your imagination to explain how it relates to SDN ;). In case you want to look beyond the Machine Learning curtain, you might find the Machine Learning Explained article highly interesting. Spoiler: it all started in 1960s with over 2000 matchboxes.
After a brief overview of FRRouting suite Donald Sharp continued with a deep dive into FRR architecture, including the various routing daemons, role of Zebra and ZAPI, interface between RIB (Zebra) and FIB (Linux Kernel), sample data flow for route installation, and multi-threading in Zebra and BGP daemons.
If your automation solution relies on a back-end database with strict database schema you can stop reading… but if you (like most others) still live in the land of text files encoded in your favorite presentation format (because it’s hip to hate YAML), you might appreciate the solution Donald Johnson uses to check his data models before committing them into Git repository.
A little while ago I explained why you can’t use more than 4K VXLAN segments on a ToR switch (at least with most ASICs out there). Does that mean that you’re limited to a total of 4K virtual ethernet segments?
Of course not.
You could implement overlay virtual networks in software (on hypervisors or container hosts), although even there the enterprise products rarely give you more than a few thousand logical switches (to use NSX terminology)… but that’s a product, not technology limitation. Large public cloud providers use the same (or similar) technology to run gazillions of tenant segments.
Read more ...Listening to public cloud evangelists and marketing departments of vendors selling over-the-cloud networking solutions or multi-cloud orchestration systems, you could start to believe that migrating your workload to a public cloud would solve all your problems… and if you’re gullible enough to listen to them, you’ll get the results you deserve.
Unfortunately, nothing can change the fundamental laws of physics, networking, or application architectures:
Read more ...One of the worst things that can happen to anyone selecting equipment for a new network infrastructure is to receive the End-of-Life notice a week after the gear has been deployed in a production network… or maybe it’s even worse to be stuck with a neglected piece of technology full of bugs that the vendor never fixes because they’re chasing other shinier squirrels.
If you’re careful and watch what the vendors are doing, you might be able to save the day and identify the early phases of product decline. Here they are (as seen from the outside) in approximate order:
End of promotion opportunities. In most corporations aggressive hunters fare better than meticulous farmers, and product development is no different. As a friend of mine working for a large corporation once said “The culture here rewards launches instead of steady improvements. Like in academia, publishing a paper is valued more than running ISS”.
Read more ...Remember the Windows version that was so security-focused that it broke everything, and needed a gazillion changes/updates/upgrades to get back to where you had a working computer? I think it was Vista, but maybe my memory is failing me. Anyway, Apple got its Vista moment with macOS Catalina.
I was stupid enough to upgrade just before New Year, and I’m still struggling with aftereffects and skeletons falling out of every cupboard I look at. I appreciate Apple trying to make their operating system ever more secure, but breaking stuff every time I upgrade it is borderline ridiculous.
Read more ...The next time the sales system engineer working for your beloved $vendor drops by with a glitzy unicorn-based slide deck full of AI/ML goodies, read this article to get a slightly better understanding of where we are... from the perspective of someone who has actual experience doing that stuff.
What better way to start How Networks Really Work webinar than with fallacies of distributed computing… and that’s exactly what I did in late August 2019.
I always tell networking engineers attending our Building Network Automation Solutions online course to create minimalistic data models with (preferably) no redundant information. Not surprisingly, that’s a really hard task (see this article for an example) - using a simple automation tool like Ansible you end with either a messy and redundant data model or Jinja2 templates (or Ansible playbooks) full of hard-to-understand and impossible-to-maintain business logic.
Stephen Harding solved this problem the right way: his data center fabric deployment solution uses a dynamic inventory script that translates operator-friendly fabric description (data model) into template-friendly set of device variables.
Read more ...Another EVPN reader question, this time focusing on auto-RD functionality and how it works with duplicate MAC addresses:
If set to Auto, RD generated will be different for the same VNI across the EVPN switches. If the same route (MAC and/or IP) is present under different leaves of the same L2VNI, since the RD is different there is no best path selection and both will be considered. It’s a misconfiguration and shouldn’t be allowed. How will the BGP deal with this?
If you’re running a typical (somewhat outdated) enterprise data center, you’re using tons of VLANs and firewalls, use VLANs as security zones, and push inter-VLAN traffic through firewalls for inspection. Security vendors love that approach - when inspecting traffic they can add no value to (like database- or backup sessions), the firewalls quickly become choke points that have to be upgraded.
Read more ...Here’s an interesting tidbit from “Last Week in AWS” blog:
From a philosophical point of view, AWS fundamentally considers an API to be a promise. Services that aren’t promoted anymore are still available […] Think about that for a second - a service launched 13 years ago is still actively supported to the point where you can use it today.
Compare that to Killed By Google graveyard, and you might understand why I’m a bit reluctant to cover GCP in my webinars.
Read more ...