I concluded the Focus on Business Challenges First presentation (part of Business Aspects of Networking Technologies webinar) with a few technology guidelines starting with:
For more guidelines, watch the video.
Every now and then I get questions along the lines of “why doesn’t X support unequal-cost multipathing (UCMP)?” for X in [ OSPF, BGP, IS-IS ].
To set the record straight: BGP does support some rudimentary form of unequal-cost multipathing with the DMZ Bandwidth community, but it only works across multiple egress points from a single autonomous system. Follow-up nerd knobs described how to use the same community over EBGP sessions; not sure whether anyone implemented that part (comments welcome).
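To make the idea a bit more tangible, here is a minimal sketch of what bandwidth-weighted load sharing boils down to: every egress point advertises a bandwidth value, and traffic is split in proportion to those values. The function name, the egress names, and the bandwidth figures are all made up for illustration; this is not how any particular BGP implementation programs its forwarding table.

```python
from fractions import Fraction


def ucmp_shares(link_bandwidth):
    """Return the fraction of traffic each egress point should carry.

    link_bandwidth maps a (hypothetical) egress name to the bandwidth
    advertised for it, for example in a DMZ Bandwidth community.
    """
    total = sum(link_bandwidth.values())
    if total <= 0:
        raise ValueError("at least one egress point needs non-zero bandwidth")
    # Each share is simply bandwidth / total bandwidth.
    return {egress: Fraction(bw, total) for egress, bw in link_bandwidth.items()}


# Hypothetical example: a 10 Gbps exit and a 1 Gbps exit (values in Mbps).
shares = ucmp_shares({"exit-a": 10_000, "exit-b": 1_000})
for egress, share in shares.items():
    print(f"{egress}: {float(share):.1%}")
```

With equal-cost multipathing both exits would get half of the traffic; the weighted split sends ten elevenths to the faster exit.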
One of my readers was “blessed” with the stretched VLANs requirement combined with the need for inter-VLAN routing and sub-par equipment from a vendor not exactly known for their data center switching products. Before going on, you might want to read his description of the challenge he’s facing and what I had to say about the idea of building stackable switches across multiple locations.
Here’s an overview diagram of what my reader was facing. The core switches in each location work as a single device (virtual chassis), and there’s MLAG between core and edge switches. The early 2000s just called and they were proud of the design (but to be honest, sometimes one has to work with the tools his boss bought, so…).
Now that we know what regions and availability zones are, let’s go back to Daniel Dib’s question:
As I understand it, subnets in Azure span availability zones. Do you see any drawback to this? Does subnet matter if your VMs are in different AZs?
Wait, what? A subnet is stretched across multiple failure domains? Didn’t Ivan claim that’s ridiculous?
TL&DR: What I claimed was that a single layer-2 network is a single failure domain. Things are a bit more complex in public clouds. Keep reading and you’ll find out why.
A few weeks ago Adrian Giacometti described a no-stretched-VLANs disaster recovery design he used for one of his customers.
The blog post and related LinkedIn posts generated tons of comments (and objections from the usual suspects), prompting Adrian to write a sequel describing the design requirements he was facing, tradeoffs he made, and interactions between server and networking team needed to make it happen.
The next time you’re about to whimper how you can’t do anything to get rid of stretched VLANs (or some other stupidity) because whatever, take a few minutes and read How To Put Faith in UX Design by Scott Berkun, mentally replacing UX Design with Network Design. Here’s the part I loved most:
[…] there are only three reasonable choices:
- Move into a role where you make the important decisions.
- Become better at influencing decision makers.
- Find a place to work that has higher standards (or start your own).
Unfortunately the most common choice might be #4: complain and/or do nothing.
In January, Jason Edelman kindly invited me for a chat about the state of (software defined) networking and network automation in particular. The recording was recently published on Network Collective.
Last year I wrote an article describing data model optimization going from a simple “this is what we need to configure individual devices” model to a highly polished high-level “network nodes and links” model. Not surprisingly, as Jeremy Schulman was quick to point out, the latter one had Jinja2 templates you wouldn’t want to debug. Ever. You can’t run away from complexity… but you can manage it.
Many successful network automation solutions (example: Cisco NSO) solve the “we’d love to work with high-level data models but hate complex templates” challenge with data transformation: operators work with an abstracted data model describing services, nodes and links, and the device configuration templates use low-level data derived from the abstracted data models through a series of business logic rules or lookups (aka network design).
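A minimal sketch of such a transformation, assuming a made-up high-level model in which each link has a prefix and a list of (node, interface) endpoints. The key names (`prefix`, `ends`, `ifname`, `ipv4`, `description`) are hypothetical and not taken from NSO or any other product; the point is only that addressing and descriptions are derived by code, so the templates can stay trivial.

```python
import ipaddress


def expand_links(links):
    """Derive per-node interface data from a high-level list of links."""
    per_node = {}
    for link in links:
        net = ipaddress.ip_network(link["prefix"])
        # Hand out host addresses from the link prefix in endpoint order.
        for (node, ifname), addr in zip(link["ends"], net.hosts()):
            peers = [n for n, _ in link["ends"] if n != node]
            per_node.setdefault(node, []).append({
                "ifname": ifname,
                "ipv4": f"{addr}/{net.prefixlen}",
                "description": "to " + ",".join(peers),
            })
    return per_node


# Hypothetical high-level model: one point-to-point link between r1 and r2.
high_level = [{"prefix": "10.0.0.0/30", "ends": [("r1", "eth1"), ("r2", "eth2")]}]
low_level = expand_links(high_level)
print(low_level["r1"])
```

The operator edits only `high_level`; a device template would loop over `low_level[node]` without needing any lookups of its own.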
One of my readers sent me this interesting question:
Assuming we are running a very large OSPF area with a few thousand nodes. If we follow the chain reaction of OSPF LSA flooding while the network is converging at the same time, how would all routers come to know that they all now have same view of area link states and there are no further updates or convergence?
I have bad news: the design requirements for link state protocols effectively prevent that idea from ever working well.
My friend Daniel Dib sent me this interesting question:
As I understand it, subnets in Azure span availability zones. Do you see any drawback to this? Does subnet matter if your VMs are in different AZs?
I’m positive I don’t have to tell you what networks, subnets, and VRFs are, but you might not have worked with public cloud availability zones before. Before going into the details of Daniel’s question (and it will take us three blog posts to get to the end), let’s introduce regions and availability zones (you’ll find more details in AWS Networking and Azure Networking webinars).
Here’s a recent tweet by my friend Joe Onisick that triggered this blog post:
My favorite people are the ones that start with “how could we make that work?” Before jumping into all of their preconceived bs on why it won’t work.
I couldn’t agree more with that sentiment. The number of people who would invent all sorts of excuses just to avoid turning on their brains and keep to their cozy old methods is staggering. Unfortunately, someone immediately had the urge to switch into what I understood to be a heroic MacGyver mode (or maybe it was just my lack of caffeine, in which case I apologize for the misquote… but you might still like the rest of the rant):