Author Archives: Ivan Pepelnjak

Worth Reading: Azure Datacenter Switch Failures

Microsoft engineers published an analysis of switch failures in 130 Azure regions (review of the article, The Next Platform summary):

  • A data center switch has a 2% chance of failing in 3 months (= less than 10% per year; see the quick check after this list);
  • ~60% of the failures are caused by hardware faults or power failures, another 17% are software bugs;
  • 50% of failures lasted less than 6 minutes (obviously crashes or power glitches followed by a reboot).
  • Switches running SONiC had a lower failure rate than switches running a vendor NOS on the same hardware. Looks like bloatware results in more bugs, and taking months to fix bugs results in more crashes. Who would have thought…
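
A quick sanity check of the first bullet (a minimal Python sketch; the 2%-per-quarter figure comes from the paper, the assumption that the four quarters fail independently is mine):

  # Annualized failure probability of a single switch
  p_quarter = 0.02                   # 2% chance of failing within 3 months (from the paper)
  p_year = 1 - (1 - p_quarter) ** 4  # at least one failure across four independent quarters
  print(f"{p_year:.1%}")             # prints 7.8% -- indeed less than 10% per year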

Worth Reading: Running BGP in Large-Scale Data Centers

Here’s one of the major differences between Facebook and Google: one of them publishes research papers with helpful and actionable information, the other uses publications as a recruitment drive full of we’re so awesome but you have to trust us – we’re not sharing the crucial details.

Recent data point: Facebook published an interesting paper describing their data center BGP design. Absolutely worth reading.

Just in case you haven’t realized: Petr Lapukhov of RFC 7938 fame moved from Microsoft to Facebook a few years ago. Coincidence? I think not.

Packet Forwarding and Routing over Unnumbered Interfaces

In the previous blog posts in this series, we explored whether we need addresses on point-to-point links (TL&DR: no), whether it’s better to have interface or node addresses (TL&DR: it depends), and why we got unnumbered IPv4 interfaces. Now let’s see how IP routing works over unnumbered interfaces.

The Challenge

A cursory look at an IP routing table (or at CCNA-level materials) tells you that the IP routing table contains prefixes and next hops, and that the next hops are IP addresses. How should that work over unnumbered interfaces, and what should we use for the next-hop IP address in that case?
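
To make the dilemma more tangible, here’s a minimal sketch (in Python, with made-up prefixes and interface names) of a routing table where the next-hop IP address is optional – the underlying observation being that on a point-to-point link the outgoing interface alone tells the router where to send the packet:

  # Hypothetical routing-table entries: the next hop is either an IP address
  # (numbered link) or just an outgoing interface (unnumbered point-to-point link).
  from dataclasses import dataclass
  from typing import Optional

  @dataclass
  class Route:
      prefix: str                        # destination prefix, e.g. "192.0.2.0/24"
      out_interface: str                 # interface the packet is sent over
      next_hop_ip: Optional[str] = None  # next-hop IP address; None on unnumbered links

  routes = [
      Route("192.0.2.0/24", "eth0", next_hop_ip="10.1.1.2"),  # classic numbered link
      Route("198.51.100.0/24", "serial0"),                    # unnumbered P2P link
  ]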

Mythbusting: NFV Data Center Fabric Buffering Requirements

Every now and then I stumble upon an article or a comment explaining how Network Function Virtualization (NFV) introduces new data center fabric buffering requirements. Here’s a recent example:

For Telco/carrier Cloud environments, where NFVs (which are much slower than hardware SGW) get used a lot, latency is higher with a lot of jitter due to the nature of software and the varying link speeds, so DC-level near-zero buffer is not applicable.

It seems to me we’re dealing with another myth. Starting with the basics:

Feedback: Kubernetes Networking Deep Dive

Here’s what one of the engineers watching Stuart Charlton’s Kubernetes webinar wrote about it:

“Kubernetes Networking Deep Dive” is a must see webinar. Once done take a break and then watch it again, let it sink in and then sign-up for a free account with Azure or GCP and practice all that was learned during the webinar.

At the end of this exercise … one will begin to understand why the networking domain seems to be lagging behind … This webinar will help one pick up the pace!

Video: We Still Need Networking in Public Clouds

Whenever someone starts mansplaining that we need no networking when we move the workloads into a public cloud, please walk away – he has just proved how clueless he is.

He might be a tiny bit correct when talking about software-as-a-service (after all, it’s just someone else’s web site), but when it comes to complex infrastructure virtual networks, there’s plenty of networking involved, from packet filters and subnets to NAT, load balancers, firewalls, BGP and IPsec.

For more details, watch the We Still Need Networking in Public Clouds video (part of Introduction to Cloud Computing webinar).

You need a free ipSpace.net subscription to watch the video.
