Author Archives: Finn Turner

768K day: the importance of adaptable software in the growing Internet

The Internet is big. Moreover, the Internet is bigger now than when that first sentence was written, and keeps increasing in size. The growth of the Internet from its humble beginnings as a DARPA research project was unprecedented and almost entirely unexpected. This—as well as the widespread usage of older routers and switches as crucial connection points in the Internet—has resulted in real-world scaling issues.

One of these issues, known commonly as “512K day,” occurred on Aug. 12, 2014. On that day, Verizon, a large United States-based Internet service provider, announced roughly 15,000 additional routes into the global BGP routing table. As these routes propagated across the network, they were accepted by some routers: those that had newer firmware, or that were configured to store only a subset of the global routing table.

But in other routers, this additional route load overflowed the 512,000-route maximum that the firmware designers had assumed would be enough, causing widespread Internet outages and degradation of service. In many cases the issue was resolved quickly, but not as quickly as it could have been: proprietary vendors had to push out firmware updates for hundreds of router and switch models—a process that can take months. Continue reading
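To make the scaling lesson concrete, here is a minimal Python sketch of the kind of headroom check an operator might run against a router's forwarding-table capacity. It is not tied to any vendor's CLI or API; the function and constant names are invented for illustration, the 512,000-entry limit is simply the figure from the incident, and the warning threshold is an arbitrary choice.

```python
# Minimal sketch: warn before the BGP table outgrows a router's forwarding-table limit.
# The route count would normally come from the device itself; here it is passed in
# directly, and all names below are invented for this example.

FIB_LIMIT = 512_000          # routes the hardware table is assumed to hold
WARN_THRESHOLD = 0.90        # alert at 90% utilization


def check_fib_headroom(route_count: int,
                       limit: int = FIB_LIMIT,
                       warn_at: float = WARN_THRESHOLD) -> str:
    """Classify forwarding-table utilization as ok / warning / overflow."""
    utilization = route_count / limit
    if route_count >= limit:
        return f"OVERFLOW: {route_count} routes exceed the {limit}-route table"
    if utilization >= warn_at:
        return f"WARNING: {utilization:.0%} of the {limit}-route table in use"
    return f"ok: {utilization:.0%} of the {limit}-route table in use"


if __name__ == "__main__":
    # Route counts chosen to show the three states (made up for the example).
    for count in (450_000, 511_000, 515_000):
        print(check_fib_headroom(count))
```

Re-running essentially the same check with a 768,000-entry limit is what operators were watching ahead of the 2019 event the title refers to.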

Topology matters: how port-per-workload management strategies no longer hold up

In the beginning, there were switches. And connected to these switches were servers, routers and other pieces of gear. These devices ran one application or, at a stretch, multiple applications on the same operating system and thus the same IP stack. It was very much one-server-per-port; the SQL Server was always on port 0/8, and shutting down port 0/8 would affect only that machine.

This is no longer true, as network engineers well know. Physical hardware no longer dictates what, where, and how servers and other workloads exist. Cloud computing, multi-tenant virtual infrastructures and dynamically reallocated virtual resources mean that one port can carry 20 or 200 servers. Conversely, link aggregation and other multi-port technologies mean that one server can have fault-tolerant aggregated links spread across one, five or 50 ports.
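As an illustration of why the old one-port-one-server mental model breaks down, here is a small Python sketch of the many-to-many relationship between ports and workloads. The port and workload names are invented, and in reality this data would come from an inventory or management system rather than a hard-coded dictionary.

```python
# Sketch of the many-to-many relationship between switch ports and workloads.
# Port names and workload names are made up for illustration.

from collections import defaultdict

# One port can carry many virtualized workloads...
port_to_workloads = {
    "swp1": {"web-01", "web-02", "db-01"},      # hypervisor uplink
    "swp2": {"db-02"},
    # ...and one workload can sit behind several ports (link aggregation).
    "swp3": {"storage-01"},
    "swp4": {"storage-01"},
}

# Build the reverse view: which ports does each workload depend on?
workload_to_ports: dict[str, set[str]] = defaultdict(set)
for port, workloads in port_to_workloads.items():
    for wl in workloads:
        workload_to_ports[wl].add(port)


def impact_of_shutdown(port: str) -> set[str]:
    """Workloads that lose *all* connectivity if this port goes down."""
    return {
        wl for wl in port_to_workloads.get(port, set())
        if workload_to_ports[wl] <= {port}   # no surviving ports elsewhere
    }


print(impact_of_shutdown("swp1"))  # e.g. {'web-01', 'web-02', 'db-01'}: all cut off
print(impact_of_shutdown("swp3"))  # set(): storage-01 survives via swp4
```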

A new way of looking at switching—as a logical, rather than physical, topology—is required. In this view, switches aren’t so much pieces of the network architecture themselves, but simply ports that can be used to set up much more complex logical links. This article will focus on two main concepts: routing protocols (to allow better utilization of underutilized switching links) and switching protocols such as STP (those used to Continue reading

The ease and importance of scaling in the enterprise

Networks are growing, and growing fast. As enterprises adopt IoT and mobile clients, VPN technologies, virtual machines (VMs), and massively distributed compute and storage, the number of devices—as well as the amount of data being transported over their networks—is rising at an explosive rate. It’s becoming apparent that traditional, manual ways of provisioning don’t scale. Something new is needed, and for that we look toward the hyperscalers: companies like Google, Amazon and Microsoft, which have been dealing with huge networks almost since the very beginning.

The traditional approach to IT operations has been focused on one server or container at a time. Any attempt at management at scale frequently means being locked into a single vendor’s infrastructure and technologies. Unfortunately, today’s enterprises are finding that even the expensive, proprietary management solutions provided by the vendors who have long supported traditional IT practices simply cannot scale, especially given the rapid growth of containerization and VMs that enterprises now deal with.

In this blog post, I’ll take a look at how an organization can use open, scalable network technologies—those first created or adopted by the aforementioned hyperscalers—to reduce growing pains. These issues are increasingly relevant as new Continue reading

The Importance of sFlow and NetFlow in Data Center Networks

As networks get more complex and higher-speed interconnects are required, in-depth information about the switches serving these networks becomes crucial for maintaining quality of service, performing billing, and managing traffic in a shared environment.

Some of you reading this blog post may already be familiar with “sFlow,” an industry-standard technology for monitoring high-speed switched networks and obtaining insights about the data traversing them. This blog post will focus on the importance of sFlow and the similar technology, “NetFlow,” in large – and getting larger – data centers.

Comparing sFlow and NetFlow

sFlow and NetFlow are technologies that, by sampling traffic flows between ports on a switch or interfaces on a router, can provide data about network activity, such as uplink load, total bandwidth used, historical graphs, and so on. To put this data into an easily digestible form, there are front ends such as NfSen, a web-based interface for browsing and graphing the collected flow data.
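To show roughly how sampled data becomes a traffic estimate, here is a small Python sketch of the 1-in-N extrapolation idea behind packet sampling. The sample records, port names and sampling rate are invented, and a real collector does considerably more (flow keys, timestamps, counter samples); the scaling step is the core of it.

```python
# Minimal sketch of turning sampled packets into a traffic estimate.
# With 1-in-N packet sampling, each sampled packet stands in for roughly N
# packets on the wire, so totals are estimated by scaling up.
# All values below are invented for illustration.

SAMPLING_RATE = 4096   # one sampled packet per ~4096 packets

# (source port, destination port, sampled packet size in bytes)
samples = [
    ("swp1", "swp49", 1500),
    ("swp2", "swp49", 64),
    ("swp1", "swp49", 1500),
]


def estimate_bytes_per_port(samples, rate):
    """Estimate bytes sent toward each destination port from sampled packets."""
    totals = {}
    for _src, dst, size in samples:
        totals[dst] = totals.get(dst, 0) + size * rate
    return totals


print(estimate_bytes_per_port(samples, SAMPLING_RATE))
# {'swp49': 12550144}  -- a rough uplink-load estimate, not an exact count
```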

While sFlow and NetFlow may – at least on the surface – sound the same, they have underlying protocol differences that may be relevant, depending on your use case. sFlow is, as previously stated, an industry-standard technology. This dramatically increases the chances the sFlow agent (the piece of Continue reading

Buzzword bingo: NetDevOps edition

Looking at the marketing landscape for IT, you could be forgiven for thinking that the current strategy was to dynamite a word factory and use the resulting debris as marketing content. DevSecOps. NetDevOps. Ops, ops, spam, eggs, spam, and DevSpamOps.

The naming trend lends itself easily to parody, but it began as shorthand for an attempt to solve real IT problems. And its iterations have more in common than a resemblance to alphabet salad. What lies beneath the buzzwords? And do you need to care?

Countless companies have jumped on the NetDevOps bandwagon, each with its own way of doing things, most of them utterly incompatible with everyone else’s. Some may have already abandoned the NetDevOps craze, believing it to be nothing but marketing hype wrapped around a YAML parser and some scripts. Others might have found a system that works for them and swear by it, using nothing else for provisioning.

Whichever camp you fall into, a system that allows for rapid provisioning and re-provisioning of applications, containers, virtual machines, and network infrastructure is paramount.
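As a rough illustration of what sits underneath most of these tools, here is a short Python sketch of the declare-and-diff pattern: state the desired configuration, compare it with what is actually deployed, and act only on the difference. The VLAN data and function name are invented, the desired state would typically live in YAML or similar rather than in a dict, and real systems add validation, ordering and rollback on top.

```python
# Sketch of the declare-and-diff pattern common to "NetDevOps" tooling.
# Desired and actual state are hard-coded dicts here purely for illustration.

desired = {"swp1": {"vlan": 100}, "swp2": {"vlan": 200}, "swp3": {"vlan": 300}}
actual = {"swp1": {"vlan": 100}, "swp2": {"vlan": 250}}


def plan_changes(desired, actual):
    """Return the per-port changes needed to converge on the desired state."""
    changes = {}
    for port, config in desired.items():
        if actual.get(port) != config:
            changes[port] = config
    return changes


# The resulting "plan" would be handed to whatever actually pushes config.
print(plan_changes(desired, actual))
# {'swp2': {'vlan': 200}, 'swp3': {'vlan': 300}}
```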

Ministry of Silly Names: A History

The modern era of namesmashing started with DevOps. This made a sort of sense because, before this, IT had Continue reading

The Multicloud We Need, But Not the One We Deserve

Large organizations are married to the VMware suite of products. We can quibble about numbers for adoption of Hyper-V and KVM, but VMware dominates the enterprise virtualization market, just as Kubernetes is the unquestioned champion of containers.

Virtual machines (VMs) are a mature technology, created and refined before large-scale adoption of public cloud services. Cloud-native workloads are often designed for containers, and containerized workloads are designed with failure in mind: you can tear one down on one cloud and reinstantiate it on another. Near-instant reinstantiation is the defense against downtime.

VMs take a different approach. A VM is meant to keep existing for long periods of time, despite migrations and outages. Failure is to be avoided as much as possible. This presents a problem as more organizations pursue a multi-cloud IT strategy.

The key technology for highly available VMs is vMotion: the ability to move a VM from one node in a cluster to another with no downtime. However, as data centers themselves become increasingly virtualized, using cloud computing services such as Microsoft Azure, Google Compute Engine, and Amazon EC2, there’s a growing requirement to be able to move VMs between cloud infrastructures. This is not a supported feature of vMotion.

Routed Continue reading

BGP: What is it, how can it break, and can Linux BGP fix it?

Border Gateway Protocol (BGP) is one of the most important protocols on the internet. It is also, when it breaks, one of the most potentially catastrophic.

As the internet grows ever larger and more complex, a well-configured BGP deployment is crucial to keeping everything running smoothly. Unfortunately, when BGP is misconfigured, the consequences can be disastrous.

This blog will provide a brief explanation of what BGP is, and then dive into some of the common protocol issues and pitfalls. We cannot go too deep into the intricacies of BGP – those can (and do) fill entire books. However, we can provide an overview of how Linux (with its mature open-source BGP implementations and in-depth monitoring, analysis, and control tools) can be used to alleviate some of these common issues.

What is BGP?

BGP is a routing protocol that runs over TCP and is designed to exchange routing information within and between autonomous systems (ASes). In large networks, BGP is responsible for informing every router that needs to know of the ways a packet can travel from site A to site B – and, if a site or router goes down, how to reroute the packet so Continue reading
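Since BGP sessions ride on TCP (port 179), even a very small script can tell you whether the transport to a neighbor is reachable at all, which is often the first thing to check when a session will not come up. The Python sketch below does only that: it does not speak BGP itself, the function name is invented, and the peer address is a placeholder from the documentation range.

```python
# Minimal sketch building on "BGP runs over TCP": check whether a peer's
# BGP port (TCP 179) accepts a connection. This tests only the transport;
# it does not open a real BGP session.

import socket

BGP_PORT = 179


def bgp_port_reachable(peer: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the peer's BGP port succeeds."""
    try:
        with socket.create_connection((peer, BGP_PORT), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    peer = "192.0.2.1"   # documentation address; substitute a real neighbor
    state = "reachable" if bgp_port_reachable(peer) else "unreachable"
    print(f"TCP {BGP_PORT} on {peer} is {state}")
```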