Archive

Category Archives for "Networking"

The Relative Cost of Bandwidth Around the World

CC BY 2.0 by Kendrick Erickson

Over the last few months, there’s been increased attention on networks and how they interconnect. CloudFlare runs a large network that interconnects with many others around the world. From our vantage point, we have incredible visibility into global network operations. Given our unique situation, we thought it might be useful to explain how networks operate, and the relative costs of Internet connectivity in different parts of the world.

A Connected Network

The Internet is a vast network made up of a collection of smaller networks. The networks that make up the Internet are connected in two main ways. Networks can connect with each other directly, in which case they are said to be “peered”, or they can connect via an intermediary network known as a “transit provider”.

At the core of the Internet are a handful of very large transit providers that all peer with one another. This group of approximately twelve companies are known as Tier 1 network providers. Whether directly or indirectly, every ISP (Internet Service Provider) around the world connects with one of these Tier 1 providers. And, since the Tier 1 providers are all interconnected themselves, from any point on the network you should be able to reach any other point. That's what makes the Internet the Internet: it’s a huge group of networks that are all interconnected.

Paying to Connect

To be a part of the Internet, CloudFlare buys bandwidth, known as transit, from a number of different providers. The rate we pay for this bandwidth varies from region to region around the world. In some cases we buy from a Tier 1 provider. In other cases, we buy from regional transit providers that either peer with the networks we need to reach directly (bypassing any Tier 1), or interconnect themselves with other transit providers.

CloudFlare buys transit wholesale and on the basis of the capacity we use in any given month. Unlike some cloud services like Amazon Web Services (AWS) or traditional CDNs that bill for individual bits delivered across a network (called "stock"), we pay for a maximum utilization for a period of time (called "flow"). Typically, we pay based on the maximum number of megabits per second we use during a month on any given provider.

Traffic levels across CloudFlare's global network over the last 3 months. Each color represents one of our 28 data centers.

Most transit agreements bill the 95th percentile of utilization in any given month. That means you throw out approximately 36 not-necessarily-contiguous hours worth of peak utilization when calculating usage for the month. Legend has it that in its early days, Google used to take advantage of these contracts by using very little bandwidth for most of the month and then ship its indexes between data centers, a very high bandwidth operation, during one 24-hour period. A clever, if undoubtedly short-lived, strategy to avoid high bandwidth bills.

Another subtlety is that when you buy transit wholesale you typically only pay for traffic coming in (“ingress") or traffic going out (“egress”) of your network, not both. Generally you pay which ever one is greater.

CloudFlare is a caching proxy so egress (out) typically exceeds ingress (in), usually by around 4-5x. Our bandwidth bill is therefore calculated on egress so we don't pay for ingress. This is part of the reason we don't charge extra when a site on our network comes under a DDoS attack. An attack increases our ingress but, unless the attack is very large, our ingress traffic will still not exceed egress, and therefore doesn’t increase our bandwidth bill.

Peering

While we pay for transit, peering directly with other providers is typically free — with some notable exceptions recently highlighted by Netflix. In CloudFlare's case, unlike Netflix, at this time, all our peering is currently "settlement free," meaning we don't pay for it. Therefore, the more we peer the less we pay for bandwidth. Peering also typically increases performance by cutting out intermediaries that may add latency. In general, peering is a good thing.

The chart above shows how CloudFlare has increased the number of networks we peer with over the last three months (both over IPv4 and IPv6). Currently, we peer around 45% of our total traffic globally (depending on the time of day), across nearly 3,000 different peering sessions. The chart below shows the split between peering and transit and how it's improved over the last three months as we’ve added more peers.

North America

We don't disclose exactly what we pay for transit, but I can give you a relative sense of regional differences. To start, let's assume as a benchmark in North America you'd pay a blended average across all the transit providers of $10/Mbps (megabit per second per month). In reality, we pay less than that, but it can serve as a benchmark, and keep the numbers round as we compare regions. If you assume that benchmark, for every 1,000Mbps (1Gbps) you'd pay $10,000/month (again, acknowledge that’s higher than reality, it’s just an illustrative benchmark and keeps the numbers round, bear with me).

While that benchmark establishes the transit price, the effective price for bandwidth in the region is the blended price of transit ($10/Mbps) and peering ($0/Mbps). Every byte delivered over peering is a would-be transit byte that doesn't need to be paid for. While North America has some of the lowest transit pricing in the world, it also has below average rates of peering. The chart below shows the split between peering and transit in the region. While it's gotten better over the last three months, North America still lags behind every other region in the world in terms of peering..

While we peer nearly 40% of traffic globally, we only peer around 20-25% in North America. Assuming the price of transit is the benchmark $10/Mbps in North America without peering, with peering it is effectively $8/Mbps. Based only on bandwidth costs, that makes it the second least expensive region in the world to provide an Internet service like CloudFlare. So what's the least expensive?

Europe

Europe's transit pricing roughly mirrors North America's so, again, assume a benchmark of $10/Mbps. While transit is priced similarly to North America, in Europe there is a significantly higher rate of peering. CloudFlare peers 50-55% of traffic in the region, making the effective bandwidth price $5/Mbps. Because of the high rate of peering and the low transit costs, Europe is the least expensive region in the world for bandwidth.

The higher rate of peering is due in part to the organization of the region's “peering exchanges”. A peering exchange is a service where networks can pay a fee to join, and then easily exchange traffic between each other without having to run individual cables between each others' routers. Networks connect to a peering exchange, run a single cable, and then can connect to many other networks. Since using a port on a router has a cost (routers cost money, have a finite number of ports, and a port used for one network cannot be used for another), and since data centers typically charge a monthly fee for running a cable between two different customers (known as a "cross connect"), connecting to one service, using one port and one cable, and then being able to connect to many networks can be very cost effective.

The value of an exchange depends on the number of networks that are a part of it. The Amsterdam Internet Exchange (AMS-IX), Frankfurt Internet Exchange (DE-CIX), and the London Internet Exchange (LINX) are three of the largest exchanges in the world. (Note: these links point to PeeringDB.com which provides information on peering between networks. You'll need to use the username/password guest/guest in order to login.)

In Europe, and most other regions outside North America, these and other exchanges are generally run as non-profit collectives set up to benefit their member networks. In North America, while there are Internet exchanges, they are typically run by for-profit companies. The largest of these for-profit exchanges in North America are run by Equinix, a data center company, which uses exchanges in its facilities to increase the value of locating equipment there. Since they are run with a profit motive, pricing to join North American exchanges is typically higher than exchanges in the rest of the world.

CloudFlare is a member of many of Equinix's exchanges, but, overall, fewer networks connect with Equinix compared with Europe's exchanges (compare, for instance, Equinix Ashburn, which is their most popular exchange with about 400 networks connected, versus 1,200 networks connected to AMS-IX). In North America the combination of relatively cheap transit, and relatively expensive exchanges lowers the value of joining an exchange. With less networks joining exchanges, there are fewer opportunities for networks to easily peer. The corollary is that in Europe transit is also cheap but peering is very easy, making the effective price of bandwidth in the region the lowest in the world.

Asia

Asia’s peering rates are similar to Europe. Like in Europe, CloudFlare peers 50-55% of traffic in Asia. However, transit pricing is significantly more expensive. Compared with the benchmark of $10/Mbps in North America and Europe, Asia's transit pricing is approximately 7x as expensive ($70/Mbps, based on the benchmark). When peering is taken into account, however, the effective price of bandwidth in the region is $32/Mbps.

There are three primary reasons transit is so much more expensive in Asia. First, there is less competition, and a greater number of large monopoly providers. Second, the market for Internet services is less mature. And finally, if you look at a map of Asia you’ll see a lot of one thing: water. Running undersea cabling is more expensive than running fiber optic cable across land so transit pricing offsets the cost of the infrastructure to move bytes.

Latin America

Latin America is CloudFlare's newest region. When we opened our first data center in Valparaíso, Chile, we delivered 100 percent of our traffic over transit, which you can see from the graph above. To peer traffic in Latin America you need to either be in a "carrier neutral" data center — which means multiple network operators come together in a single building where they can directly plug into each other's routers — or you need to be able to reach an Internet exchange. Both are in short supply in much of Latin America.

The country with the most robust peering ecosystem is Brazil, which also happens to be the largest country and largest source of traffic in the region. You can see that as we brought our São Paulo, Brazil data center online about two months ago we increased our peering in the region significantly. We've also worked out special arrangements with ISPs in Latin America to set up facilities directly in their data centers and peer with their networks, which is what we did in Medellín, Colombia.

While today our peering ratio in Latin America is the best of anywhere in the world at approximately 60 percent, the region's transit pricing is 8x ($80/Mbps) the benchmark of North America and Europe. That means the effective bandwidth pricing in the region is $32/Mbps, or approximately the same as Asia.

Australia

Australia is the most expensive region in which we operate, but for an interesting reason. We peer with virtually every ISP in the region except one: Telstra. Telstra, which controls approximately 50% of the market, and was traditionally the monopoly telecom provider, charges some of the highest transit pricing in the world — 20x the benchmark ($200/Mbps). Given that we are able to peer approximately half of our traffic, the effective bandwidth benchmark price is $100/Mbps.

To give you some sense of how out-of-whack Australia is, at CloudFlare we pay about as much every month for bandwidth to serve all of Europe as we do to for Australia. That’s in spite of the fact that approximately 33x the number of people live in Europe (750 million) versus Australia (22 million).

If Australians wonder why Internet and many other services are more expensive in their country than anywhere else in the world they need only look to Telstra. What's interesting is that Telstra maintains their high pricing even if only delivering traffic inside the country. Given that Australia is one large land mass with relatively concentrated population centers, it's difficult to justify the pricing based on anything other than Telstra's market power. In regions like North America where there is increasing consolidation of networks, Australia's experience with Telstra provides a cautionary tale.

Conclusion

The chart above shows the relative cost of bandwidth assuming a benchmark transit cost of $10/Megabits per second (Mbps) per month (which we know is higher than actual pricing, it’s just a benchmark) in North America and Europe.

While we keep our pricing at CloudFlare straight forward, charging a flat rate regardless of where traffic is delivered around the world, actual bandwidth prices vary dramatically between regions. We’ll continue to work to decrease our transit pricing, and increasing our peering in order to offer the best possible service at the lowest possible price. In the meantime, if you’re an ISP who wants to offer better connectivity to the increasing portion of the Internet behind CloudFlare’s network, we have an open policy and are always happy to peer.

A Quick Look at NAT64 and NAT46

Introduction

In the best of worlds we would all be using native IPv6 now, or at least dual
stack. That is not the case however and IPv4 will be around for a long time yet.
During that time that both protocols exist, there will be a need to translate
between the two, like it or not.

Different Types of NAT

Before we begin, let’s define some different forms of NAT:

NAT44 – NAT from IPv4 to IPv4
NAT66 – NAT from IPv6 to IPv6
NAT46 – NAT from IPv4 to IPv6
NAT64 – NAT from IPv6 to IPv4

The most commonly used type is definitely NAT44 but here we will focus on translating
between IPv4 and IPv6.

NAT64

There are two different forms of NAT64, stateless and statefull. The stateless version
maps the IPv4 address into an IPv6 prefix. As the name implies, it keeps no state.
It does not save any IP addresses since every v4 address maps to one v6 address.
Here is a comparison of stateless and statefull NAT64:

Stateless_vs_statefull

DNS64

When resolving names to numbers in IPv4, A records are used. When doing the same
in IPv6, AAAA records are used. When using NAT64, the device doing Continue reading

Why I gave up Networking for Software

It's now been 3 months since I transitioned from Networking to Software. This is a retrospective piece on my reasons for giving up on Networking.

Introduction

You might be reading this thinking:

"another networking guy moving to software... network engineering is doomed".

If you are, stop thinking right now. There is one important thing about my story that is very different. I've been writing software for longer than I have been doing networking albeit not in a professional capacity. Software Engineering is where my passion lies right now and let me explain why...

My Reasons

1. DevOps

DevOps for Networking is still, very slowly, becoming reality. Elsewhere DevOps is very much in full swing. Tools like:

Vagrant, Packer, Puppet, Chef, SaltStack, Ansible, Fig, Docker, Jenkins/TravisCI, Dokku, Heroku, OpenShift (the list goes on)...

have redefined how I work and being in an environment where I can build things with them day to day is a dream come true for me.

I get gersburms just thinking about building Continous Integration/Continous Delivery Pipelines, Automated creation of Dev/Test environments and Configuration as Code.

2. SDN

Software-Defined Networking was the turning point in my career. It enabled me to make the switch in career paths Continue reading

Is SDN API directionality absurd?

I was finally catching up on a number of posts I'd saved to read later and noticed the prevalent use of "Northbound" and "Southbound". I'm now starting to question whether these terms are necessary or accurate.

Dictionary definition

Let's start with the Oxford English Dictionary definition of these terms.

northbound | ˈnɔːθbaʊnd | adjective travelling or leading towards the north: northbound traffic.

southbound | ˈsaʊθbaʊnd | adjective travelling or leading towards the south: southbound traffic | the southbound carriageway of the A1.

As our interfaces are static and can't travel one can assume the intent of these adjectives in our context is to indicate that the interfaces are leading in the specified direction.

On Directionality as a descriptor

Categorizing an API by directionality is rather perplexing IMHO.

Specify directionality without a reference point is misleading For example, OVSDB is a northbound API for Open vSwitch but southbound API for an SDN controller.

For SDN controllers, there are two types of interfaces:

User-Facing or Application-Facing (formerly Northbound) This API is designed to expose higher-order functions in such a way that they can easily be consumed by humans and programmers. By this logic, we can include any " API's" or language bindings Continue reading

Community Show – CCNA Data Center Part1 with Anthony Sequeira and Orhan Ergun

In this first part of CCNA Datacenter sessions , Anthony Sequeira and Orhan Ergun are talking about the topics in the blueprint. They identify all the technologies which you should know for the CCNA Datacenter exam. Topics include : DCICN exam which is the first exam. DCICT exam which is the second exam. Datacenter Fundamentals, […]

Author information

Orhan Ergun

Orhan Ergun, CCIE, CCDE, is a network architect mostly focused on service providers, data centers, virtualization and security.

He has more than 10 years in IT, and has worked on many network design and deployment projects.

In addition, Orhan is a:

Blogger at Network Computing.
Blogger and podcaster at Packet Pushers.
Manager of Google CCDE Group.
On Twitter @OrhanErgunCCDE

The post Community Show – CCNA Data Center Part1 with Anthony Sequeira and Orhan Ergun appeared first on Packet Pushers Podcast and was written by Orhan Ergun.

Community Show – CCNA Data Center Part1 with Anthony Sequeira and Orhan Ergun

[player] In this first part of CCNA Datacenter sessions , Anthony Sequeira and Orhan Ergun are talking about the topics in the blueprint. They identify all the technologies which you should know for the CCNA Datacenter exam. Topics include : DCICN exam which is the first exam. DCICT exam which is the second exam. Datacenter […]

The post Community Show – CCNA Data Center Part1 with Anthony Sequeira and Orhan Ergun appeared first on Packet Pushers.

A Quick Look at MPLS-TE

Introduction

I’m currently designing and implementing a large network which will run MPLS.
This network will replace an old network that was mainly L2 based and did not
run MPLS, only VRF lite. There are a few customers that need to have diverse
paths in the network and quick convergence when a failure occurs.
This led me to consider MPLS-TE for those customers and to have plain MPLS
through LDP for other customers buying VPNs. What is the usage for MPLS-TE?

Weaknesses of IGP

When using normal IP forwarding a least cost path is calculated through an IGP,
such as OSPF or ISIS. The problem though is that only the least cost path will
be utilized, any links not on the best path will sit idle, which is a waste of
bandwidth. IGP metrics can be manipulated but that only moves the problem to
other links, it does not solve the root cause. Manipulating metrics is cumbersome
and prone to error. It’s difficult to think of all the traffic flows in the network
and get all the metrics correct. IGPs also lack the granularity in metrics to
utilize all the bandwidth in the network.

RSVP-TE

RSVP in the past was Continue reading

Finally: a Virtual Switch Supports BPDU Guard

Nexus 1000V release 5.2(1)SV3(1.1) was published on August 22nd (I’m positive that has nothing to do with VMworld starting tomorrow) and I found this gem in the release notes:

Enabling BPDU guard causes the Cisco Nexus 1000V to detect these spurious BPDUs and shut down the virtual machine adapters (the origination BPDUs), thereby avoiding loops.

It took them almost three years, but we finally have BPDU guard on a layer-2 virtual switch (why does it matter). Nice!

Response – Do We Need To Redefine Open?

Tom Hollingsworth wrote a great post on whether or not we need to redefine "Open". My response was too long for a comment, so here it is!

Open Source vs Free Software

The first item is just a point of clarification. While the terms "Open Source" and "Free Software" are often used interchangeably there is a difference.

The two terms describe almost the same category of software, but they stand for views based on fundamentally different values. Open source is a development methodology; free software is a social movement. - Richard Stallman

You can read the full article here but the TL;DR version is that while a high percentage of Open Source software is Free Software, the definition of Open Source is less strict about guaranteeing freedoms.

...with that out of the way, let's move to "open"

On "open" and "openness"

I like the Wikipedia description of "openness":

Openness is an overarching concept or philosophy that is characterized by an emphasis on transparency and free unrestricted access to knowledge and information as well as collaborative or cooperative management and decision making rather than a central authority. - Wikipedia

It highlights some key terms which our "open" things should be adhering Continue reading

Fun with Fig (and Docker)

I first heard of Fig when I read about Docker acquiring Orchard, a container hosting service, back in July. Last week I finally got to read a little more about it and it just so happens it is the missing piece of the puzzle in a couple of projects that I am working on right now!

What does Fig do?

The best way I would describe Fig is like Vagrant for Docker containers. If you don't know what Vagrant is, or aren't using it then you are missing out!

Fig lets you bring up and tear down docker containers (single or multiple) with a simple command. To do this, you express the desired configuration in a YAML file, fig.yml.

Getting started

On OSX, you'll need to have an accessible Docker environment. The easiest way to do this is with Homebrew and boot2docker

brew install docker
brew install boot2docker
boot2docker init
boot2docker start
export DOCKER_HOST=tcp://$(boot2docker ip 2>/dev/null):2375
# Install Fig
pip install fig

If you don't have Python and/or pip installed you may want to install the fig binary

Writing a Fig file for Open vSwitch

Let's say you are doing some integration Continue reading