VMware NSX: The Good, the Bad and the Ugly

After four live sessions we finished the VMware NSX Technical Deep Dive webinar yesterday. Still have to edit the materials, but right now the whole thing is already over 6 hours long, and there are two more guest speaker sessions to come.

Anyways, in the previous sessions we covered all the good parts of NSX and a few of the bad ones. Everything that was left for yesterday were the ugly parts.

Read more ...

Maelstrom: mitigating datacenter-level disasters by draining interdependent traffic safely and efficiently

Maelstrom: mitigating datacenter-level disasters by draining interdependent traffic safely and efficiently Veeraraghavan et al., OSDI’18

Here’s a really valuable paper detailing four plus years of experience dealing with datacenter outages at Facebook. Maelstrom is the system Facebook use in production to mitigate and recover from datacenter-level disasters. The high level idea is simple: drain traffic away from the failed datacenter and move it to other datacenters. Doing that safely, reliably, and repeatedly, not so simple!

Modern Internet services are composed of hundreds of interdependent systems spanning dozens of geo-distributed datacenters. At this scale, seemingly rare natural disasters, such as hurricanes blowing down power lines and flooding, occur regularly.

How regularly? Well, we’re told that Maelstrom has been in production at Facebook for over four years, and in that time has helped Facebook to recover from over 100 disasters. I make that about one disaster every two weeks!! Not all of these are total loss of a datacenter, but they’re all serious datacenter-wide incidents. One example was fibrecuts leading to a loss of 85% of the backbone network capability connecting a datacenter to the FB infrastructure. Maelstrom had most user-facing traffic drained in about 17 minutes, and the all traffic Continue reading

Masscan as a lesson in TCP/IP

When learning TCP/IP it may be helpful to look at the masscan port scanning program, because it contains its own network stack. This concept, "contains its own network stack", is so unusual that it'll help resolve some confusion you might have about networking. It'll help challenge some (incorrect) assumptions you may have developed about how networks work.
For example, here is a screenshot of running masscan to scan a single target from my laptop computer. My machine has an IP address of 10.255.28.209, but masscan runs with an address of 10.255.28.250. This works fine, with the program contacting the target computer and downloading information -- even though it has the 'wrong' IP address. That's because it isn't using the network stack of the notebook computer, and hence, not using the notebook's IP address. Instead, it has its own network stack and its own IP address.

At this point, it might be useful to describe what masscan is doing here. It's a "port scanner", a tool that connects to many computers and many ports to figure out which ones are open. In some cases, it can probe further: once it connects to Continue reading

NetQ agent on a host

We all know and love NetQ – it works hand-in-hand with Linux to accelerate data center operations. Customers love how easy it is to install and operate which makes their lives easier. Also, it can prevent and find issues in a data center by viewing the entire data center as a whole and providing three different types of services:

  • Preventative: NetQ allows an engineer to check all data center configurations and state in a few steps from any location in the network. The validation can be done on a virtual network using vagrant with Cumulus VX or if a virtual environment is not available, it can also be used during an change outage window. Since NetQ has built in analyzers of the network as a whole, no scripting is required and the validation is done from one location, rather than hop by hop. It can also shorten outage windows needed for network changes allowing shorter outage windows virtually or during outage windows.
  • Proactive: NetQ supplies notifications if something goes wrong in the network by either logging it to a file or integrating with third party applications like Slack, PagerDuty, or Splunk. It can also be filtered to ensure the right Continue reading

Watch Live – DNSSEC Workshop on October 24 at ICANN 63 in Barcelona

ICANN 63 banner image

What can we learn from recent success of the Root KSK Rollover? What is the status of DNSSEC deployment in parts of Europe – and what lessons have been learned? How can we increase the automation of the DNSSEC “chain of trust”? And what new things are people doing with DANE?

All these topics and more will be discussed at the DNSSEC Workshop at the ICANN 63 meeting in Barcelona, Spain, on Wednesday, October 24, 2018. The session will begin at 9:00 and conclude at 15:00 CEST (UTC+2).

The agenda includes:

  • DNSSEC Workshop Introduction, Program, Deployment Around the World – Counts, Counts, Counts
  • Panel: DNSSEC Activities
    • Includes presenters from these TLDs: .DK, .DE, .CH, .UK, .SE, .IT, .ES, .CZ
  • Report on the Execution of the .BR Algorithm Rollover
  • Panel: Automating Update of DS records
  • Panel: Post KSK Roll? Plan for the Next KSK Roll?
  • DANE usage and use cases
  • DNSSEC – How Can I Help?

It should be an outstanding session!  For those onsite, the workshop will be room 113.

 

  • More info and slides are available from these URLs (ICANN’s online schedule system breaks it up into sections based on breaks and lunch):

IDG Contributor Network: Self-healing SD-WAN removes the drama of high-availability planning

My humble beginnings Back in the early 2000s, I was the sole network engineer at a startup. By morning, my role included managing four floors and 22 European locations packed with different vendors and servers between three companies. In the evenings, I administered the largest enterprise streaming networking in Europe with a group of highly skilled staff.Since we were an early startup, combined roles were the norm. I’m sure that most of you who joined as young engineers in such situations could understand how I felt back then. However, it was a good experience, so I battled through it. To keep my evening’s stress-free and without any IT calls, I had to design in as much high-availability (HA) as I possibly could. After all, all the interesting technological learning was in the second part of my day working with content delivery mechanisms and complex routing.To read this article in full, please click here

IDG Contributor Network: Self-healing SD-WAN removes the drama of high-availability planning

My humble beginnings Back in the early 2000s, I was the sole network engineer at a startup. By morning, my role included managing four floors and 22 European locations packed with different vendors and servers between three companies. In the evenings, I administered the largest enterprise streaming networking in Europe with a group of highly skilled staff.Since we were an early startup, combined roles were the norm. I’m sure that most of you who joined as young engineers in such situations could understand how I felt back then. However, it was a good experience, so I battled through it. To keep my evening’s stress-free and without any IT calls, I had to design in as much high-availability (HA) as I possibly could. After all, all the interesting technological learning was in the second part of my day working with content delivery mechanisms and complex routing.To read this article in full, please click here

Cloudflare’s network boosts security and performance for IBM Cloud customers

Cloudflare’s network boosts security and performance for IBM Cloud customers

Today our partner IBM® announced the general availability of Cloud Internet Services (CIS) Enterprise. It marks a significant step forward in the partnership that we announced at the IBM THINK event in March.

CIS delivers security and performance to IBM Cloud® customers’ internet applications. It brings together Cloudflare’s 150+ points of presence with IBM Cloud’s 60 data centers, stopping attacks before they can even reach the IBM Cloud. CIS Enterprise is integrated into IBM Cloud, allowing IBM Cloud customers to set up and manage Cloudflare’s DDoS mitigation, web application firewall, smart routing and highly customizable load balancer, all from within the IBM Cloud user interface.  

Cloudflare’s network boosts security and performance for IBM Cloud customers

Our Network Map (as of 10/18/18). Click here for the latest version

We thought it timely to give a refresher on how Cloudflare’s network supports IBM Cloud customers. The network is designed to meet requirements of the most demanding enterprise customers. It is based on an architecture that differentiates it from legacy CDN, DNS and DDoS-mitigation services to ensure that internet applications stay online, even in the face of extremely high volume attacks or legitimate traffic spikes.

Cloudflare’s network of data centers, distributed across 74 countries (including 22 in China), has a network Continue reading