Every four years since 1989, a hacker/security conference has taken place in the Netherlands. This summer, the eighth edition of this conference, called Still Hacking Anyway 2017 (sha2017.org), will run from the 4th to the 8th of August. The conference is not-for-profit and run by volunteers, and this year we’re expecting about 4,000 visitors.
For an event like SHA, all the visitors need to connect to a network to access the Internet. A large part of the network is built on Cumulus Linux. In this article, we’ll dive into what the event is and how the network, with equipment sponsored by Cumulus, is being built.
What makes SHA 2017 especially exciting is that it is an outdoor event. All the talks are held in large tents, and they can be watched online through live streams. At the event site, visitors will organize “villages” (groups of tents) where they will work on projects ranging from security research to developing electronics and building 3D printers.
Attendees will camp on a 40-acre field, but they won’t be off the grid, as wired and wireless networks will keep them connected. The network is designed Continue reading
Network optimization is an incredibly important component of scalability and efficiency. Without solid network optimization, an organization will face rapidly mounting overhead and vastly reduced efficiency. Network optimization helps a business make the most of its technology, reducing costs and even improving security. Through virtualization, businesses can leverage their technology more effectively — they just need to follow a few virtual networking best practices.
Some applications are optional, but others are critical. The most important applications on a network are the ones that need to be prioritized in terms of system resources; these are generally cyber security suites, firewalls and monitoring services. Optional applications may still be preferred for business operations, but because they aren’t critical, they can be allowed to run slowly in the event of system-wide issues.
Prioritizing security applications is especially important as there are many cyber security exploits that operate with the express purpose of flooding the system until security elements fail. When security apps are prioritized, the risk of this type of exploit is greatly reduced.
Application monitoring services will be able to automatically detect when Continue reading
There are a lot of reasons you may be thinking about moving to a private cloud environment. Perhaps you need more security, or maybe you feel the risks of public cloud have outweighed the benefits. But you’re still not certain that this version of web-scale networking is right for your company, and you’re wondering what’s involved in moving from a public cloud to a private one. Not surprisingly, there are several factors to consider when making the move from public to private clouds. Public clouds have their place, but there are many good reasons to switch. In this post we’ll cover some private cloud tips and considerations.
For an even deeper look at reasons you may want to switch to a private cloud, check out our education page, Private Cloud vs. Public Cloud.
Private clouds take several different forms: semi-private cloud, virtual private cloud (hybrid) and fully private cloud. Each one has its advantages and disadvantages.
Semi-private clouds are similar to public clouds in that the cloud is hosted by a provider, but access to the cloud is through private channels rather than over the public Internet. This reduces the problem of lag Continue reading
To help you stay in the know on all things data center networking, we’ve gathered some of our favorite content from both our own publishing house and from around the web. We hope this helps you stay up to date on both your own skills and on data center networking trends. If you have any suggestions for next time, let us know in the comment section!
BGP in the data center: Are you leveraging everything BGP has to offer? Probably not. This practical report peels away the mystique of BGP to reveal an elegant and mature, simple yet sophisticated protocol. Author Dinesh Dutt, Chief Scientist at Cumulus Networks, covers BGP operations as well as enhancements that greatly simplify its use so that practitioners can refer to this report as an operational manual. Download the guide.
Magic Quadrant report: Cumulus Networks has been named a “Visionary” in the Data Center Networking category for the 2017 Gartner Magic Quadrant. With 96% of their survey respondents finding open networking to be a relevant buying criterion and with white-box switching adoption expected to reach 22% by 2020, it’s clear that disaggregation is the answer for forward-looking companies. Continue reading
We’re both honored and thrilled to announce that Cumulus Networks has been recognized as a “Visionary” in the Gartner Magic Quadrant for Data Center Networking. You can download this highly-anticipated report here, and learn about other major trends in the industry.
So, what does it mean to be a visionary? According to Gartner, “Visionaries have demonstrated an ability to increase the features in their offerings to provide a unique and differentiated approach to the market. A visionary has innovated in one or more of the key areas of data center infrastructure, such as management (including virtualization), security (including policy enforcement), SDN and operational efficiency, and cost reductions.”
We couldn’t be happier to be recognized, and to us, it means our company vision has paid off. We’ve created a culture of visionaries through inquisitive, innovative and bold leadership, and these same traits are seen in both our philosophy and our technology. As more and more organizations embrace web-scale IT, we expect to keep pushing the technology forward — always striving for a better network.
With 96% of Gartner’s survey respondents finding open networking to be a relevant buying criterion, and with white-box switching adoption expected to reach 22% by 2020, it’s Continue reading
I started coveting IP-encapsulated network virtualization back in 2005 when I was working to build a huge IP fabric. However, we needed layer 2 (L2) adjacencies to some servers for classic DSR load balancing. The ideal solution was to have something that looked like a bridge as far as the load balancers and servers were concerned, yet would tunnel unmodified L2 frames through the IP fabric. Alas, we were way ahead of our time.
Thank the IT gods that things have changed quite a bit in the last 12 years. Today, we as an IT community have VXLAN, which is embodied in most modern networking silicon and (a bit more importantly) realized as part of the Linux networking model so that it’s really straightforward to deploy and scale. IT geeks have a bunch of ways to build L2 domains that are extended across IP fabrics using VXLAN. There are dedicated SDN controllers, such as Contrail, Nuage, Midonet and VMware NSX; there are orchestration-hosted controllers in OpenStack Neutron and Docker Swarm; and there are simple tools like the lightweight network virtualization that we built at Cumulus Networks.
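As a concrete illustration of how approachable this has become, here is a minimal, hedged sketch of a single VXLAN tunnel endpoint on a plain Linux host using iproute2 (not a Cumulus-specific recipe); the VNI, VTEP address and interface names are illustrative.

```python
# Minimal sketch: create a VXLAN device on a Linux host with iproute2 and
# attach it to a bridge so local ports share an L2 domain stretched over IP.
# The VNI, local VTEP address and interface names below are illustrative.
import subprocess

VNI = 100                   # VXLAN Network Identifier (illustrative)
LOCAL_VTEP = "10.0.0.11"    # VTEP/loopback address on the IP fabric (illustrative)

commands = [
    # Create the VXLAN device; dstport 4789 is the IANA-assigned VXLAN port.
    f"ip link add vx{VNI} type vxlan id {VNI} local {LOCAL_VTEP} dstport 4789 nolearning",
    # Create a bridge and enslave the VXLAN device plus a server-facing port.
    f"ip link add br{VNI} type bridge",
    f"ip link set vx{VNI} master br{VNI}",
    f"ip link set swp1 master br{VNI}",   # 'swp1' stands in for a server-facing port
    f"ip link set vx{VNI} up",
    f"ip link set br{VNI} up",
]

for cmd in commands:
    subprocess.run(cmd.split(), check=True)
```

With no controller, remote VTEP reachability still has to come from somewhere (static bridge fdb entries, multicast flooding, or a control plane such as BGP EVPN), which is exactly the gap the tools above fill.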
This all leads me to EVPN. We recently made EVPN available Continue reading
Back in April, we talked about a feature called Explicit Congestion Notification (ECN). We discussed how ECN is an end-to-end method used to converge networks and save money. Priority flow control (PFC) is a different way to accomplish the same goal. Since PFC supports lossless or near-lossless Ethernet, you can run applications like RDMA over Converged Ethernet (RoCE or RoCEv2) on your current data center infrastructure. Since RoCE runs directly over Ethernet, a different method than ECN must be used to control congestion. In this post, we’ll concentrate on PFC, the Layer 2 solution for RoCE, and how it can help you optimize your network.
Certain data center applications can tolerate little or no loss. However, traditional Ethernet is connectionless and allows traffic loss; it relies on upper-layer protocols to re-send or provide flow control when necessary. To provide flow control at the Ethernet layer, IEEE 802.3x was developed. 802.3x defines a standard for sending an Ethernet PAUSE frame upstream when congestion is experienced, telling the sender to “stop sending” for a few moments. The PAUSE frame stops traffic BEFORE the buffer Continue reading
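As a rough worked example of what a PAUSE actually buys you: the pause_time field in a PAUSE frame is expressed in quanta of 512 bit times, so the real-world pause duration depends on the link speed.

```python
# Rough worked example: how long does an 802.3x PAUSE frame stop a sender?
# pause_time is carried in quanta, where one quantum = 512 bit times.

def pause_duration_us(pause_quanta: int, link_speed_bps: float) -> float:
    """Return the pause duration in microseconds for a given link speed."""
    bit_time = 1.0 / link_speed_bps          # seconds per bit
    return pause_quanta * 512 * bit_time * 1e6

# The maximum pause_time (0xFFFF quanta) on a 10G link is only a few
# milliseconds, which is why PAUSE frames are re-sent while congestion persists.
print(pause_duration_us(0xFFFF, 10e9))   # ~3355 us on 10 Gb/s
print(pause_duration_us(0xFFFF, 100e9))  # ~335 us on 100 Gb/s
```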
A few weeks ago, we attended the OpenStack Summit where we had a wonderful time connecting with customers, partners and several new faces. With the excitement of the event still lingering, we thought this was a great time to highlight how OpenStack and Cumulus Linux offer a unique, seamless solution for building a private cloud. But first, here are a few highlights from the conference.
Today is a big day for us over here at Cumulus Networks! We are pleased to announce the launch of a brand new product designed to bring you unparalleled network visibility & remediation. The newest addition to the Cumulus Networks portfolio, NetQ, is a telemetry-based fabric validation system that ensures the network is behaving as it was intended to. It allows you to test, validate and troubleshoot using advanced fabric-wide telemetry and Cumulus Linux.
To respond to the evolving industry, increasing business demands and growth, many companies have started the web-scale journey by deploying a fully programmable fabric with fully automated configurations across an open network infrastructure. Companies that have implemented some of these best practices are quickly seeing the benefits of agility, efficiency and lowered costs.
However, these organizations are also facing some unknowns: They are worried about making ad-hoc changes that disrupt the network and they can’t easily demonstrate “network correctness.” They’re interested in moving towards intent-based networking methods, but don’t have the right technology in place to do so.
Traditional operations tools and workflows weren’t built for the speed and scale that a modern cloud data center needs as they are manual, reactive and Continue reading
A couple of weeks ago, our co-founder, JR Rivers, sat down with the guys at Packet Pushers to discuss how to build a better network with web-scale networking. We were so excited to be featured that we decided to use this opportunity to launch a giveaway!
The podcast goes into detail covering the benefits of web-scale networking, but we want to hear your thoughts on it. We’ve put together a quick survey to hear what you think of web-scale principles and how you may have incorporated them into your organization. Simply fill out the survey to enter for a chance to win a free Apple Watch!
The podcast covers:
Sound interesting? Grab your headphones and take a listen! You can hear the podcast now by visiting PacketPushers.com
And don’t forget to enter to win an Apple Watch!
The post Cumulus co-founder featured on Packet Pushers! appeared first on Cumulus Networks Blog.
Back in November, Cumulus Networks unveiled NCLU, an interactive command-line interface for configuring switches running Cumulus Linux. NCLU was made to help networking experts drive Linux without having to learn its intricacies and quirks, and so far, it has been very successful. Network engineers are comfortable configuring devices interactively, so NCLU helps abstract the file-based nature of Linux to smooth out the learning curve.
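For readers who have not used it, the NCLU workflow is essentially stage, review, commit. Below is a hedged sketch that drives the net command non-interactively from Python; the interface name, address and AS number are illustrative.

```python
# Hedged sketch: driving NCLU's "net" command non-interactively from Python.
# NCLU stages changes, shows a diff with "net pending" and applies them with
# "net commit". The interface name, address and AS number are illustrative.
import subprocess

def net(*args: str) -> str:
    """Run a single NCLU command and return its output."""
    result = subprocess.run(["net", *args], capture_output=True, text=True, check=True)
    return result.stdout

# Stage a couple of changes.
net("add", "interface", "swp1", "ip", "address", "10.1.1.1/31")
net("add", "bgp", "autonomous-system", "65101")

# Review the staged diff, then apply it (or roll back with "net abort").
print(net("pending"))
net("commit")
```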
Since I started working at Cumulus Networks over two years ago, I’ve noticed that most of our customers who are working with us for the first time fall neatly into two categories. The majority of our users are experienced network engineers with very little Linux knowledge, whereas a minority are Linux server power-users who may only know the basics of networking. Most of my colleagues at Cumulus are networking industry veterans who started off in the first category, while I fell into the latter. I’ve always been an automation-first developer who applies web-scale principles to everything I do, meaning that from the first day I started configuring Cumulus Linux, I was doing so with tools like Ansible. With the release of Ansible 2.3, I’m happy to report that Ansible now supports NCLU out of Continue reading
In the previous two posts, we discussed gathering metrics for long-term trend analysis and then combining them with event-based alerts for actionable results. In order to combine these two elements, we need strong network monitoring tooling that allows us to overlay these activities into an effective solution.
The legacy approach to monitoring is to deploy a monitoring server that periodically polls your network devices via the Simple Network Management Protocol (SNMP). SNMP is a very old protocol, originally developed in 1988. While some things do get better with age, computer protocols are rarely among them. SNMP has been showing its age in many ways.
Inflexibility
SNMP uses data structures called MIBs to exchange information. These MIBs are often proprietary, and difficult to modify and extend to cover new and interesting metrics.
Polling vs. event-driven
Polling doesn’t offer enough granularity to catch all events. For instance, even if you check disk utilization once every five minutes, you may go over threshold and back in between intervals and never know.
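As a concrete illustration, here is a hedged sketch of that classic polling loop using the pysnmp library; the target address, community string and polled OID (IF-MIB ifInOctets) are illustrative. Anything that happens to the counter between two iterations is simply never seen.

```python
# Hedged sketch: a classic SNMP polling loop with pysnmp. Anything that happens
# to the counter between two polls is invisible to the monitoring server.
# Target address, community string and OID are illustrative.
import time
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

TARGET = "192.0.2.1"
OID = "1.3.6.1.2.1.2.2.1.10.1"   # IF-MIB::ifInOctets.1

while True:
    error_indication, error_status, _, var_binds = next(getCmd(
        SnmpEngine(),
        CommunityData("public", mpModel=1),      # SNMPv2c
        UdpTransportTarget((TARGET, 161)),
        ContextData(),
        ObjectType(ObjectIdentity(OID)),
    ))
    if error_indication or error_status:
        print("poll failed:", error_indication or error_status.prettyPrint())
    else:
        for var_bind in var_binds:
            print(" = ".join(x.prettyPrint() for x in var_bind))
    time.sleep(300)   # a five-minute gap in which short-lived events are missed
```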
An inefficient protocol
SNMP is a “call and response” protocol by design, which means the monitoring server will Continue reading
We’re excited to announce the release of Cumulus Linux 3.3! This product update was designed to enhance the performance and usability of your network. This release includes several enhancements to many existing features, like NCLU and EVPN, as well as brand new features, like buffer monitoring, PIM-SSM, 25G support and more, for increased reliability. The following paragraphs cover the key updates in this release.
Proactively detect congestion events that result in latency and jitter by monitoring traffic patterns to identify bottlenecks early and effectively plan for capacity. This new feature, available on Mellanox hardware, alerts on congestion and latency thresholds, helping you understand traffic patterns and model network operations based on buffer utilization data.
Buffer monitoring is ideal for customers running latency-sensitive applications such as HFT, HPC and distributed in-memory apps.
Get Source-Specific Multicast for more efficient multicast traffic segmentation and higher scalability.
Intermittent sources are a common issue when market data applications act as servers that send data to a multicast group and then go silent Continue reading
Network monitoring without alerting is like having a clock without any hands. In the previous post, Eric discussed setting up a monitoring strategy, and in it we scratched the surface of network alerting. In this post, we dive into alerting more deeply.
Alerting comes in many forms. In the previous post, we discussed how metrics can be set with thresholds to create alerts. This is the most basic level of alerting. CPU alerts are set at 90% of utilization. Disk usage alerts are set to 95% of utilization. There are at least two drawbacks to this level of alerting.
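Before getting into those drawbacks, here is a hedged sketch of what this most basic threshold alerting looks like in practice, using the psutil library; the thresholds mirror the examples above, and the notify() stub is a placeholder.

```python
# Hedged sketch: the most basic form of alerting, static thresholds on polled
# metrics. Thresholds and the notify() stub below are placeholders.
import time
import psutil

CPU_THRESHOLD = 90.0    # percent, per the example above
DISK_THRESHOLD = 95.0   # percent

def notify(message: str) -> None:
    # Stand-in for an email/PagerDuty/webhook integration.
    print("ALERT:", message)

while True:
    cpu = psutil.cpu_percent(interval=1)
    disk = psutil.disk_usage("/").percent
    if cpu > CPU_THRESHOLD:
        notify(f"CPU utilization at {cpu:.0f}%")
    if disk > DISK_THRESHOLD:
        notify(f"Disk utilization at {disk:.0f}%")
    time.sleep(60)   # the polling interval bounds how quickly anything is caught
```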
First, by alerting on metric thresholds, we limit ourselves to the granularity of the metrics. Consider a scenario where interface statistics are gathered every five minutes. That limits the ability to capture anomalous traffic patterns to a five-minute interval, and at the fast pace of modern data centers, that level of granularity isn’t acceptable. The alerting can only ever be as granular as the metrics it is based on.
Secondly, there are many times when alerts from certain metrics don’t lead to any actionable activity. For example, an alert on CPU utilization may not directly have an impact on traffic. Since switch CPUs should Continue reading
One of the least loved areas of any data center network is monitoring. This is ironic because, at its core, the network has two goals: 1) get packets from A to B, and 2) make sure packets got from A to B. It is not uncommon in the deployments I’ve seen for the monitoring budget to be effectively $0, and generally, an organization’s budget also reflects its priorities. Despite spending thousands, or even hundreds of thousands, of dollars on networking equipment to facilitate goal #1, there is often little money, thought and time spent in pursuit of goal #2. In the next several paragraphs I’ll go into some basic data center network monitoring best practices that will work with any budget.
It is not hard to see why monitoring the data center network can be a daunting task. Monitoring your network, just like designing your network, takes a conscious plan of action. Tooling in the monitoring space today is highly fragmented, with more than 100 “best of breed” tools that each accommodate a specific use case. Just evaluating all the tools would be a full-time job. A recent Big Panda Report and their video overview of it (38 mins) Continue reading
In the previous articles, we covered the initial objections to LACP and took a deep dive into the effect on traffic patterns in an MLAG environment without LACP/Static-LAG. In this article we’ll explore how LACP differs from all other available teaming techniques and then show how it could’ve solved a problem in this particular deployment.
I originally set out to write this as a single article, but to explain the nuances it quickly spiraled beyond that. So I decided to split it up into a few parts.
• Part1: Design choices – Which NIC teaming mode to select
• Part2: How MLAG interacts with the host
• Part3: “Ships in the night” – Sharing state between host and upstream network
An important element to consider is that LACP is the only uplink protocol supported by VMware that directly exchanges any network state information between the host and its upstream switches. An ESXi host is sort of a host, but also sort of a network switch (insofar as it forwards packets locally and makes path decisions for north/south traffic); herein lies the problem: we effectively have network devices forwarding packets between each other, but Continue reading
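On the Linux side (which is what a Cumulus switch is), LACP is simply bonding mode 802.3ad. Below is a hedged sketch of building such a bond with iproute2 and reading back the negotiated partner state; the interface names are illustrative, and an ESXi host would configure its side through a vSphere Distributed Switch LAG rather than like this.

```python
# Hedged sketch: an LACP (802.3ad) bond on a Linux host using iproute2.
# Interface names are illustrative.
import subprocess

commands = [
    "ip link add bond0 type bond mode 802.3ad",  # 802.3ad == LACP
    "ip link set eth0 down",                     # members must be down to enslave
    "ip link set eth1 down",
    "ip link set eth0 master bond0",
    "ip link set eth1 master bond0",
    "ip link set bond0 up",
]

for cmd in commands:
    subprocess.run(cmd.split(), check=True)

# The LACPDUs exchanged on the wire are what carry state between host and
# switch; the negotiated partner details are visible in /proc/net/bonding.
print(open("/proc/net/bonding/bond0").read())
```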
In part1, we discussed some of the design decisions around uplink modes for VMware and a customer scenario I was working through recently. In this post, we’ll explore multi-chassis link aggregation (MLAG) in some detail and how active-active network fabrics challenge some of the assumptions made.
Disclaimer: What I’m going to describe is based on network switches running Cumulus Linux and specifically some down-in-the-weeds details on this particular MLAG implementation. That said, most of the concepts apply to similar network technologies (VPC, other MLAG implementations, stacking, virtual-chassis, etc.) as they operate in very similar ways. But YMMV.
I originally set out to write this as a single article, but to explain the nuances it quickly spiraled beyond that. So I decided to split it up into a few parts.
• Part1: Design choices – Which NIC teaming mode to select
• Part2: How MLAG interacts with the host (This page)
• Part3: “Ships in the night” – Sharing state between host and upstream network
If the host is connected to two redundant switches (which these days is all but assumed), then MLAG (and equivalent solutions) is a commonly deployed option. In simple terms, Continue reading
Recently I’ve been helping a customer who’s working on a VMware cloud design. As is often the case, there is a set of consulting SMEs helping with the various areas: an NSX/virtualization consultant, the client’s tech team and a network guy (lucky me).
One of the interesting challenges in such a case is understanding the background behind design decisions that the other teams have made and the flow-on effects they have on other components. In my case, I have a decent background in designing a VMware cloud and networking, so I was able to help bridge the gap a little.
My pet peeve in a lot of cases is the common answer of “because it’s ‘best-practice’ from vendor X” and a blank stare when asked: “sure, but why?”. In this particular case, I was lucky enough to have a pretty savvy customer, so a healthy debate ensued. This is that story.
Disclaimer: What I’m going to describe is based on network switches running Cumulus Linux and specifically some down-in-the-weeds details on this particular MLAG implementation. That said, most of the concepts apply to similar network technologies (VPC, other MLAG implementations, stacking, virtual-chassis, etc.) as they operate in very Continue reading
Every now and again, we like to highlight a piece of technology or solution featured in Cumulus Linux that we find especially useful. Priority Flow Control (PFC) and Explicit Congestion Notification (ECN) are exactly such things. In short, these technologies allow you to converge networks and save money. By supporting lossless or near lossless Ethernet, you can now run applications such as RDMA over Converged Ethernet (RoCE) or RoCEv2 over your current data center infrastructure. In this post, we’ll concentrate on the end-to-end solution for RoCEv2 – ECN and how it can help you optimize your network. We will cover PFC in a future post.
ECN is a mechanism supported by Cumulus Linux that helps provide end-to-end lossless communication between two endpoints over an IP routed network. Normally, protocols like TCP use dropped packets to indicate congestion, which then tells the sender to “slow down.” Explicit congestion notification uses this same concept, but instead of dropping packets after the queues are completely full, it notifies the receiving host that there is congestion before the queues fill up, thereby avoiding dropped traffic. It uses the IP layer (ECN bits in the IP TOS header) Continue reading
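Those ECN bits are simply the two least significant bits of the former IP TOS byte (RFC 3168). A small sketch decoding them:

```python
# Small sketch: the ECN field is the two low-order bits of the former IP TOS
# byte (RFC 3168). Switches signal congestion by rewriting an ECT codepoint to
# CE instead of dropping the packet.
ECN_CODEPOINTS = {
    0b00: "Not-ECT (endpoint is not ECN-capable)",
    0b01: "ECT(1)  (ECN-capable transport)",
    0b10: "ECT(0)  (ECN-capable transport)",
    0b11: "CE      (congestion experienced, marked in the network)",
}

def decode_ecn(tos_byte: int) -> str:
    return ECN_CODEPOINTS[tos_byte & 0b11]

# Example: DSCP 26 (AF31) with ECT(0) set gives TOS byte 0x6A; under congestion
# the ECN bits are rewritten to CE (0x6B) rather than the packet being dropped.
print(decode_ecn(0x6A))   # ECT(0)
print(decode_ecn(0x6B))   # CE
```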
It’s our pleasure to start the week off by announcing the release of FRRouting to the open community. We worked closely with several other partners to make this launch happen, and we’ll be integrating it with our products in upcoming releases. It’s a constant priority of ours to contribute to, maintain and inspire contributions to the community, and this release truly provides a solution that will be welcomed by many industries.
The following post was originally published on the Linux Foundation’s blog. They have graciously given us permission to republish, as the post does a fantastic job of describing the release. We’ve added a few sentences at the end to tie it all together. We hope you enjoy.
One of the most exciting parts of being in this industry over the past couple of decades has been witnessing the transformative impact that open source software has had on IT in general and specifically on networking. Contributions to various open source projects have fundamentally helped bring the reliability and economics of web-scale IT to organizations of all sizes. I am happy to report the community has taken yet another step forward with FRRouting.
FRRouting (FRR) is an IP routing protocol suite Continue reading
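For what it’s worth, the day-to-day operational interface to FRR is its integrated vtysh shell; here is a small hedged sketch of querying it non-interactively from Python using standard show commands (output parsing omitted).

```python
# Hedged sketch: querying FRRouting non-interactively through vtysh's -c flag.
# The commands shown are standard FRR show commands; parsing is left out.
import subprocess

def vtysh(command: str) -> str:
    result = subprocess.run(["vtysh", "-c", command],
                            capture_output=True, text=True, check=True)
    return result.stdout

print(vtysh("show version"))
print(vtysh("show ip route summary"))
print(vtysh("show bgp summary"))
```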