Automating chaos experiments in production

Automating chaos experiments in production Basiri et al., ICSE 2019

Are you ready to take your system assurance programme to the next level? This is a fascinating paper from members of Netflix’s Resilience Engineering team describing their chaos engineering initiatives: automated controlled experiments designed to verify hypotheses about how the system should behave under gray failure conditions, and to probe for and flush out any weaknesses. The ‘controlled’ part is important here because given the scale and complexity of the environment under test, the only meaningful place to do this is in production with real users.

Maybe that sounds scary, but one of the interesting perspectives this paper brings is to make you realise that it’s really not so different from any other change you might be rolling out into production (e.g. a bug fix, configuration change, new feature, or A/B test). In all cases we need to be able to carefully monitor the impact on the system, and back out if things start going badly wrong. Moreover, just like an A/B test, we’ll be collecting metrics while the experiment is underway and performing statistical analysis at the end to interpret the results.
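To make the A/B analogy concrete, here is a toy Python sketch (purely illustrative, not Netflix's actual experimentation platform; the metric generators, sample sizes, and threshold are hypothetical) of the kind of check that runs while an experiment is underway: sample a key health metric from a control population and from the treatment population receiving injected failures, and flag the experiment for abort if the treatment looks significantly worse.

```python
# Toy illustration: treat a chaos experiment like an A/B test.
# Compare a per-user health metric between a control population and a
# treatment population that receives injected failures, and decide
# whether to abort the experiment.
import random
from statistics import mean
from scipy import stats

def run_experiment(control_metric, treatment_metric, samples=300, alpha=0.01):
    control = [control_metric() for _ in range(samples)]
    treatment = [treatment_metric() for _ in range(samples)]
    # Welch's two-sample t-test on the sampled metric
    _, p_value = stats.ttest_ind(control, treatment, equal_var=False)
    degraded = p_value < alpha and mean(treatment) < mean(control)
    return degraded, p_value

# Hypothetical metric generators standing in for real telemetry
healthy = lambda: random.gauss(100, 5)       # control: normal behaviour
with_fallback = lambda: random.gauss(99, 5)  # treatment: failure handled by a fallback

degraded, p = run_experiment(healthy, with_fallback)
print(f"abort experiment: {degraded} (p={p:.3f})")
```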

Netflix’s system is deployed on Continue reading

In Patagonia: A New Community Network in the Village of El Cuy

Patagonia, a region in Argentina made up of deserts, pampas, and grasslands, is known for its large areas of uninhabited territory. In the north sits the village of El Cuy, with just 400 residents. Far from the large urban centers, the people of El Cuy have adapted to the difficulties of accessing different services and technologies. The Internet was no exception, but that is now changing thanks to a new community network.

In several ways, the community network model represents the Internet model of networking come to life. Community networks are built and implemented by people, through collaboration – all stages of the process include the community working together. In the case of the El Cuy community network, support was also provided by CABASE and ENACOM.

For Christian O’Flaherty, the Internet Society’s senior development manager for Latin America and the Caribbean, Internet access has become a positive catalyst for community development. “The operation of this pilot program has motivated the residents to organize themselves into a cooperative. This step will allow the inhabitants of El Cuy to access various funding opportunities offered by actors such as ENACOM to increase the capacity of the Internet connection.”

Abel Martínez, a resident of El Cuy Continue reading

Becoming Broadband Ready Means Community Innovation and Collaboration

There are countless communities across North America that are hungry to see better broadband access for their residents. It’s clear to local leaders that high-quality Internet access is the bedrock of a healthy and successful community – providing job opportunities, bolstering education, transforming health care, and democratizing access to information. What isn’t always so clear is how to make it happen.

That’s why Next Century Cities teamed up with the Internet Society and Neighborly to create the Becoming Broadband Ready toolkit. This comprehensive toolkit provides local leaders with a roadmap to encourage broadband investment in their community.

While every community will choose to tackle connectivity a little differently – a small island community and a large urban center will likely have unique considerations and approaches – there are many common threads that run through successful broadband projects. Becoming Broadband Ready compiles these threads into an easy-to-use and impactful resource for any community, providing guidance on how to:

  • Establish Leadership
  • Build a Community Movement
  • Identify Goals
  • Evaluate the Current Circumstance
  • Establish Policies and Procedures to Support Investment
  • Prioritize Digital Inclusion
  • Identify Legislative and Regulatory Barriers
  • Explore Connectivity Options
  • Explore Financing Options
  • Be a Clear Collaborator
  • Measure Success

Next Century Cities identified the Continue reading

Datanauts 168: Why Design Process Matters For Data Centers And The Cloud

When you're tasked with a new infrastructure project on premises or in the cloud, a design process will significantly improve your chances of success. Guest Adam Post joins the Datanauts podcast to discuss a proper design process, examine frameworks for virtualized and cloud environments, and more.

The post Datanauts 168: Why Design Process Matters For Data Centers And The Cloud appeared first on Packet Pushers.

Campus design feature set-up : Part 5

In this 6-part blog series, we’ve been on a journey of sorts, showing you all the different ways to set up the Cumulus Linux 3.7.5 campus authentication features, and guess what? We’re getting into the home stretch!

In blogs 1-4 we had guides for Wired 802.1x using Aruba ClearPass, Wired MAC Authentication using Aruba ClearPass, Multi-Domain Authentication using Aruba ClearPass, and Wired 802.1x using Cisco ISE. After this blog, we’ll just have one more to go, covering Multi-Domain Authentication using Cisco ISE. But we’re not here to talk about those now.

In this fifth guide, I’ll be sharing how to enable Wired MAC Authentication in Cumulus Linux 3.7.5+ using Cisco ISE (Identity Services Engine) 2.4, Patch 8.

Keep in mind that this step-by-step guide assumes that you have already performed an initial setup of Cisco ISE.

Cisco ISE Configuration:

1. Add a Cumulus Switch group to Cisco ISE:

First, we are going to add a Network Device Group to Cisco ISE:

Administration > Network Resources > Network Device Groups. Click the “+Add” button.

Make sure to set the “Parent Group” to “All Device Types.” The result will look Continue reading

Internet Society Asia-Pacific Policy Survey 2019 Now Open: Consolidation in the Internet Economy

The Internet Society recently embarked on a year-long effort to explore the trends of consolidation in the Internet economy, and I write to sincerely invite you to share your views with us in the Regional Policy Survey 2019, an annual exercise of the Asia-Pacific Bureau of the Internet Society.

Your input is very important to us. It will help us understand the issue from your perspectives and produce a report to be released later this year. Ultimately, your input will help us come up with technical and policy recommendations for policymakers with the aim of preserving the Internet’s properties that give us the critical abilities to connect, speak, innovate, share, choose, and trust.

Please take 5-10 minutes to complete the survey, which is open to all Internet users in Asia-Pacific. To show our appreciation, we will be offering 2 tablet computers in a lucky draw, and the winners will be notified by email after the survey closes on July 31.

Read about the previous installments of the survey.

Thank you again for your time and input.

The post Internet Society Asia-Pacific Policy Survey 2019 Now Open: Consolidation in the Internet Economy appeared first on Internet Society.

An eco-friendly internet of disposable things is coming

Get ready for a future of disposable internet of things (IoT) devices, one that will mean everything is connected to networks. It will be particularly useful in logistics, where it will be used in single-use plastics in retail packaging and in throw-away shippers’ cardboard boxes.

How will it happen? The answer is when non-hazardous, disposable bio-batteries make it possible. And that moment might be approaching. Researchers say they’re closer to commercializing a bacteria-powered miniature battery that they say will propel the IoDT.

The “internet of disposable things is a new paradigm for the rapid evolution of wireless sensor networks,” says Seokheun Choi, an associate professor at Binghamton University, in an article on the school’s website.

To read this article in full, please click here

One SQL to rule them all: an efficient and syntactically idiomatic approach to management of streams and tables

One SQL to rule them all: an efficient and syntactically idiomatic approach to management of streams and tables Begoli et al., SIGMOD’19

In data processing it seems, all roads eventually lead back to SQL! Today’s paper choice is authored by a collection of experts from the Apache Beam, Apache Calcite, and Apache Flink projects, outlining their experiences building SQL interfaces for streaming. The net result is a set of proposed extensions to the SQL standard itself, being worked on under the auspices of the international SQL standardization body.

The thesis of this paper, supported by experience developing large open-source frameworks supporting real-world streaming use cases, is that the SQL language and relational model, as-is and with minor non-intrusive extensions, can be very effective for manipulation of streaming data.
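The specific extensions are not quoted in this excerpt, but windowed, event-time grouping is the canonical example. As a rough plain-Python illustration (not the paper's formalism; the SQL in the comment is only indicative of Calcite/Beam/Flink-style syntax), here is what a tumbling-window GROUP BY computes over a bounded sample of events:

```python
# Plain-Python illustration of the semantics a streaming SQL query such as
#   SELECT key, SUM(value) FROM events
#   GROUP BY key, TUMBLE(event_time, INTERVAL '1' MINUTE)
# expresses: group records by key and by fixed, non-overlapping event-time windows.
from collections import defaultdict

def tumbling_window_sum(events, window_size_s=60):
    """events: iterable of (event_time_seconds, key, value)."""
    totals = defaultdict(float)
    for event_time, key, value in events:
        # Assign each record to the window containing its event time
        window_start = int(event_time // window_size_s) * window_size_s
        totals[(key, window_start)] += value
    return dict(totals)

events = [(3, "user-a", 1.0), (45, "user-a", 2.0), (61, "user-a", 4.0), (10, "user-b", 7.0)]
print(tumbling_window_sum(events))
# {('user-a', 0): 3.0, ('user-a', 60): 4.0, ('user-b', 0): 7.0}
```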

Many of the ideas presented here are already implemented by Apache Beam, Calcite, and Flink in some form, as one option amongst several. The streaming SQL interface has been adopted by Alibaba, Huawei, Lyft, Uber and others, with the following feedback presented to the authors as to why they made this choice:

  • Development and adoption costs are significantly lower compared to non-declarative stream processing APIs
  • Familiarity with standard SQL eases adoption Continue reading

Intro Guide to Dockerfile Best Practices

There are over one million Dockerfiles on GitHub today, but not all Dockerfiles are created equally. Efficiency is critical, and this blog series will cover five areas for Dockerfile best practices to help you write better Dockerfiles: incremental build time, image size, maintainability, security and repeatability. If you’re just beginning with Docker, this first blog post is for you! The next posts in the series will be more advanced.

Important note: the tips below follow the journey of ever-improving Dockerfiles for an example Java project based on Maven. The last Dockerfile is thus the recommended Dockerfile, while all intermediate ones are there only to illustrate specific best practices.

Incremental build time

In a development cycle of building a Docker image, making code changes, and rebuilding, it is important to leverage caching. Caching avoids re-running build steps whose inputs have not changed.

Tip #1: Order matters for caching

The order of the build steps (Dockerfile instructions) matters, because when a step’s cache is invalidated by changed files or modified lines in the Dockerfile, the cache of every subsequent step is invalidated as well. Order your steps from least to most frequently changing to optimize caching.
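For the Java/Maven example project the series uses, the pattern typically looks something like the sketch below (the image tag and paths are illustrative, not taken from the post): copy the dependency manifest and resolve dependencies before copying the frequently changing source, so an ordinary code edit only invalidates the later layers.

```dockerfile
# Rarely-changing steps first: base image and dependency resolution
FROM maven:3-jdk-8 AS build
WORKDIR /app
# Copying only pom.xml lets the dependency layer stay cached across code edits
COPY pom.xml .
RUN mvn -B dependency:go-offline
# Frequently-changing source is copied last, so only the steps below re-run
COPY src ./src
RUN mvn -B package
```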

Tip #2: Continue reading

Understanding RTs and RDs

One of the items that continues to come up in my conversations with folks learning about MPLS VPNs is defining what a Route Target (RT) and Route Distinguisher (RD) are. More specifically, most seem to understand their purpose – but often they don’t quite understand the application. I (and many others – just google “Understanding RDs and RTs”) have written about this in the past, but I’m hoping to put a finer point on the topic in this post.

If someone were to ask me to summarize what route targets and route distinguishers were – I’d probably define them like this…

Route Distinguishers – serve to make routes unique
Route Targets – metadata used to make route import decisions

Now – I’ll grant that those definitions are awfully terse, but I also feel like this is a topic that is often over complicated. So let’s spend some time talking about RTs and RDs separately and then bring it all together in a lab so you can see what’s really happening.
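Before we get to the lab, here’s a toy Python sketch (purely illustrative; real BGP implementations don’t work this way, and the RD/RT values are made up) of those two jobs: the RD keeps two otherwise-identical customer prefixes distinct, and the RTs drive each VRF’s import decision.

```python
# Toy model of the two roles: RDs make overlapping customer prefixes
# distinct, RTs are metadata that drive VRF import decisions.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class VpnRoute:
    rd: str                    # route distinguisher, e.g. "65000:1"
    prefix: str                # e.g. "10.0.0.0/24"
    route_targets: frozenset   # export RTs carried as extended communities

@dataclass
class Vrf:
    name: str
    import_rts: set
    table: dict = field(default_factory=dict)

    def maybe_import(self, route: VpnRoute):
        # Import only if at least one RT on the route matches our import list
        if route.route_targets & self.import_rts:
            self.table[route.prefix] = route

# Two customers advertise the same 10.0.0.0/24; the RD keeps them unique
r1 = VpnRoute("65000:1", "10.0.0.0/24", frozenset({"65000:100"}))
r2 = VpnRoute("65000:2", "10.0.0.0/24", frozenset({"65000:200"}))
assert (r1.rd, r1.prefix) != (r2.rd, r2.prefix)

vrf_a = Vrf("CUSTOMER-A", import_rts={"65000:100"})
for r in (r1, r2):
    vrf_a.maybe_import(r)
print(vrf_a.table)  # only the 65000:1 route lands in CUSTOMER-A
```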

Route Distinguishers

As I said, a route distinguisher serves to make routes look unique. So why do we care about making routes look unique? I’d argue one of Continue reading

Cloudflare outage caused by bad software deploy (updated)

This is a short placeholder blog and will be replaced with a full post-mortem and disclosure of what happened today.

For about 30 minutes today, visitors to Cloudflare sites received 502 errors caused by a massive spike in CPU utilization on our network. This CPU spike was caused by a bad software deploy that was rolled back. Once rolled back the service returned to normal operation and all domains using Cloudflare returned to normal traffic levels.

This was not an attack (as some have speculated) and we are incredibly sorry that this incident occurred. Internal teams are meeting as I write this, performing a full post-mortem to understand how this occurred and how we prevent this from ever occurring again.


Update at 2009 UTC:

Starting at 1342 UTC today we experienced a global outage across our network that resulted in visitors to Cloudflare-proxied domains being shown 502 errors (“Bad Gateway”). The cause of this outage was deployment of a single misconfigured rule within the Cloudflare Web Application Firewall (WAF) during a routine deployment of new Cloudflare WAF Managed rules.

The intent of these new rules was to improve the blocking of inline JavaScript that is used in attacks. These rules were Continue reading