Remember that Slack outage earlier this month? The one that happened when we all got back from vacation and tried to jump on to share cat memes and emojis? We all chalked it up to gremlins and went on going through our pile of email until it came back up. The post-mortem came out yesterday and there were two things that were interesting to me. Both of them have implications for reliability planning and how we handle the worst-case scenarios we come up with.
The first thing that came up in the report was that the specific cause for the outage came from an AWS Transit Gateway not being able to scale fast enough to handle the demand spike that came when we all went back to work on the morning of January 4th. What, the cloud can’t scale?
The cloud is practically limitless when it comes to resources. We can create instances with massive CPU resources or storage allocations or even networking pipelines. However, we can’t create them instantly. No matter how much we need, it takes time to do the basic provisioning to get it up and running. It’s the old story of Continue reading
Even with minor caveats, I seem to be in a better place with macOS 11.1 Big Sur versus macOS 10.15.7 Catalina. Big Sur is not a flawless experience for me yet, but I have hope it will become so as software makers have time to adjust to all of Apple's changes. And I'll take being able to run GNS3 labs without kernel panics as a big win.
The post Stable: GNS3 2.2.17 + VMware Fusion 12.1.0 + macOS 11.1 (Build 20C69) appeared first on Packet Pushers.
or perhaps the friday fifteen …
Defining and measuring programmer productivity is something of a great white whale in the software industry. It’s the basis of enormous investment, the value proposition of numerous startups, and one of the most difficult parts of an engineering manager or Continue reading
On 19 January 2021, I took and passed the Implementing DevOps Solutions and Practices (DEVOPS) exam on my first attempt. This is the sixth DevNet exam I’ve passed … and probably the last! Much like my experience with enterprise and service provider automation, I have years of real-life experience solving a diverse set of business problems using DevOps skills. I’ve spoken about the topic on various podcasts and professional training courses many times. Even given that experience, the exam blueprint introduced me to new technologies such as Cisco AppDynamics and Prometheus, to name a few.
I found DEVOPS to be more difficult than the product-specific concentration exams like ENAUTO, SPAUTO, and SAUTO. Because the exam has very little Cisco-specific content (AppDynamics is about the extent of it), you’ll need extensive hands-on, detail-oriented experience with many third-party products. To name a few: Ansible, Terraform, Docker, Kubernetes, Prometheus, ELK, git/GitHub, Travis CI, Jenkins, and Drone. Like most Cisco specialties, it isn’t enough just to watch video training to learn the details of these technologies; labbing and self-learning are both essential to pass this challenging exam.
Unlike DEVASC, DEVCOR, ENAUTO, and SAUTO, I did not Continue reading
Around the world, government and medical organizations are struggling with one of the most difficult logistics challenges in history: equitably and efficiently distributing the COVID-19 vaccine. There are challenges around communicating who is eligible to be vaccinated, registering those who are eligible for appointments, ensuring they show up for their appointments, transporting the vaccine under the required handling conditions, ensuring that there are trained personnel to administer the vaccine, and then doing it all over again as most of the vaccines require two doses.
Cloudflare can't help with most of that problem, but there is one key part that we realized we could help facilitate: ensuring that registration websites don't crash under load when they first begin scheduling vaccine appointments. Project Fair Shot provides Cloudflare's new Waiting Room service for free for any government, municipality, hospital, pharmacy, or other organization responsible for distributing COVID-19 vaccines. It is open to eligible organizations around the world and will remain free until at least July 1, 2021 or longer if there is still more demand for appointments for the vaccine than there is supply.
The problem of vaccine scheduling registration websites crashing under load isn't theoretical: it is happening over Continue reading
Today, we are excited to announce Cloudflare Waiting Room! It will first be available to select customers through a new program called Project Fair Shot which aims to help with the problem of overwhelming demand for COVID-19 vaccinations causing appointment registration websites to fail. General availability in our Business and Enterprise plans will be added in the near future.
Most of us are familiar with the concept of a waiting room, and rarely are we excited about the idea of being in one. Usually our first experience of one is at a doctor’s office — yes, you have an appointment, but sometimes the doctor is running late (or one of the patients was). Given the doctor can only see one person at a time… the waiting room was born, as a mechanism to queue up patients.
While servers can handle more concurrent requests than a doctor can, they too can be overwhelmed. If, in a pre-COVID world, you’ve ever tried buying tickets to a popular concert or event, you’ve probably encountered a waiting room online. It limits requests inbound to an application, and places these requests into a virtual queue. Once the number Continue reading
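The queueing idea described above can be sketched in a few lines of Python. This is only a toy illustration, not how Cloudflare Waiting Room is implemented: the `WaitingRoom` class, its `capacity` parameter, and the method names are all hypothetical, and it assumes a simple first-in, first-out policy.

```python
from collections import deque


class WaitingRoom:
    """Toy virtual waiting room (hypothetical): at most `capacity` active
    users are let through to the application; everyone else waits FIFO."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.active = set()   # users currently allowed through
        self.queue = deque()  # users waiting, oldest first

    def arrive(self, user):
        """Admit the user if there is room, otherwise queue them.
        Returns True when the user is (or already was) admitted."""
        if user in self.active:
            return True
        if user in self.queue:
            return False
        if len(self.active) < self.capacity:
            self.active.add(user)
            return True
        self.queue.append(user)
        return False

    def position(self, user):
        """1-based place in line, or 0 if the user is not queued."""
        return self.queue.index(user) + 1 if user in self.queue else 0

    def leave(self, user):
        """Remove the user. If that frees a slot, admit the
        longest-waiting user and return them; otherwise return None."""
        if user in self.queue:
            self.queue.remove(user)
            return None
        self.active.discard(user)
        if self.queue and len(self.active) < self.capacity:
            admitted = self.queue.popleft()
            self.active.add(admitted)
            return admitted
        return None
```

With a capacity of two, a third arrival is held in the queue and is only admitted once one of the first two users leaves.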
DDoS attack trends in the final quarter of 2020 defied norms in many ways. For the first time in 2020, Cloudflare observed an increase in the number of large DDoS attacks. Specifically, the number of attacks over 500Mbps and 50K pps saw a massive uptick.
In addition, attack vectors continued to evolve, with protocol-based attacks seeing a 3-10x increase compared to the prior quarter. Attackers were also more persistent than ever — nearly 9% of all attacks observed between October and December lasted more than 24 hours.
Below are additional noteworthy observations from the fourth quarter of 2020, which the rest of this blog explores in greater detail.
After receiving some feedback on my previous post about running the JNCIE-DC self-study workbook in EVE-NG, I wanted to share some of the most common questions I personally experienced while using the lab, along with general things to be aware of and some tips! I also ran into some aspects of going through the workbook that […]
The post JNCIE-DC lab in EVE-NG tips and tricks first appeared on Rick Mur.
Comparing the current operational state of your IT infrastructure to your desired state is a common use case for IT automation. This allows automation users to identify drift or problem scenarios to take corrective actions and even proactively identify and solve problems. This blog post will walk through the automation workflow for validation of operational state and even automatic remediation of issues.
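The desired-versus-operational comparison can be illustrated with a minimal Python sketch. This is plain Python rather than Ansible content. The `find_drift` helper and the interface names and states are hypothetical, chosen only to show the idea of detecting drift.

```python
def find_drift(desired, actual):
    """Return {key: (desired_value, actual_value)} for every key whose
    value differs. A key missing on one side is reported as None."""
    drift = {}
    for key in sorted(set(desired) | set(actual)):
        want = desired.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical interface states, for illustration only.
desired_state = {"Ethernet1": "up", "Ethernet2": "up", "Ethernet3": "down"}
operational_state = {"Ethernet1": "up", "Ethernet2": "down", "Ethernet3": "down"}

for name, (want, have) in find_drift(desired_state, operational_state).items():
    print(f"{name}: desired={want} actual={have}")
```

Each entry in the result names one item that needs corrective action, which is the input a remediation step would work from.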
We will demonstrate how the Red Hat supported and certified Ansible content can be used to:
The recently released ansible.utils version 1.0.0 Collection has added support for the ansible.utils.cli_parse module, which converts text data into structured JSON format. The module has the capability to either execute the command on the remote endpoint and fetch the text response, or Continue reading