Observability is a staple of high-performing software and DevOps teams. Research shows that a comprehensive observability solution, along with a number of other technical practices, positively contributes to continuous delivery and service uptime.
Observability is sometimes confused with monitoring, but there is a clear difference between the two, and it's important to understand it. Observability refers to a technical solution that enables teams to actively debug a system; it is based on exploring activities, properties, and patterns that are not defined in advance. Monitoring, in contrast, is a technical solution that enables teams to watch and understand the state of their systems, based on gathering pre-defined sets of metrics or logs.
Conventional observability and monitoring tools were designed for monolithic systems, observing the health and behavior of a single application instance. Complex distributed microservices environments, such as those running on Kubernetes, are constantly changing, with hundreds or even thousands of pods being created and destroyed within minutes. Because this environment is so dynamic, pre-defined metrics and logs aren't effective for troubleshooting issues. Conventional observability approaches, which work well in traditional, monolithic environments, are inadequate for Kubernetes. So an observability solution that is purpose-built for a distributed microservices Continue reading
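To make the distinction concrete, here is a minimal sketch in Python (not tied to any particular tool, with purely illustrative event fields): monitoring answers a question that was defined up front, while observability lets you slice the same raw events by any dimension after the fact.

```python
from collections import Counter

# Structured events captured per request; the fields are illustrative.
events = [
    {"route": "/checkout", "status": 500, "pod": "cart-7f9c", "node": "node-3", "ms": 412},
    {"route": "/checkout", "status": 200, "pod": "cart-2b1a", "node": "node-1", "ms": 38},
    {"route": "/search",   "status": 200, "pod": "search-9d", "node": "node-3", "ms": 95},
]

# Monitoring: a metric defined in advance (overall error count).
error_count = sum(1 for e in events if e["status"] >= 500)
print("errors:", error_count)

# Observability: an ad-hoc question asked after the fact, sliced by
# dimensions nobody pre-aggregated (which pod on which node is failing?).
by_pod = Counter((e["pod"], e["node"]) for e in events if e["status"] >= 500)
print("errors by pod/node:", by_pod)
```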
First, a few interesting stories on the Facebook outage,
and other stories, as usual.
I wanted to wait before putting out a hot take on the Facebook issues from earlier this week because failures of this magnitude always have details that come out well after the actual excitement is done. A company like Facebook isn't going to publish the kind of in-depth post-mortem that we might like to see, but the amount of information coming out from other areas does point to some interesting circumstances behind this situation.
Let me start off the whole thing by reiterating something important: Your network looks absolutely nothing like Facebook. The scale of what goes on there is unimaginable to the normal person. The average person has no conception of what one billion looks like. Likewise, the scale of the networking that goes on at Facebook is beyond the ken of most networking professionals. I’m not saying this to make your network feel inferior. More that I’m trying to help you understand that your network operations resemble those at Facebook in the same way that a model airplane resembles a space shuttle. They’re alike on the surface only.
Facebook has unique challenges that they have to face in their own way. Network automation there isn’t a bonus. It’s Continue reading
It's been a few days now since Facebook, Instagram, and WhatsApp went AWOL and experienced one of the longest and roughest downtime periods in their existence.
When that happened, we reported our bird's-eye view of the event and posted the blog Understanding How Facebook Disappeared from the Internet, where we tried to explain what we saw and how DNS and BGP, two of the technologies at the center of the outage, played a role in the event.
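As a rough illustration of the symptom from the outside (a hypothetical check, not Cloudflare's tooling): once the BGP routes to Facebook's authoritative nameservers were withdrawn, ordinary DNS lookups for its domains simply started failing, which a check like this would have shown.

```python
import socket

def can_resolve(name: str) -> bool:
    """Return True if the name resolves to at least one address."""
    try:
        socket.getaddrinfo(name, 443)
        return True
    except socket.gaierror:
        return False

for name in ("facebook.com", "www.facebook.com"):
    print(name, "resolves" if can_resolve(name) else "does not resolve")
```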
In the meantime, more information has surfaced, and Facebook has published a blog post giving more details of what happened internally.
As we said before, these events are a gentle reminder that the Internet is a vast network of networks, and we, as industry players and end-users, are part of it and should work together.
In the aftermath of an event of this size, we don't waste much time debating how peers handled the situation. We do, however, ask ourselves the more important questions: "How did this affect us?" and "What if this had happened to us?" Asking and answering these questions whenever something like this happens is a great and healthy exercise that helps us improve our own resilience.
As the holiday shopping season fast approaches, the IPv6 Buzz crew dive into IPv6 in relation to e-commerce.
The post IPv6 Buzz 086: E-Commerce And Holiday Shopping With IPv6! appeared first on Packet Pushers.
On September 29, 2021, the Apache Security team was alerted to a path traversal vulnerability being actively exploited (zero-day) against Apache HTTP Server version 2.4.49. In some instances the vulnerability can allow an attacker to fully compromise the web server via remote code execution (RCE) or, at the very least, access sensitive files. CVE-2021-41773 has been assigned to this issue. Both Linux- and Windows-based servers are vulnerable.
An initial patch was made available on October 4 with an update to 2.4.50; however, this was found to be insufficient, resulting in an additional patch that bumped the version number to 2.4.51 on October 7 (CVE-2021-42013).
Customers using Apache HTTP Server versions 2.4.49 and 2.4.50 should immediately update to version 2.4.51 to mitigate the vulnerability. Details on how to update can be found on the official Apache HTTP Server project site.
Any Cloudflare customer with the normalize URLs to origin setting turned on has always been protected against this vulnerability.
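As a rough illustration of why normalization helps (a simplified sketch, not Apache's or Cloudflare's actual logic, with an illustrative document root): the exploit relied on percent-encoded dot segments such as .%2e slipping past path checks, so decoding before normalizing and confirming the result stays under the document root blocks the traversal.

```python
from urllib.parse import unquote
import posixpath

DOCROOT = "/var/www/html"  # illustrative document root

def is_safe(url_path: str) -> bool:
    # Decode percent-encoding first, so ".%2e" becomes ".."
    decoded = unquote(url_path)
    # Normalize and make sure the result is still under the document root.
    normalized = posixpath.normpath(posixpath.join(DOCROOT, decoded.lstrip("/")))
    return normalized == DOCROOT or normalized.startswith(DOCROOT + "/")

print(is_safe("/index.html"))                         # True
print(is_safe("/cgi-bin/.%2e/.%2e/.%2e/etc/passwd"))  # False: escapes the root
```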
Additionally, customers who have access to the Cloudflare Web Application Firewall (WAF) receive additional protection by turning on the rules with the following IDs:
1c3d3022129c48e9bb52e953fe8ceb2f
(for Continue reading
After explaining the basics of (network) names, addresses and routes, I wasted a few minutes of everyone's time discussing the theoretical aspects of layered addressing, and then got back to practical issues like address scopes, namespaces, and address provisioning.
The video ends with a simple (and unappreciated) truth: if you have a point-to-point link between two nodes you don’t need data-link-layer addresses. The consequences of that fact are left as an exercise for the viewer (or you can wait till the next video ;)
Since the founding of the Internet, online copyright infringement has been a real concern for policy makers, copyright holders, and service providers, and there have been considerable efforts to find effective ways to combat it. Many of the most significant legal questions around what is called “intermediary liability” — the extent to which different links in the chain of an Internet transmission can be held liable for problematic online content — have been pressed on lawmakers and regulators, and played out in courts around issues of copyright.
Although Section 230 of the Communications Decency Act in the United States provides important protections from liability for intermediaries, copyright and other intellectual property claims are one of the very few areas carved out of that immunity.
Over the years, copyright holders have sometimes sought to hold Cloudflare liable for infringing content on websites using our services. This never made much sense to us. We don’t host the content of the websites at issue, we don’t aggregate or promote the content or in any way help end users find it, and our services are not even necessary for the content’s availability online. Infrastructure service providers like Cloudflare are Continue reading
This document pulls together links to a number of articles that describe how you can quickly try out DDoS Protect and get it running in your environment:
Today, we are announcing the general availability of Cloudflare Waiting Room to customers on our Enterprise plans, making it easier than ever to protect your website against traffic spikes. We are also excited to present several new features designed with user experience in mind: an alternative queueing method and support for custom web/mobile applications.
Whether you’ve waited to check out at a supermarket or stood in line at a bank, you’ve undoubtedly experienced FIFO queueing. FIFO stands for First-In-First-Out, which simply means that people are seen in the order they arrive — i.e., those who arrive first are processed before those who arrive later.
When Waiting Room was introduced earlier this year, it was first deployed to protect COVID-19 vaccine distributors from overwhelming demand — a service we offer free of charge under Project Fair Shot. At the time, FIFO queueing was the natural option due to its wide acceptance in day-to-day life and accurate estimated wait times. One problem with FIFO is that users who arrive later could see long estimated wait times and decide to abandon the website.
We take customer feedback seriously and improve products based on it. A frequent request Continue reading
I’m sitting with Jeff Doyle and Jeff Tantsura to talk about network complexity on the Between 0x2 Nerds podcast today at 1PM ET today. The link is here—
Join us if you can.
Today, Pluribus Networks announced a funding round of $20 million led by Morgan Stanley Expansion Capital.
The post From the Desk of the CEO: Pluribus Raises $20M from Morgan Stanley to Fuel Growth appeared first on Pluribus Networks.