Post Mortem on Cloudflare Control Plane and Analytics Outage

Beginning on Thursday, November 2, 2023 at 11:43 UTC Cloudflare's control plane and analytics services experienced an outage. The control plane of Cloudflare consists primarily of the customer-facing interface for all of our services including our website and APIs. Our analytics services include logging and analytics reporting.

The incident lasted from November 2 at 11:44 UTC until November 4 at 04:25 UTC. We were able to restore most of our control plane at our disaster recovery facility as of November 2 at 17:57 UTC. Many customers would not have experienced issues with most of our products after the disaster recovery facility came online. However, other services took longer to restore and customers that used them may have seen issues until we fully resolved the incident. Our raw log services were unavailable for most customers for the duration of the incident.

Services have now been restored for all customers. Throughout the incident, Cloudflare's network and security services continued to work as expected. While there were periods where customers were unable to make changes to those services, traffic through our network was not impacted.

This post outlines the events that caused this incident, the architecture we had in place to prevent issues Continue reading

Weekend Reads 110323


With security, the battle between good and evil is always a swinging pendulum. Traditionally, the shrewdness of the attack has depended on the skill of the attacker and the sophistication of the arsenal.


While cyberattacks on websites receive much attention, there are often unaddressed risks that can lead to businesses facing lawsuits and privacy violations even in the absence of hacking incidents.


A new login technique is becoming available in 2023: the passkey. The passkey promises to solve phishing and prevent password reuse.


Security researchers have discovered what they believe may be a government attempt to covertly wiretap an instant messaging service in Germany — an attempt that was blown because the potential intercepting authorities failed to reissue a TLS certificate.


Artists suing generative artificial intelligence art generators have hit a stumbling block in a first-of-its-kind lawsuit over the uncompensated and unauthorized use of billions of images downloaded from the internet to train AI systems, with a federal judge’s dismissal of most claims.


Professional artists and photographers annoyed at generative AI firms using their work to train their technology may soon have an effective way to respond that doesn’t involve going to the courts.


Intel is shedding its silicon photonics transceiver module business as part of restructuring and cost-cutting measures, offloading it to manufacturing company Jabil.


Domain Name System (DNS) abuse stands has proven a constant in the internet threat landscape, posing risk to the overall digital trust.


SpaceX is equipping its new satellites with inter-satellite laser links (ISLLs). They now have over 8,000 optical terminals in orbit (3 per satellite) and they communicate at up to 100 Gbps.

Calico monthly roundup: October 2023

Welcome to the Calico monthly roundup: October edition! From open source news to live events, we have exciting updates to share—let’s get into it!

 

Join us at KubeCon + CloudNativeCon North America 2023

We’re gearing up for KubeCon + CloudNativeCon 2023 in Chicago. Join us at booth #G13 for exciting Kubernetes security updates and pick up some cool new Calico swag!

See what we’ve got planned.

Customer case study: eHealth

Calico provides visibility and zero-trust security controls for eHealth on Amazon EKS. Read our new case study to find out how.

Read case study.

 

Evaluating container firewalls for Kubernetes network security

Learn why a traditional firewall architecture doesn’t work for modern cloud-native applications and results in a huge resource drain in a production environment.

Read blog post

The State of Calico Open Source: Usage & Adoption Report 2023

Get insights into Calico’s adoption across container and Kubernetes environments, in terms of platforms, data planes, and policies.

Read the report.

Open source news

HN708: The Future Of Networking With Brad Casemore – Part 1

The Future of Networking series continues with Brad Casemore, who survived multiple decades in the technology sector, including sixteen years as an analyst for IDC. He's been a longtime observer of networking markets, technologies, and trends. We talk about the interest in AI and try to separate the hype from the reality, multi-cloud networking, and more.

The post HN708: The Future Of Networking With Brad Casemore – Part 1 appeared first on Packet Pushers.

Enhancing Kubernetes Networking with the Gateway API

Kubernetes, the stalwart of container orchestration, has ushered in a new era of application deployment and management. But as the Kubernetes ecosystem evolves, networking  within these clusters has posed persistent challenges. Enter the Gateway API, a transformative solution poised to redefine Kubernetes networking as we know it. At its core, the Gateway API represents a paradigm shift in Kubernetes networking. It offers a standardized approach to configuring and managing network routing, traffic shaping, and security policies within Kubernetes clusters. This standardization brings with it a host of compelling advantages. Firstly, it simplifies the intricate world of networking. By providing a declarative and consistent method to define routing rules, it liberates developers and operators from the complexities of network intricacies. This shift allows them to channel their energies toward refining application logic. The Gateway API doesn’t stop there; it brings scalability to the forefront. Traditional Kubernetes networking solutions, like Ingress controllers, often falter under the weight of burgeoning workloads. In contrast, the Gateway API is engineered to gracefully handle high loads, promising superior performance for modern, dynamic applications. NGINX, now a part of F5, is the company behind the popular open source project, NGINX. NGINX offers a suite of technologies Continue reading

Video: Hacking BGP for Fun and Profit

At least some people learn from others’ mistakes: using the concepts proven by some well-publicized BGP leaks, malicious actors quickly figured out how to hijack BGP prefixes for fun and profit.

Fortunately, those shenanigans wouldn’t spread as far today as they did in the past – according to RoVista, most of the largest networks block the prefixes Route Origin Validation (ROV) marks as invalid.

Notes:

You need at least free ipSpace.net subscription to watch videos in this webinar.

Video: Hacking BGP for Fun and Profit

At least some people learn from others’ mistakes: using the concepts proven by some well-publicized BGP leaks, malicious actors quickly figured out how to hijack BGP prefixes for fun and profit.

Fortunately, those shenanigans wouldn’t spread as far today as they did in the past – according to RoVista, most of the largest networks block the prefixes Route Origin Validation (ROV) marks as invalid.

Notes:

You need at least free ipSpace.net subscription to watch videos in this webinar.

Hedge 201: Roundtable

It’s time to gather round the hedge and discuss whatever Eyvonne, Tom, and Russ find interesting! In this episode we discuss business logic vulnerabilities, and how we often forget to think outside the box to understand the attack surfaces that matter. We also discuss upcoming network speed increases like Wi-Fi 7 and 800G Ethernet. Do we really need these speeds, or are we just getting caught up in a hype cycle?

download

2024 network plans dogged by uncertainty, diverging strategies

It’s barely fall of 2023, but it’s already clear that CIOs aren’t particularly positive about their network plans for 2024. Of 83 I have input from, in fact, 59 say they expect “significant issues” in their network planning for next year, and 71 say that they’ll be “under more pressure” in 2024 than they were this year. Sure, CIOs have a high-pressure job, but their expectations for 2024 are worse than for any year in the past 20 years, other than during Covid. Nobody is saying it’s a “the sky is falling” crisis like the proverbial Chicken Little, but some might be hunching their shoulders just a little.It seems that in 2023, all the certainties CIOs had identified in their network planning up to now are being called into question. That isn’t limited to networking, either. In fact, 82 of 83 said their cloud spending is under review, and 78 said that their data center and software plans are also in flux. In fact, CIOs said their network pressures are due more to new issues relating to the cloud, the data center, and software overall than to any network-specific challenges. Given all of this, it’s probably not surprising Continue reading

Survey: Observability tools can create more resilient, secure networks

IT leaders are investing in observability technologies that can help them gain greater visibility beyond internal networks and build more resilient environments, according to recent research from Splunk.Splunk, which Cisco announced it would acquire for $28 billion, surveyed 1,750 observability practitioners to gauge investment and deployment of observability products as well as commitment to observability projects within their IT environments. According to the vendor’s State of Observability 2023 report, 87% of respondents now employ specialists who work exclusively on observability projects.To read this article in full, please click here

How to determine RTOs and RPOs for backup and recovery

When evaluating the design of your backup systems or developing a design of a new backup and recovery system, there are arguably only two metrics that matter: how fast you can recover, and how much data you will lose when you recover. If you build your design around the agreed-upon numbers for these two metrics, and then repeatedly test that you are able to meet those metrics in a recovery, you’ll be in good shape.The problem is that few people know what these metrics are for their organization. This isn’t a matter of ignorance, though. They don’t know what they are because no one ever created the metrics in the first place. And if you don’t have agreed upon metrics (also known as service levels), every recovery will be a failure because it will be judged against the unrealistic metrics in everyone’s heads. With the exception of those who are intimately familiar with the backup and disaster recovery system, most people have no idea how long recoveries actually take.To read this article in full, please click here