Stateful Firewall Cluster High Availability Theater

Dmitry Perets wrote an excellent description of how typical firewall cluster solutions implement control-plane high availability, in particular, the routing protocol Graceful Restart feature (slightly edited):


Most of the HA clustering solutions for stateful firewalls that I know implement a single-brain model, where the entire cluster is seen by the outside network as a single node. The node that is currently primary runs the control plane (hence, I call it single-brain). Sessions and the forwarding plane are synchronized between the nodes.

1000BASE-T Part 4 – Link Down Detection

In the previous three parts, we learned about all the interesting things that go on in the PHY with scrambling, descrambling, synchronization, auto negotiation, FEC encoding, and so on. This is all essential knowledge that we need to have to understand how the PHY can detect that a link has gone down, or is performing so badly that it doesn’t make sense to keep the link up.

What Does IEEE 802.3 1000BASE-T Say?

The function in 1000BASE-T that is responsible for monitoring the status of the link is called link monitor and is defined in 40.4.2.5. The standard does not define much on what goes on in link monitor, though. Below is an excerpt from the standard:

Link Monitor determines the status of the underlying receive channel and communicates it via the variable
link_status. Failure of the underlying receive channel typically causes the PMA’s clients to suspend normal
operation.
The Link Monitor function shall comply with the state diagram of Figure 40–17.

The state diagram (redrawn by me) is shown below:

While 1000BASE-T leaves what the PHY monitors in link monitor to the implementer, there are still some interesting variables and timers that you should be Continue reading

HS069: Regulating AI

In today’s episode Greg and Johna spar over how, when, and why to regulate AI. Does early regulation lead to bad regulation? Does late regulation lead to a situation beyond democratic control? Comparing nascent regulation efforts in the EU, UK, and US, they analyze socio-legal principles like privacy and distributed liability. Most importantly, Johna drives... Read more »

Google Joins The Homegrown Arm Server CPU Club

If you are wondering why Intel chief executive officer Pat Gelsinger has been working so hard to get the company’s foundry business not only back on track but utterly transformed into a merchant foundry that, by 2030 or so can take away some business from archrival Taiwan Semiconductor Manufacturing Co, the reason is simple.

Google Joins The Homegrown Arm Server CPU Club was written by Timothy Prickett Morgan at The Next Platform.

Prevent Data Exfiltration in Kubernetes: The Critical Role of Egress Access Controls

Data exfiltration and ransomware attacks in cloud-native applications are evolving cyber threats that pose significant risks to organizations, leading to substantial financial losses, reputational damage, and operational disruptions. As Kubernetes adoption grows for running containerized applications, it becomes imperative to address the unique security challenges it presents. This article explores the economic impact of data exfiltration and ransomware attacks, their modus operandi in Kubernetes environments, and effective strategies to secure egress traffic. We will delve into the implementation of DNS policies and networksets, their role in simplifying egress control enforcement, and the importance of monitoring and alerting for suspicious egress activity. By adopting these measures, organizations can strengthen their containerized application’s security posture running in Kubernetes and mitigate the risks associated with these prevalent cyber threats.

Economic impact of data exfiltration and ransomware attacks

Data exfiltration and ransomware attacks have emerged as formidable threats to organizations worldwide, causing substantial financial losses and service outage. According to IBM’s 2023 Cost of a Data Breach report, data exfiltration attacks alone cost businesses an average of $3.86 million per incident, a staggering figure that underscores the severity of this issue. Ransomware attacks, on the other hand, can inflict even more damage, with Continue reading

With Gaudi 3, Intel Can Sell AI Accelerators To The PyTorch Masses

We have said it before, and we will say it again right here: If you can make a matrix math engine that runs the PyTorch framework and the Llama large language model, both of which are open source and both of which come out of Meta Platforms and both of which will be widely adopted by enterprises, then you can sell that matrix math engine.

With Gaudi 3, Intel Can Sell AI Accelerators To The PyTorch Masses was written by Timothy Prickett Morgan at The Next Platform.

Total eclipse of the Internet: traffic impacts in Mexico, the US, and Canada

A photo of the eclipse taken by Bryton Herdes, a member of Cloudflare's Network team, in Southern Illinois.

There are events that unite people, like a total solar eclipse, reminding us, humans living on planet Earth, of our shared dependence on the sun. Excitement was obvious in Mexico, several US states, and Canada during the total solar eclipse that occurred on April 8, 2024. Dubbed the Great North American Eclipse, millions gathered outdoors to witness the Moon pass between Earth and the Sun, casting darkness over fortunate states. Amidst the typical gesture of putting the eclipse glasses on and taking them off, depending on if people were looking at the sky during the total eclipse, or before or after, what happened to Internet traffic?

Cloudflare’s data shows a clear impact on Internet traffic from Mexico to Canada, following the path of totality. The eclipse occurred between 15:42 UTC and 20:52 UTC, moving from south to north, as seen in this NASA image of the path and percentage of darkness of the eclipse.

Looking at the United States in aggregate terms, bytes delivered traffic dropped by 8%, and request traffic by 12% as compared to the previous week at 19:00 UTC Continue reading

Mixed Results For The Datacenter Thundering Thirteen In Q4

We have been tracking the financial results for the big players in the datacenter that are public companies for three and a half decades, but starting last year we started dicing and slicing the numbers for the largest IT suppliers for stuff that goes into datacenters so we can give you a better sense what is and what is not happening out there.

Mixed Results For The Datacenter Thundering Thirteen In Q4 was written by Timothy Prickett Morgan at The Next Platform.

NB473: Duty To Report (Your Breaches); Intel Foundry Biz Loses $7 Billion

Take a Network Break! This week we start with some FU on Juniper’s Mist AI, the ConnectWise vulnerability, and the 25th anniversary of the Cisco Cat6. The US Cybersecurity & Infrastructure Security Agency (CISA) has proposed new rules that require organizations to report security incidents within 72 hours and ransomware payments within 24 hours. Intel... Read more »

Major data center power failure (again): Cloudflare Code Orange tested

This post is also available in Français, Español.

Here's a post we never thought we'd need to write: less than five months after one of our major data centers lost power, it happened again to the exact same data center. That sucks and, if you're thinking "why do they keep using this facility??," I don't blame you. We're thinking the same thing. But, here's the thing, while a lot may not have changed at the data center, a lot changed over those five months at Cloudflare. So, while five months ago a major data center going offline was really painful, this time it was much less so.

This is a little bit about how a high availability data center lost power for the second time in five months. But, more so, it's the story of how our team worked to ensure that even if one of our critical data centers lost power it wouldn't impact our customers.

On November 2, 2023, one of our critical facilities in the Portland, Oregon region lost power for an extended period of time. It happened because of a cascading series of faults that appears to have been caused by maintenance by the Continue reading

Developer Week 2024 wrap-up

This post is also available in 简体中文, 繁體中文, 日本語, 한국어, Deutsch, Français and Español.

Developer Week 2024 has officially come to a close. Each day last week, we shipped new products and functionality geared towards giving developers the components they need to build full-stack applications on Cloudflare.

Even though Developer Week is now over, we are continuing to innovate with the over two million developers who build on our platform. Building a platform is only as exciting as seeing what developers build on it. Before we dive into a recap of the announcements, to send off the week, we wanted to share how a couple of companies are using Cloudflare to power their applications:

We have been using Workers for image delivery using R2 and have been able to maintain stable operations for a year after implementation. The speed of deployment and the flexibility of detailed configurations have greatly reduced the time and effort required for traditional server management. In particular, we have seen a noticeable cost savings and are deeply appreciative of the support we have received from Cloudflare Workers.
- FAN Communications


Milkshake helps creators, influencers, and business owners create engaging web pages Continue reading

Running CVP on Proxmox

With VMware jacking up the prices and killing off the free version of ESXi, people are looking to alternatives for a virtualization platform. One of the more popular alternatives is Proxmox, which so far I’m really liking.

If you’re looking to run CVP on Proxmox, here is how I get it installed. I’m not sure if Proxmox counts as officially supported for production CVP (it is KVM, however), but it does work fine in lab. Contact Arista TAC if you’re wondering about Proxmox suitability.

Getting the Image to Proxmox

Oddly enough, what you’ll want to do is get a copy of the CVP OVA, not the KVM image. I’m using the most recent release (at the time of writing, always check Arista.com) of cvp-2024.1.0.ova.

Get it onto your Proxmox box (or one of them if you’re doing a cluster). Place it somewhere where there’s enough space to unpack it. In my case, I have a volume called volume2, which is located at /mnt/pve/volume2.

I made a directory called tmp and copied the file to that directory via SCP (using FileZilla, though there’s several ways to get files onto Proxmox, it’s just a Linux box). I Continue reading

Troubleshooting vPC in My Virtual Lab

I’m preparing a blog post on setting up vPC in a VXLAN/EVPN environment. While doing so, I ran into some issues. Rather than simply fixing them, I wanted to share the troubleshooting experience as it can be useful to see all the things I did to troubleshoot, including commands, packet captures, etc., and learn a little about virtual networking. As always, thanks to Peter Palúch for providing assistance with the process.

Topology

The following topology implemented in ESX is used:

Background

I had just configured the vPC peer link and vPC peer link keepalive. I verified that the vPC was functional with the following command:

Leaf1# show vpc
Legend:
                (*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                     : 1   
Peer status                       : peer adjacency formed ok      
vPC keep-alive status             : peer is alive                 
Configuration consistency status  : success 
Per-vlan consistency status       : success                       
Type-2 consistency status         : success 
vPC role                          : primary                       
Number of vPCs configured         : 1   
Peer Gateway                      : Disabled
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Enabled
Auto-recovery status              : Disabled
Delay-restore status              : Timer is off.(timeout = 30s)
Delay-restore SVI status          : Timer is off.(timeout =  Continue reading
1 3 4 5 6 7 3,610