On October 30, 2024, cloud hosting provider OVHcloud (AS16276) suffered a brief but significant outage. According to their incident report, the problem started at 13:23 UTC, and was described simply as “An incident is in progress on our backbone infrastructure.” OVHcloud noted that the incident ended 17 minutes later, at 13:40 UTC. As a major global cloud hosting provider, some customers use OVHcloud as an origin for sites delivered by Cloudflare — if a given content asset is not in our cache for a customer’s site, we retrieve the asset from OVHcloud.
We observed traffic starting to drop at 13:21 UTC, just ahead of the reported start time. By 13:28 UTC, it was approximately 95% lower than pre-incident levels. Recovery appeared to start at 13:31 UTC, and by 13:40 UTC, the reported end time of the incident, it had reached approximately 50% of pre-incident levels.
Traffic from OVHcloud (AS16276) to Cloudflare
Cloudflare generally exchanges most of our traffic with OVHcloud over peering links. However, as shown below, peered traffic volume during the incident fell significantly. It appears that some small amount of traffic briefly began to flow over transit links from Cloudflare to OVHcloud due to sudden Continue reading
According to a survey done by W3Techs, as of October 2024, Cloudflare is used as an authoritative DNS provider by 14.5% of all websites. As an authoritative DNS provider, we are responsible for managing and serving all the DNS records for our clients’ domains. This means we have an enormous responsibility to provide the best service possible, starting at the data plane. As such, we are constantly investing in our infrastructure to ensure the reliability and performance of our systems.
DNS is often referred to as the phone book of the Internet, and is a key component of the Internet. If you have ever used a phone book, you know that they can become extremely large depending on the size of the physical area it covers. A zone file in DNS is no different from a phone book. It has a list of records that provide details about a domain, usually including critical information like what IP address(es) each hostname is associated with. For example:
example.com 59 IN A 198.51.100.0
blog.example.com 59 IN A 198.51.100.1
ask.example.com 59 IN A 198.51.100.2
It is not unusual Continue reading
Cloudflare’s network spans more than 330 cities in over 120 countries, where we interconnect with over 13,000 network providers in order to provide a broad range of services to millions of customers. The breadth of both our network and our customer base provides us with a unique perspective on Internet resilience, enabling us to observe the impact of Internet disruptions. Thanks to Cloudflare Radar functionality released earlier this year, we can explore the impact from a routing perspective, as well as a traffic perspective, at both a network and location level.
As we have noted in the past, this post is intended as a summary overview of observed and confirmed disruptions, and is not an exhaustive or complete list of issues that have occurred during the quarter.
A larger list of detected traffic anomalies is available in the Cloudflare Radar Outage Center.
Having said that, the third quarter of 2024 was particularly active, with quite a few significant Internet disruptions. Unfortunately, governments continued to impose nationwide Internet shutdowns intended to prevent cheating on exams. Damage to both terrestrial and submarine cables impacted Internet connectivity across Africa and in other parts of the world. Damage caused by an active hurricane Continue reading
Class-Based Forwarding (CBF) is an effective component that introduces an additional layer of traffic engineering, enabling the differentiation of traffic based on business needs. It allows low-priority traffic to traverse slower paths while ensuring that business-critical traffic utilizes the fastest or best available paths
https://github.com/kashif-nawaz/CBF_over_RSVP_LSPs_Design_Considerations
In the previous blog posts, we explored three fundamental EVPN designs: we don’t need EVPN, IBGP EVPN AF over IGP-advertised loopbacks (the way EVPN was designed to be used) and EBGP-only EVPN (running the EVPN AF in parallel with the IPv4 AF).
Now we’re entering Wonderland: the somewhat unusual1 things vendors do to make their existing stuff work while also pretending to look cool2. We’ll start with EBGP-over-EBGP, and to understand why someone would want to do something like that, we have to go back to the basics.
This post describes how to install a Cisco ISE evaluation VM for labbing. The VM will run for 90 days, providing a full feature set for up to 100 endpoints.
Start by downloading the software. I’ll be using an OVA as I’m going to run my VM on ESX:
Deploy the OVA and select Eval version:
Power on the VM. When the VM boots, the following prompt is shown:
Type setup and press enter:
You will have to configure the hostname, IP address, name server, and so on:
Press 'Ctrl-C' to abort setup Enter hostname[]: ise01 Enter IP address []: 192.168.128.102 Enter IP netmask []: 255.255.255.0 Enter IP default gateway []: 192.168.128.1 Do you want to configure IPv6 address? Y/N [N]: N Enter default DNS domain []: iselab.local Enter primary nameserver []: 192.168.128.100 Add secondary nameserver? Y/N [N]: N Enter NTP server[time.nist.gov]: ntp.netnod.se Add another NTP server? Y/N [N]: N Enter system timezone[UTC]: CET Enable SSH service? Y/N [N]: Y Enter username [admin]: admin Enter password: Enter password again: Bringing up the network interface...
The installation will take some time…
When the installation Continue reading
Similarly to how it handles VRFs, netlab automatically creates VLANs on a lab device if the device uses them on any access- or trunk link or if the VLAN is mentioned in the node vlans dictionary.
If the VLAN is an IRB VLAN (which can be modified globally or per node with the VLAN mode parameter), netlab also creates the VLAN (or SVI, or BVI) interface. But how do you specify the parameters of the VLAN interface?
I’ve written before about choosing a Juniper version. Juniper has a new release process. Well, two actually - the new official process, and what they’re actually doing…
First the good bits. Juniper started a new release process in 2023. Key points:
I like the new process. It simplifies the versions they have to maintain. We used to say that you should wait for the R3 release, but really there’s no difference between R3 and R2-S3. Now Juniper doesn’t have to maintain the quarterly releases, and all the maintenance and service releases below them. It avoids the confusion that happened when they kept patching -R2, even after releasing R3.
But here’s the thing with a simplified release process: you’ve got no excuses for not delivering. I have no issue with 6-monthly feature releases. But it feels like they’re doing annual releases these days.
Look at the current download page for Continue reading
Dear friend,
It’s been a while since I’ve blogged for the last time. Probably it was too long since I’ve blogged. But, here I am back, with some new ideas and fresh perspectives. One of the key new idea is usage of Go, which I’m actively picking up now. And just shortly I will tell you why.
We absolutely do. In fact, we not only using it, but also teaching it from the perspective of network automation. In our flagship training Zero-to-Hero Network Automation Training we guide you the whole way from having little to no theoretical knowledge and practical skills to a good level of developing automation software with Python. Python is at heart of many purpose-built network (and not only) automation systems, such as NetBox, StackStorm and many others. It’s ecosystem is vast and there are no signs of it slowing down. Therefore, getting good exposure to Python from Network Automation perspective is a good step to increase your own value and secure your job place looking forward. To be brutally honest, any network engineering role nowadays requires Python and/or Ansible knowledge, so don’t pass by.
Here is what we have to offer Continue reading
Containerlab v0.58.0 supports running Cisco IOL images, which is something I was very much looking forward to. IOL nodes are an implementation of Cisco IOS-XE that does not run as a full virtual machine. Therefore, the IOL nodes generally consume much less CPU and memory.
Containerlab already has great documentation on how to use Cisco IOL devices, but I'll cover it here as well for any of my readers who are interested. You can check out the official documentation for more info.
If you have Cisco CML (you may need version 2.7 or later), it should include the IOL images. You'll need to use vrnetlab to convert the binary file into a Docker container, which can then be used within Containerlab like any other container/image.
First, I have downloaded these two Cisco IOL files to the Downloads folder. One for L3 and another one for L2.
x86_64_crb_linux-adventerprisek9-ms
x86_64_crb_linux_l2-adventerprisek9-ms.bin
Next, clone the hellt/vrnetlab
repository to your local machine.
git clone https://github.com/hellt/vrnetlab.git
Then, copy these two images into the vrnetlab/cisco/iol
directory Continue reading
Do you procrastinate too much? I know I do. Why do we procrastinate, and what strategies can we use to stop it? Terry Kim joins Eyvonne Sharp and Russ White to consider procrastination.
download
One of the things that I’ve noticed about the rise of AI is that everything feels so wordy now. I’m sure it’s a byproduct of the popularity of ChatGPT and other LLMs that are designed for language. You’ve likely seen it too on websites that have paragraphs of text that feel unnecessary. Maybe you’re looking for an answer to a specific question. You could be trying to find a recipe or even a code block for a problem. What you find is a wall of text that feels pieced together by someone that doesn’t know how to write.
I feel like the biggest issue with those overly word-filled answers comes down to the way that people feel about unnecessary exposition. AI is built to write things on a topic and fill out word count. Much like a student trying to pad out the page length for a required report, AI doesn’t know when to shut up. It specifically adds words that aren’t really required. I realize that there are modes of AI content creation that value being concise but those are the default.
I use AI quite a bit to summarize long articles, many of which Continue reading
With September’s announcement of Hyperdrive’s ability to send database traffic from Workers over Cloudflare Tunnels, we wanted to dive into the details of what it took to make this happen.
Accessing your data from anywhere in Region Earth can be hard. Traditional databases are powerful, familiar, and feature-rich, but your users can be thousands of miles away from your database. This can cause slower connection startup times, slower queries, and connection exhaustion as everything takes longer to accomplish.
Cloudflare Workers is an incredibly lightweight runtime, which enables our customers to deploy their applications globally by default and renders the cold start problem almost irrelevant. The trade-off for these light, ephemeral execution contexts is the lack of persistence for things like database connections. Database connections are also notoriously expensive to spin up, with many round trips required between client and server before any query or result bytes can be exchanged.
Hyperdrive is designed to make the centralized databases you already have feel like they’re global while keeping connections to those databases hot. We use our global network to get faster routes to your database, keep connection pools primed, and cache your most frequently run queries as close to users Continue reading
I had a Disaster Recovery Myths and Reality talk at the DEEP conference yesterday. The presentation is already online, but unfortunately, not everyone made it to Zadar (your loss, but I get it).
To counteract that, I made the first part of the Designing Active-Active and Disaster Recovery Data Centers webinar public. Hope you’ll like it.