DNS Evolution

The DNS is a crucial part of the Internet's architecture. However, the DNS is not a rigid and unchanging technology. It has changed considerably over the lifetime of the Internet and here I’d like to look at what’s changed and what’s remained the same.

Explore and Fix BGP Wedgies

RFC 4264 defines BGP wedgies as “a class of BGP configurations for which there is more than one potential outcome, and where forwarding states other than the intended state are equally stable.” Even worse, “the stable state where BGP converges may be selected by BGP in a non-deterministic manner.

Want to know more? You can explore a real-life BGP wedgie and fix it in the latest BGP lab exercise.

Automatically replacing polyfill.io links with Cloudflare’s mirror for a safer Internet

polyfill.io, a popular JavaScript library service, can no longer be trusted and should be removed from websites.

Multiple reports, corroborated with data seen by our own client-side security system, Page Shield, have shown that the polyfill service was being used, and could be used again, to inject malicious JavaScript code into users’ browsers. This is a real threat to the Internet at large given the popularity of this library.

We have, over the last 24 hours, released an automatic JavaScript URL rewriting service that will rewrite any link to polyfill.io found in a website proxied by Cloudflare to a link to our mirror under cdnjs. This will avoid breaking site functionality while mitigating the risk of a supply chain attack.

Any website on the free plan has this feature automatically activated now. Websites on any paid plan can turn on this feature with a single click.

You can find this new feature under Security ⇒ Settings on any zone using Cloudflare.

Contrary to what is stated on the polyfill.io website, Cloudflare has never recommended the polyfill.io service or authorized their use of Cloudflare’s name on their website. We have asked them to remove the Continue reading

Cloudflare incident on June 20, 2024

On Thursday, June 20, 2024, two independent events caused an increase in latency and error rates for Internet properties and Cloudflare services that lasted 114 minutes. During the 30-minute peak of the impact, we saw that 1.4 - 2.1% of HTTP requests to our CDN received a generic error page, and observed a 3x increase for the 99th percentile Time To First Byte (TTFB) latency.

These events occurred because:

  1. Automated network monitoring detected performance degradation, re-routing traffic suboptimally and causing backbone congestion between 17:33 and 17:50 UTC
  2. A new Distributed Denial-of-Service (DDoS) mitigation mechanism deployed between 14:14 and 17:06 UTC triggered a latent bug in our rate limiting system that allowed a specific form of HTTP request to cause a process handling it to enter an infinite loop between 17:47 and 19:27 UTC

Impact from these events were observed in many Cloudflare data centers around the world.

With respect to the backbone congestion event, we were already working on expanding backbone capacity in the affected data centers, and improving our network mitigations to use more information about the available capacity on alternative network paths when taking action. In the remainder of this blog post, we will go into Continue reading

Looking for a Simple Multihop EBGP Use Case

I plan to add several challenge labs using multihop EBGP sessions to the BGP labs project, including:

  1. Running BGP between VMs and central BGP route servers
  2. Using multihop EBGP session to send full Internet routing table to a customer without overloading the PE-router
  3. Running EBGP EVPN session between loopbacks advertised with EBGP IPv4 session (🤢)

However, I would love to start with a simple use case to help engineers unfamiliar with BGP realize when they might have to use multihop EBGP sessions. Unfortunately, I can’t find one, and the scenarios where I used multihop EBGP in the past (EBGP load balancing and using a low-end router in the EBGP path, where I was effectively using the reverse application of #2 as a customer) are mostly irrelevant.

Would you have an easy-to-understand use case that is best solved with a multihop EBGP session? Please share it in the comments. Thanks a million!

Looking for a Simple Multihop EBGP Use Case

I plan to add several challenge labs using multihop EBGP sessions to the BGP labs project, including:

  1. Running BGP between VMs and central BGP route servers
  2. Using multihop EBGP session to send full Internet routing table to a customer without overloading the PE-router
  3. Running EBGP EVPN session between loopbacks advertised with EBGP IPv4 session (🤢)

However, I would love to start with a simple use case to help engineers unfamiliar with BGP realize when they might have to use multihop EBGP sessions. Unfortunately, I can’t find one, and the scenarios where I used multihop EBGP in the past (EBGP load balancing and using a low-end router in the EBGP path, where I was effectively using the reverse application of #2 as a customer) are mostly irrelevant.

Would you have an easy-to-understand use case that is best solved with a multihop EBGP session? Please share it in the comments. Thanks a million!

How to Best Use Panorama Templates and Stacks?

How to Best Use Panorama Templates and Stacks?

I’ve been working with Palo Alto Firewalls and Panorama for a few years now, yet the best ways to use Templates still seem somewhat mysterious. I bet many of you feel the same way. Since every network is unique, there isn’t one “right” way to manage this. In this blog post, I’ll break down what Templates and Template Stacks are in Panorama and share some effective strategies for organizing them. Let’s dive in.

A Quick Note on Panorama

If you’re new to Panorama, it’s a centralized management tool that simplifies managing multiple Palo Alto firewalls from a single place. There are two key concepts in Panorama which are Device Groups and Templates. Device Groups manage the configurations you’d usually find under the Policies and Objects tabs on the firewall, while Templates manage with configurations from the Network and Device tabs.

How to Best Use Panorama Templates and Stacks?

It’s important to note that Device Groups and Templates serve different purposes and manage different parts of the configurations. This blog post will focus exclusively on Templates. If you need a refresher on Device Groups and Templates, I’ve covered that in a previous post. Feel free to check it out here for a quick recap.

Ruminations About Europe’s “Alice Recoque” Exascale Supercomputer

Designing chips and shepherding them through the foundry and package and assembly is a complex and difficult process, and not having these skills at a national level has profound implications for the competitiveness of those nations.

Ruminations About Europe’s “Alice Recoque” Exascale Supercomputer was written by Timothy Prickett Morgan at The Next Platform.

Kubernetes network policies: 4 pain points and how to address them

Kubernetes is used everywhere, from test environments to the most critical production foundations that we use daily, making it undoubtedly a de facto in cloud computing. While this is great news for everyone who works with, administers, and expands Kubernetes, the downside is that it makes Kubernetes a favorable target for malicious actors.

Malicious actors typically exploit flaws in the system to gain access to a portion of the environment. They then chain these flaws together to move laterally within the environment, ultimately seeking root access or access to critical information.

While the best way to fix security flaws in any software is to patch it with appropriate fixes that the project maintainers publish, there are certain security practices that you can adopt to fortify your environment, like using network policies. However, most people find network policies complex and overwhelming, which discourages them from implementing policies in their environment.

In this blog post, we will examine four pain points that people face when they want to implement network policies and provide solutions to help you effectively secure your Kubernetes environment.

What is a network policy and why should I use it?

In Kubernetes, a network policy (KNP) resource is the Continue reading

HS076: Greg’s Finale

This is Greg’s last Heavy Strategy episode before he heads off to retirement. He gives us his final pieces of career and life advice, opinions on private equity, and a Cookie Monster quote. We also briefly introduce John Burke, the new Heavy Strategy co-host. Farewell, Greg. Thank you for all the great debates. Episode Transcript... Read more »

Running BGP Labs in GitHub Codespaces

I love open-source tools (and their GitHub repositories). Someone launches a cool idea, and you can dig through their source code to figure out how it works. It beats reading documentation or fixing AI hallucinations every day of the week ;)

Not too long ago, the containerlab team launched the ability to run containerlab within a free1 container2 running on GitHub, and that seemed like a perfect solution to run the BGP labs (Jeroen van Bemmel pointing me in the right direction was another significant step forward).

Running BGP Labs in GitHub Codespaces

I love open-source tools (and their GitHub repositories). Someone launches a cool idea, and you can dig through their source code to figure out how it works. It beats reading documentation or fixing AI hallucinations every day of the week ;)

Not too long ago, the containerlab team launched the ability to run containerlab within a free1 container2 running on GitHub, and that seemed like a perfect solution to run the BGP labs (Jeroen van Bemmel pointing me in the right direction was another significant step forward).

Why Didn’t We Have Anycast Gateways Before VXLAN?

A while back I started thinking about why it took so long before we started using anycast gateways. I started thinking about what would be the reason(s) for not doing it earlier. I came up with some good reasons and it started making sense to me. I then asked you all what your thoughts were and received a ton of great responses. Here are a few that were mentioned:

  • It was a natural evolution.
  • More powerful devices.
  • We didn’t have overlays.
  • There were no protocols to map what device a MAC sits behind.
  • Reusing the same IP would cause IP conflicts.

These are all certainly true to some degree. I would argue though that the main reason why we didn’t have it earlier is because of the topology and protocols we used in traditional LANs. The typical design was to have three layers, access, distribution, and core. The links in access to distribution layer were L2 only and the distribution layer had all the L3 configuration. A typical topology looked like this:

In a topology like this, there are only two devices that host the L3 configuration needed for hosts. When you have two of something, it’s natural to think Continue reading