Amazon EKS networking options

When setting up a Kubernetes environment with Amazon Elastic Kubernetes Service (EKS), it is crucial to understand your available networking options. EKS offers a range of networking choices that allow you to build a highly available and scalable cloud environment for your workloads.

In this blog post, we will explore the networking and policy enforcement options provided by AWS for Amazon EKS. By the end, you will have a clear understanding of the different networking options, network policy enforcement engines, and other features that can help you create a functional and secure platform for your Kubernetes workloads and services.

Amazon Elastic Kubernetes Service (EKS)

Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes service that simplifies routine operations, such as cluster deployment and maintenance, by automating tasks like patching and updating operating systems and their underlying components. EKS enhances scalability through AWS Auto Scaling groups and other AWS service integrations, and offers a highly available control plane to manage your cluster.

Amazon EKS in the cloud has two options:

  • Managed
  • Self-managed

Managed clusters rely on the AWS control plane node, which AWS hosts and controls separately from your cluster. This node operates in isolation and cannot be directly Continue reading
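As a quick orientation before digging into the options the post covers, here is a minimal sketch (mine, not from the post) that uses the AWS SDK for Go v2 to inspect a cluster's VPC networking settings; the cluster name "demo-cluster" is a placeholder.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/eks"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := eks.NewFromConfig(cfg)

	// "demo-cluster" is a placeholder cluster name.
	out, err := client.DescribeCluster(ctx, &eks.DescribeClusterInput{
		Name: aws.String("demo-cluster"),
	})
	if err != nil {
		log.Fatal(err)
	}

	// The VPC config shows the networking the cluster was built on:
	// the VPC, its subnets, and the attached security groups.
	vpc := out.Cluster.ResourcesVpcConfig
	fmt.Println("VPC:     ", aws.ToString(vpc.VpcId))
	fmt.Println("Subnets: ", vpc.SubnetIds)
	fmt.Println("SGs:     ", vpc.SecurityGroupIds)
}
```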

Must Read: OSPF Protocol Analysis (RFC 1245)

Daniel Dib found the ancient OSPF Protocol Analysis (RFC 1245) that includes the Router CPU section. Please keep in mind that the RFC was published in 1991 (over 30 years ago):

Steve Deering presented results for the Dijkstra calculation in the “MOSPF meeting report” in [3]. Steve’s calculation was done on a DEC 5000 (10 mips processor), using the Stanford internet as a model. His graphs are based on numbers of networks, not number of routers. However, if we extrapolate that the ratio of routers to networks remains the same, the time to run Dijkstra for 200 routers in Steve’s implementation was around 15 milliseconds.
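To put that 15-millisecond figure in context, here is a minimal sketch (not Steve Deering's implementation) of the simple O(n²) Dijkstra scan an early-1990s SPF implementation would have run, timed over a synthetic 200-node random topology; all names and the graph itself are illustrative.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"time"
)

// dijkstra computes shortest-path costs from src using the plain
// O(n^2) "scan for the closest unfinished node" approach (no heap),
// typical of early SPF implementations.
func dijkstra(adj [][]float64, src int) []float64 {
	n := len(adj)
	dist := make([]float64, n)
	done := make([]bool, n)
	for i := range dist {
		dist[i] = math.Inf(1)
	}
	dist[src] = 0
	for iter := 0; iter < n; iter++ {
		u, best := -1, math.Inf(1)
		for v := 0; v < n; v++ {
			if !done[v] && dist[v] < best {
				u, best = v, dist[v]
			}
		}
		if u == -1 {
			break // remaining nodes are unreachable
		}
		done[u] = true
		for v := 0; v < n; v++ {
			if d := dist[u] + adj[u][v]; d < dist[v] {
				dist[v] = d
			}
		}
	}
	return dist
}

func main() {
	const n = 200 // roughly the router count in the RFC's extrapolation
	adj := make([][]float64, n)
	for i := range adj {
		adj[i] = make([]float64, n)
		for j := range adj[i] {
			switch {
			case i == j:
				adj[i][j] = 0
			case rand.Float64() < 0.05: // sparse random topology
				adj[i][j] = 1 + rand.Float64()*10
			default:
				adj[i][j] = math.Inf(1) // no link
			}
		}
	}
	start := time.Now()
	dijkstra(adj, 0)
	fmt.Printf("SPF over %d nodes took %v\n", n, time.Since(start))
}
```

On modern hardware this completes in well under a millisecond, which is a nice illustration of how far the "Dijkstra is expensive" folklore has drifted from current reality.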

Rule 11 Academy is (Somewhat) Live

The Academy does not replace this blog, the Hedge, etc. Instead, it's a place for me to recreate all the training materials I've taught in the past, put them in one place, and add new training material besides. It's light right now, but I plan to post once or twice a week.

Note that this is a subscription site with paid content and two membership options: six months and yearly.

Get six months free using the coupon code BEAG2DRUP0TORNSKUT.

https://rule11.ac/

Thwart Ops Sprawl With a Unified Data Plane 

When updating a critical infrastructure element for application teams takes weeks due to coordination between NetOps, SecOps, PlatformOps and FinOps, you have a problem: ops sprawl. First was the technology ops team. Then came network operations and security operations. Then, arising from the site reliability engineering (SRE) movement and the goal of pushing more ops decisions into the development environment, came

New Consent and Bot Management features for Cloudflare Zaraz

Managing consent online can be challenging. After you’ve figured out the necessary regulations, you usually need to configure some Consent Management Platform (CMP) to load all third-party tools and scripts on your website in a way that respects these demands. Cloudflare Zaraz manages the loading of all of these third-party tools, so it was only natural that in April 2023 we announced the Cloudflare Zaraz CMP: the simplest way to manage consent in a way that seamlessly integrates with your third-party tools manager.

As more and more third-party tool vendors are required to handle consent properly, our CMP has evolved to integrate with these new technologies and standardization efforts. Today, we're happy to announce that the Cloudflare Zaraz CMP is now compatible with the Interactive Advertising Bureau Transparency and Consent Framework (IAB TCF) requirements and fully supports Google's Consent Mode v2 signals. Separately, we've taken steps to improve the way Cloudflare Zaraz handles traffic coming from online bots.

IAB TCF Compatibility

Earlier this year, Google announced that websites that would like to use AdSense and other advertising solutions in the European Economic Area (EEA), the UK, and Switzerland will be required to use a CMP that is approved by Continue reading

MLAG Deep Dive: LAG Member Failures in VXLAN Fabrics

In the Dealing with LAG Member Failures blog post, we figured out how easy it is to deal with a LAG member failure in a traditional MLAG cluster. The failover could happen in hardware, and even if it’s software-driven, it does not depend on the control plane.

Let’s add a bit of complexity and replace a traditional layer-2 fabric with a VXLAN fabric. The MLAG cluster members still use an MLAG peer link and an anycast VTEP IP address (more details).

Cisco vPC in VXLAN/EVPN Network – Part 4 – Fabric Peering

As I mentioned in a previous post, leafs normally don't connect to other leafs, but vPC requires this interconnection. What if we don't want to use physical interfaces for it? This is where fabric peering comes into play. Unfortunately, my lab is virtual and does not support fabric peering, so I will just introduce you to the concept. Let's compare traditional vPC to fabric peering, starting with traditional vPC:

The traditional vPC has the following pros and cons:

  • Pros:
    • No dependency on other devices for peer link and peer keepalive link.
    • No contention for bandwidth on interfaces as they are dedicated.
    • This also means no QoS configuration is required.
    • Intent of configuration is clear with dedicated interfaces.
  • Cons:
    • Requires dedicated interfaces that could be used for something else.
    • Interfaces have a cost, both from the perspective of buying the switch and of the SFPs.

Now let’s compare that to fabric peering:

Fabric peering has the following pros and cons:

  • Pros:
    • No dedicated interfaces required.
    • Thus reducing cost.
    • Resiliency as there are multiple paths between the two switches.
  • Cons:
    • Dependency on other devices.
    • Dependency on the underlay.
    • Contention for bandwidth with other traffic.
    • May require QoS.
    • May be more difficult to Continue reading

PP014: Good Threat Hunting

Have you ever noticed “threat hunting” in vendor products and wondered exactly what it means? James Williams is here to explain: Threat hunting is the R&D of detection engineering. A threat hunter imagines what an attacker might try and, critically, how that behavior would show up in the logs of a particular environment. Then the... Read more »

To Exascale And (Maybe) Beyond!

The difference between “high performance computing” in the general way that many thousands of organizations run traditional simulation and modeling applications and the kind of exascale computing that is only now becoming a little more commonplace is like the difference between a single two-door coupe that goes 65 miles per hour (most of the time) and a fleet of bullet trains that can each hold over 1,300 people and move at more than 300 miles per hour, connecting a country or a continent.

To Exascale And (Maybe) Beyond! was written by Timothy Prickett Morgan at The Next Platform.

Cloudflare’s public IPFS gateways and supporting Interplanetary Shipyard

IPFS, the distributed file system and content addressing protocol, has been around since 2015, and Cloudflare has been a user and operator since 2018, when we began operating a public IPFS gateway. Today, we are announcing our plan to transition this gateway traffic to the IPFS Foundation’s gateway, maintained by the Interplanetary Shipyard (“Shipyard”) team, and discussing what it means for users and the future of IPFS gateways.

As announced in April 2024, many of the IPFS core developers and maintainers now work within a newly created, independent entity called Interplanetary Shipyard after transitioning from Protocol Labs, where IPFS was invented and incubated. With the maintainers operating as a collective, ongoing maintenance and support of important protocols like IPFS are now even more community-owned and managed. We fully support this “exit to community” and are excited to support Shipyard as they build more great infrastructure for the open web.

On May 14th, 2024, we will begin to transition traffic that comes to Cloudflare’s public IPFS gateway to the IPFS Foundation’s gateway at ipfs.io or dweb.link. Cloudflare’s public IPFS gateway is just one of many – part of a distributed ecosystem that also includes Pinata, eth.limo, and Continue reading

Reclaiming CPU for free with Go’s Profile Guided Optimization

Go 1.20 introduced support for Profile-Guided Optimization (PGO) in the Go compiler. This lets you guide the compiler to apply optimizations based on the real-world behaviour of your system. In the Observability Team at Cloudflare, we maintain a few Go-based services that use thousands of cores worldwide, so even the advertised 2-7% savings would drastically reduce our CPU footprint, effectively for free. This would reduce the CPU usage of our internal services, freeing up those resources to serve customer requests and providing measurable improvements to our customer experience. In this post, I will cover the process we created for experimenting with PGO: collecting representative profiles across our production infrastructure, then deploying new PGO-built binaries and measuring the CPU savings.

How does PGO work?

PGO itself is not a Go-specific technique, although its support in Go is relatively new. PGO allows you to take CPU profiles from a program running in production and use them to optimise the generated assembly for that program. This includes a bunch of different optimisations, such as inlining heavily used functions more aggressively, reworking branch prediction to favour the more common branches, and rearranging the generated code to lump hot paths together to save on CPU Continue reading
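As a rough illustration of the workflow (my sketch, not Cloudflare's actual pipeline): expose the standard pprof endpoints in the service, pull a CPU profile from production, and feed it back to the compiler.

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve the profiling endpoints so a CPU profile can be collected
	// from the running service, e.g.:
	//   curl -o cpu.pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the service's real work would happen here ...
	select {}
}
```

With a representative profile saved as default.pgo in the main package directory, `go build` picks it up automatically from Go 1.21 onward; on Go 1.20 you pass it explicitly with `go build -pgo=cpu.pprof`.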

EVPN Instance Deployment Scenario 1: L2-Only EVPN Instance

In this scenario, we are building a protected Broadcast Domain (BD), which we extend to the VXLAN Tunnel Endpoint (VTEP) switches of the EVPN Fabric, Leaf-101 and Leaf-102. Note that the VTEP operates in the Network Virtualization Edge (NVE) role for the VXLAN segment. The term NVE refers to devices that encapsulate data packets to transport them over routed IP infrastructure. Another example of an NVE device is the MPLS Provider Edge (MPLS-PE) router at the edge of the MPLS network, doing MPLS labeling. The term “Tenant System” (TS) refers to a physical host, virtual machine, or an intra-tenant forwarding component attached to one or more Tenant-specific Virtual Networks. Examples of TS forwarding components include firewalls, load balancers, switches, and routers. 

We begin by configuring L2 VLAN 10 on Leaf-101 and Leaf-102 and associating it with vn-segment 10010. From the NVE perspective, this constitutes an L2-Only network segment, meaning we do not configure an Anycast Gateway (AGW) for the segment, and it does not have any VRF association.

Next, we deploy a Layer 2 EVPN Instance (EVI) with VXLAN Network Identifier (VNI) 10010. We utilize the 'auto' option to generate the Route Distinguisher (RD) and the Route Target (RT) import and export values for the EVI. The RD value is derived from the NVE Interface IP address and the VLAN Identifier (VLAN 10) associated with the EVI, added to the base value 32767 (e.g., 192.168.100.101:32777). The use of the VLAN ID as part of the automatically generated RD value is the reason why VLAN is configured before the EVPN Instance. Similarly, the RT values are derived from the BGP ASN and the VNI (e.g., 65000:10010).
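To make the derivation concrete, here is a tiny sketch (mine, not an NX-OS artifact) that reproduces the auto-generated values described above:

```go
package main

import "fmt"

// autoRD mirrors the derivation in the text: the NVE interface IP,
// a colon, and the VLAN ID added to the base value 32767.
func autoRD(nveIP string, vlanID int) string {
	return fmt.Sprintf("%s:%d", nveIP, 32767+vlanID)
}

// autoRT combines the BGP ASN with the VNI.
func autoRT(asn, vni int) string {
	return fmt.Sprintf("%d:%d", asn, vni)
}

func main() {
	fmt.Println(autoRD("192.168.100.101", 10)) // 192.168.100.101:32777
	fmt.Println(autoRT(65000, 10010))          // 65000:10010
}
```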

As the final step for EVPN Instance deployment, we add EVI 10010 under the NVE interface configuration as a member vni with the Multicast Group 239.1.1.1 we are using for Broadcast, Unknown Unicast, and Multicast (BUM) traffic. 

To connect TS1 and TS2 to the broadcast domain, we configure Leaf-101's interface Eth1/5 and Leaf-102's interface Eth1/3 as access ports for VLAN 10.

A few words regarding the terminology utilized in Figure 3-2. '3-Stage Routed Clos Fabric' denotes both the physical topology of the network and the model for forwarding data packets. The 3-Stage Clos topology has three switches (ingress, spine, and egress) between the attached Tenant Systems. Routed, in turn, means that switches forward packets based on the destination IP address.

With the term VXLAN Segment, I refer to a stretched Broadcast Domain, identified by the VXLAN Network Identifier value defined under the EVPN Instance on Leaf switches.



Figure 3-2: L2-Only Intra VN Connection.

Continue reading

netlab 1.8.2: Bug Fixes, Usability Improvements

netlab release 1.8.2 contains dozens of bug fixes and minor tweaks to device configuration templates. We also added a few safeguards, including:

  • Check for Vagrant boxes or Docker containers before starting the lab and display pointers to build recipes.
  • Check installed Ansible collections before trying to configure the lab devices.
  • Display a warning if the lab topology was modified after the lab was created.