ddib

Author Archives: ddib

1000BASE-T Part 1 – Introduction

How does Ethernet detect that a link goes down? This, what I thought was a simple question, I asked myself a couple of weeks ago. I realized I didn’t have a very good answer. I realized I had more to learn about Ethernet and the physical layer and so does pretty much the entire networking industry. Through the graceful help of Peter Jones at Cisco, I got in touch with George Zimmerman, an independent professional with a PhD in electrical engineering, a history of teaching at Caltech, and that works within the IEEE on different standards. To answer my initial question, we first need to understand more about Ethernet, and especially the physical layer. As every version of Ethernet has slightly different PHY, I will be covering 1000BASE-T. This will be covered in a series of posts, this being the first.

Going back to the OSI model, most roles in networking puts the focus on layers two to four:

This is natural as most of our work relates to these layers.

When we think of two hosts communicating, we imagine that the transceivers connect to each other and that there are ones and zeroes traveling across the cable:

Continue reading

How Anycast VTEP Broke My Lab And What I Learned

I’m preparing a massive blog post on vPC in the context of VXLAN/EVPN and while doing so I accidentally broke my lab. What a great learning experience! I thought I would share it with you and how to perform troubleshooting of this scenario.

My topology looks like this:

Before I made any changes, there was full connectivity between these hosts, meaning that both bridging and routing was working. I then changed the loopback1 (NVE source interface) configuration of Leaf-1 and Leaf-2 to add a secondary IP. This was the initial configuration:

! Leaf-1
interface loopback1
  description VTEP
  ip address 203.0.113.1/32
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
! Leaf-2
interface loopback1
  description VTEP
  ip address 203.0.113.2/32
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode

This then changed to:

! Leaf-1
interface loopback1
  description VTEP
  ip address 203.0.113.1/32
  ip address 203.0.113.12/32 secondary
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
! Leaf-2
interface loopback1
  description VTEP
  ip address 203.0.113.2/32
  ip address 203.0.113.12/32 secondary
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim  Continue reading

Routed Packet Walk in VXLAN/EVPN Network

In a previous post, I walked through how a packet gets bridged in a VXLAN/EVPN network. In this post, I’ll go through how a packet gets routed, that is, packet from one VNI to another VNI. The following topology will be used:

The lab has the following characteristics:

  • OSPF in the underlay.
  • Ingress replication for BUM traffic through the use of EVPN.
  • ARP suppression is enabled.

Server-2 initiates a ping towards Server-4:

Frame 562: 98 bytes on wire (784 bits), 98 bytes captured (784 bits) on interface ens257, id 4
Ethernet II, Src: 00:50:56:ad:f4:8d, Dst: 00:01:00:01:00:01
Internet Protocol Version 4, Src: 10.0.0.22, Dst: 198.51.100.44
Internet Control Message Protocol
    Type: 8 (Echo (ping) request)
    Code: 0
    Checksum: 0xd745 [correct]
    [Checksum Status: Good]
    Identifier (BE): 17 (0x0011)
    Identifier (LE): 4352 (0x1100)
    Sequence Number (BE): 1 (0x0001)
    Sequence Number (LE): 256 (0x0100)
    [Response frame: 563]
    Timestamp from icmp data: Mar  3, 2024 08:38:35.804470000 Romance Standard Time
    [Timestamp from icmp data (relative): 0.000701509 seconds]
    Data (40 bytes)

The destination MAC is 0001.0001.0001 which is the Anycast GW MAC configured on Leaf-2. As this MAC is used on SVI for VLAN 20 of Leaf-2, the Continue reading

EVPN – Asymmetric vs Symmetric IRB

It is well known that VXLAN supports bridging frames, that is, forwarding frames that belong to the same L2 segment. In the beginning, this is all that was supported. There was no VXLAN routing. In essence, the HW didn’t support taking a VXLAN encapsulated packet, decapsulating it, and then performing a L3 lookup. This meant that another device was needed to do the L3 lookup. Think of it as router on a stick where the VTEP would decapsulate the packet and forward it (based on L2 lookup) to a gateway. This gateway needed to have L3 interfaces for all the L2 VNIs that needed routing. Now, this is still applicable in a design where a FW should inspect traffic between all VNIs, but HW has supported for a long time to do VXLAN routing, that is, taking packet from one VNI and routing it to another VNI. This is referred to as Integrated Routing and Bridging (IRB), as the device is capable of both bridging and routing packets. IRB is described in RFC 9135.

There are two types of IRB, asymmetric and symmetric. Asymmetric vs symmetric refers to how the lookup is performed to do routing. Let’s first take a Continue reading

EVPN Terminology

Reading RFCs is a great source of information for understanding all the details of a protocol. Often they do require the reader to be quite technical and the terminology can be confusing if you aren’t used to the type of language and writing style used in RFCs. In this post, I go through some of the most important terminology in EVPN and VXLAN to help you build your understanding of the different forwarding constructs and how they interact.

The picture below shows some of the most important terminology in EVPN:

Let’s go through the terms used in the diagram and some additional ones:

  • Attachment circuit – An interface that is associated with a bridge table. The AC that the packet arrived on is determined by examining the port, and optionally VLAN tag.
  • Broadcast Domain – The Broadcast domain consists of all devices and hosts that would receive a broadcast frame when sent in that domain (assuming no ARP optimization features used). This is normally a VLAN, and it normally maps to one subnet. From a VXLAN perspective, it would be a L2 VNI. An EVI may contain one or more BDs depending on service model.
  • Bridge Table – Bridge Table Continue reading

Bridging Packet Walk In VXLAN/EVPN Network

In this post I walk you through all the steps and packets involved in two hosts communicating over a L2 VNI in a VXLAN/EVPN network. The topology below is the one we will be using:

The lab has the following characteristics:

  • OSPF in the underlay.
  • Ingress replication for BUM traffic through the use of EVPN.
  • ARP suppression is enabled.
  • ARP cache is cleared on Server-1 and Server-4 before initating the packet capture.
  • Server-1 is the host sourcing traffic by pinging Server-4.

Server-1 clears the ARP entry for Server-4 and initiates the ping:

sudo ip neighbor del 198.51.100.44 dev ens160
ping 198.51.100.44
PING 198.51.100.44 (198.51.100.44) 56(84) bytes of data.
64 bytes from 198.51.100.44: icmp_seq=1 ttl=64 time=6.38 ms
64 bytes from 198.51.100.44: icmp_seq=2 ttl=64 time=4.56 ms
64 bytes from 198.51.100.44: icmp_seq=3 ttl=64 time=4.60 ms

Below is packet capture showing the ARP request from Server-1:

Frame 7854: 60 bytes on wire (480 bits), 60 bytes captured (480 bits) on interface ens257, id 4
Ethernet II, Src: 00:50:56:ad:85:06, Dst: ff:ff:ff:ff:ff:ff
Address Resolution Protocol (request)
    Hardware type: Ethernet (1)
    Protocol type:  Continue reading

Why Is BFD More Light Weight Than Routing Hellos?

There are many articles on BFD. It is well known that BFD has the following advantages over routing protocol hellos/keepalives:

  • BFD is more light weight than hellos/keepalives.
  • Multiple clients can register to BFD instead of configuring each protocol with aggressive timers.
  • On some platforms, BFD can be offloaded to the hardware instead of the CPU.
  • BFD provides faster timers than routing protocols.
  • BFD is less CPU intensive.

What does light weight mean, though? Does it mean that the packets are smaller? Let’s compare a BFD packet to an OSPF Hello. Starting with the OSPF Hello:

Frame 269: 114 bytes on wire (912 bits), 114 bytes captured (912 bits) on interface ens192, id 1
Ethernet II, Src: 00:50:56:ad:8d:3c, Dst: 01:00:5e:00:00:05
Internet Protocol Version 4, Src: 203.0.113.0, Dst: 224.0.0.5
Open Shortest Path First
    OSPF Header
        Version: 2
        Message Type: Hello Packet (1)
        Packet Length: 48
        Source OSPF Router: 192.168.128.223
        Area ID: 0.0.0.0 (Backbone)
        Checksum: 0x7193 [correct]
        Auth Type: Null (0)
        Auth Data (none): 0000000000000000
    OSPF Hello Packet
    OSPF LLS Data Block

There’s 114 bytes on the wire consisting of:

Catalyst SD-WAN Enhanced Application Aware Routing

Traditionally, Cisco has leveraged BFD to monitor tunnels and their performance and Application Aware Routing (AAR) to reroute traffic. BFD has been used to measure:

  • Latency.
  • Loss.
  • Jitter.

Additionally, BFD is also used to verify liveliness of the tunnels. This works well, but there are some drawbacks to using a separate protocol for measuring performance:

  • You are adding control plane packets competing for bandwidth with packets in data plane.
  • Sending control plane packets frequently may overload the control plane.
    • This may lead to false positives.
  • It’s not guaranteed that control plane packets and data plane packets are treated equally.
  • AAR did take some time to react to poor transports as it had to collect enough measurements before reacting.
  • AAR didn’t have a built-in dampening mechanism.

With the default BFD settings, BFD packets are sent every second. The default AAR configuration consists of six buckets that hold 10 minutes of data each. This means that with the default settings, AAR will react in 10-60 minutes depending on how poorly the transport is performing. The most aggressive AAR configuration recommended by Cisco was to have 5 buckets holding 2 minutes of data each. AAR would then react in 2-10 minutes which I Continue reading

Catalyst SD-WAN 20.13 – RBAC

Catalyst SD-WAN has supported Role Based Access Control (RBAC) for a long time. It has been possible to use predefined roles or create custom roles and defining what areas the user should have access to. However, before 20.13 it was not possible to define a scope. In large companies it’s quite common that one group manages one set of devices, for example all the sites in EU, all the sites in the US, etc. There may also be multiple business units within the company which may share some infrastructure but operate autonomously from each other where a BU should only have access to its own set of devices. As of 20.13, it is not possible to define scope when using RBAC in Catalyst SD-WAN.

There is another feature, called Network Hierarchy that is somewhat related to RBAC. When onboarding devices, you assign a Site ID to the device. The site is then assigned a name in the format of SITE_SiteID, for example SITE_10 when using a Site ID of 10. By default all sites belong to the global node as can be seen below:

Note that it says Auto-Generated site. It is possible to edit the site Continue reading

NX-OS Forwarding Constructs For VXLAN/EVPN

In this post we will look at the forwarding constructs in NX-OS in the context of VXLAN and EVPN. Having knowledge of the forwarding constructs helps both with understanding of the protocols, but also to assist in troubleshooting. BRKDCN-3040 from Cisco Live has a nice overview of the components involved:

There are components that are platform independent (PI) and platform dependent (PD). Below I’ll explain what each component does:

  • ARP – Information from ARP requests/responses is needed to build adjacencies. The information learned from ARP is used to populate IP address field in RT2 and hence also to populate the ARP suppression cache.
  • IPv6 ND – ND fills the role of ARP, but for IPv6.
  • Adjacency Manager – Resolves directly attached hosts MAC addresses.
  • Host Mobility Manager – Tracks the endpoints and their movements.
  • L2FM – The Layer2 Forwarding Manager. A platform dependent component that programs ASICs for L2 forwarding. Keeps track of MAC addresses, their placement and moves, and synchronizes this information across ASICS, line cards, and vPC peers when vPC is in use.
  • MFDM – Multicast Forwarding Database Manager. A platform dependent component that programs ASICs with information to perform multicast forwarding.
  • L2RIB – The component that handles Continue reading

EVPN Route Type 5

In a previous post, EVPN Deepdive Route Types 2 and 3, I covered route types 2 and 3. In this post I’ll cover route type 5 which is used for advertising IP prefixes. This route type is covered in RFC 9136.

There are two main use cases for advertising IP prefixes in EVPN route type 5:

  • Advertising external prefixes into the VXLAN network.
  • Advertising prefixes for connectivity towards silent hosts.

The first scenario is pretty obvious. There are other places in the network, such as remote offices via a WAN, partners and external parties, as well as the internet. To route towards these destinations, a route type is needed and this is route type 5. Remember, route type 2 only provides host routing which poses the following problems for external connectivity:

  • Advertising everything as /32 and /128 would be highly inefficient.
  • It requires an EVPN speaker to generate the RT2 and the external prefixes are originated from non-EVPN speakers.
  • It would not be possible to advertise a default route.
  • Without RT5, external connectivity would have to be advertised from another protocol than EVPN.

The last bullet may be worth expanding a bit on. If the external prefixes aren’t advertised Continue reading

Simulate a Silent Host in a VXLAN Network

I’m working on a blog post explaining route type 5 in EVPN. To demonstrate a scenario with a silent host, I want to simulate this behavior. Normally, hosts can be quite chatty and ARP for their GW, for example. In this post I will show how arptables on Linux can be used to simulate a silent host.

Currently the leaf switch has an ARP entry for the host:

Leaf4# show ip arp vrf Tenant1

Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Re-Originated Peer Sync Entry
       D - Static Adjacencies attached to down interface

IP ARP Table for context Tenant1
Total number of entries: 1
Address         Age       MAC Address     Interface       Flags
198.51.100.44   00:15:20  0050.56ad.7d68  Vlan10           

It is possible to ping the host from the leaf switch:

Leaf4# ping 198.51.100.44 vrf Tenant1
PING 198.51.100.44 (198.51.100.44): 56 data bytes
64 bytes from 198.51.100.44: icmp_seq=0 ttl=63 time=1.355 ms
64 bytes from 198.51.100.44:  Continue reading

VXLAN/EVPN – Host routing

In an previous post Advertising IPs In EVPN Route Type 2, I described use cases for advertising IP addresses in EVPN route type 2. Host ARP and host mobility I already covered so today we will focus on host routing.

To be able to show this scenario, I have added another server (SERVER-2) and will be using the topology below:

There is already existing configuration for VLAN 10 (L2 VNI) and for VLAN 100 (L3 VNI) which is shown below:

vrf context Tenant1
  vni 10001
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
!
interface Vlan10
  no shutdown
  vrf member Tenant1
  ip address 198.51.100.1/24
  fabric forwarding mode anycast-gateway
!
interface Vlan100
  no shutdown
  mtu 9216
  vrf member Tenant1
  ip forward

To get SERVER-2 connected the following is needed:

  • Configure VLAN 20 and map it to L2 VNI (VNI 10002).
  • Make the L2 VNI a member of the NVE.
  • SVI for VLAN 20.
  • Configure port towards SERVER-2 in VLAN 20.

This is shown below:

vlan 20
  vn-segment 10002
!
interface nve1
  member vni 10002
    ingress-replication protocol bgp
!
interface Vlan20
  no shutdown
  vrf member Tenant1
  ip address 10.0.0.1/24
  fabric forwarding mode anycast-gateway
!
interface Ethernet1/3
   Continue reading

VXLAN/EVPN – Host mobility

In the previous post VXLAN/EVPN – Host ARP, I talked about how knowing the MAC/IP of endpoints allows for ARP suppression. In this post we’ll take a look at host mobility. The topology used is the same as in the previous post:

Currently SERVER-1 is connected to LEAF-1. What happens if SERVER-1 moves to LEAF-2? This would be a common scenario for a virtual infrastructure. First let’s take a look at LEAF-4 on what routes we have for SERVER-1:

Leaf4# show bgp l2vpn evpn 0050.56ad.8506
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.0.2.3:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.56ad.8506]:[0]:[0.0.0.0]/216, version 662
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW

  Path type: internal, path is valid, not best reason: Neighbor Address, no labeled nexthop
  AS-Path: NONE, path sourced internal to AS
    203.0.113.1 (metric 81) from 192.0.2.12 (192.0.2.2)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000
      Extcommunity: RT:65000:10000 ENCAP:8
      Originator: 192.0.2.3 Cluster list: 192.0.2.2 

  Advertised  Continue reading

VXLAN/EVPN – Host ARP

In the last post Advertising IPs In EVPN Route Type 2, I described how to get IPs advertised in EVPN route type 2, but why do we need it? There are three main scenarios where having the MAC/IP mapping is useful:

  • Host ARP.
  • Host mobility.
  • Host routing.

In this post I will cover the first use case and the topology below will be used:

Host ARP

When two hosts in the same subnet want to send Ethernet frames to each other, they will ARP to discover the MAC address of the other host. This is no different in a VXLAN/EVPN network. The ARP frame, which is broadcast, will have to be flooded to other VTEPs either using multicast in the underlay or by ingress replication. Because the frame is broadcast, it will have to go to all the VTEPs that have that VNI. The scenario with ingress replication is shown below:

In this scenario, SERVER-1 is sending an ARP request to get the MAC address of SERVER-4. As all leafs are participating in the L2 VNI, LEAF-1 will perform ingress replication and send it to all leafs. However, sending the ARP request to LEAF-2 and LEAF-3 is not needed Continue reading

Advertising IPs In EVPN Route Type 2

In my last post EVPN Deepdive Route Types 2 and 3, we took a deepdive into these two route types. I mentioned that the IP address of a host, a /32 or /128 address, could optionally be advertised. I also mentioned that this is mainly to facilitate features such as ARP suppression where a VTEP will be aware of the MAC/IP mapping and not have to flood BUM frames. However, in my last lab no IP addresses were advertised. Why is that? How do we get them advertised?

Currently, I have only setup a L2 VNI in the lab. This provides connectivity for the VLAN that my hosts are in, but it does not provide any L3 services. There is no SVI configured and there is also no L3 service configured that can route between different VNIs. The “standard” way of setting this up would be to configure anycast gateway on the leafs where every leaf that hosts the VNI has the same IP/MAC, but I consider this to be an optimization that I want to cover in a future posts. I prefer to break things down into their components and focus on the configuration needed for each component Continue reading

EVPN Deepdive Route Types 2 and 3

In my last post on Configuring EVPN, we setup EVPN but configured no services. In this post we will configure a basic L2 service so we can dive into the different EVPN route types. This post will cover route type 2 and 3 together as you will commonly see these together. This post will cover:

  • Discovery of VTEPs.
  • How to map a VLAN to a VNI.
  • Automatic generation of RD and RT.
  • Advertising MAC- and optionally IP address (route type 2).
  • Ingress replication with dynamic discovery of VTEPs (route type 3).

The topology we will use for this post is shown below:

Before diving into configuration, let’s discuss something that is often overlooked, VTEP discovery.

VTEP discovery

Without EVPN, VXLAN uses flood and learn behavior for discovery of VTEPs. This means that any host sending VXLAN frames would be considered a trusted VTEP in the network. This is obviously not great from a security perspective. When using EVPN, adding VTEPs is based on BGP messages. A VTEP will learn about other VTEPs based on these BGP updates. It’s not a specific route type, but rather any type of EVPN message. This makes it more difficult to add a rogue Continue reading

VRF Without Route Target – Will the Route Be Exported?

Yesterday I posted a tricky question to Twitter. If you have a working VPNv4 environment and create a VRF with only a Route Distinguisher (RD) but without Route Targets (RT), will the route be exported? The answer may surprise you! The configuration supplied in the question was similar to the one below:

vrf definition QUIZ
 rd 198.51.100.1:100 
 !
 address-family ipv4
 exit-address-family
!
interface GigabitEthernet2
 vrf forwarding QUIZ
 ip address 203.0.113.1 255.255.255.0
!
router bgp 65000
 !
 address-family ipv4 vrf QUIZ
  network 203.0.113.0

Notice how this VRF has a RD but no RT. Will this router, PE1, advertise the route into VPNv4? Most would say no, but the answer is yes. Let’s first check that we see the route locally on PE1 in VRF QUIZ:

PE1#show bgp vpnv4 uni vrf QUIZ 203.0.113.0
BGP routing table entry for 198.51.100.1:100:203.0.113.0/24, version 4
Paths: (1 available, best #1, table QUIZ)
  Advertised to update-groups:
     1         
  Refresh Epoch 1
  Local
    0.0.0.0 (via vrf QUIZ) from 0.0.0.0 (198.51.100.1)
      Origin IGP, metric 0, localpref 100, weight 32768, valid, sourced, local, best
      mpls  Continue reading

Configuring EVPN on NX-OS

In this post we will configure EVPN on NX-OS. We will reuse the VXLAN topology from my previous post. The following will describe the setup in this post:

  • VXLAN topology with OSPF as the IGP in the underlay using unnumbered links.
  • EVPN in the overlay using iBGP.
  • Spines acting as route reflectors.
  • Separate loopbacks for IGP, BGP, and NVE.
  • Ingress replication based on EVPN.
  • Enhancements such as anycast gateway, ARP suppression, etc., will be covered in future posts.

The BGP topology is shown below:

I will cover all the details of configuring EVPN and establishing the BGP sessions. We will then cover the actual exchange of routes in detail in separate posts in the future.

Starting out, the following globals and features need to be configured:

Next, let’s configure BGP on the spines with the following settings:

Then let’s configure BGP on the leafs:

The devices will now advertise that they have AFI L2VPN and SAFI EVPN:

The BGP sessions are now up:

Leaf1# show bgp l2vpn evpn sum
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 192.0.2.3, local AS number 65000
BGP table version is 4, L2VPN EVPN config peers  Continue reading

Introduction to EVPN In VXLAN Networks

In previous posts I described VXLAN using flood and learn behavior using multicast or ingress replication. The drawback to flood and learn is that frames need to be flooded/replicated for the VTEPs to learn of each other and for learning what MAC addresses are available through each VTEP. This isn’t very efficient. Isn’t there a better way of learning this information? This is where Ethernet VPN (EVPN) comes into play. What is it? As you know, BGP can carry all sorts of information and EVPN is just BGP with support to carry information about VTEPs, MAC addresses, IP addresses, VRFs, and some other stuff. What does EVPN provide us?

  • Ability to discover VTEPs.
  • Messaging of MAC prefixes and IP prefixes.
  • Reduced amount of flooding.
  • Optionally ARP suppression.

Note that the use of EVPN doesn’t entirely remove the need for flooding using multicast or ingress replication. Hosts still need to use ARP/ND to find the MAC address of each other, although ARP suppression could potentially help with that. There may also be protocols such as DHCP that leverage broadcast for some messages. In addition, there may be silent hosts in the fabric where VTEP is not aware that the host is Continue reading