How does Ethernet detect that a link goes down? I asked myself this seemingly simple question a couple of weeks ago and realized I didn’t have a very good answer. I realized I had more to learn about Ethernet and the physical layer, and so does pretty much the entire networking industry. With the gracious help of Peter Jones at Cisco, I got in touch with George Zimmerman, an independent professional with a PhD in electrical engineering and a history of teaching at Caltech, who works within the IEEE on different standards. To answer my initial question, we first need to understand more about Ethernet, and especially the physical layer. As every version of Ethernet has a slightly different PHY, I will be covering 1000BASE-T. This will be covered in a series of posts, this being the first.
Going back to the OSI model, most roles in networking put the focus on layers two to four:
This is natural as most of our work relates to these layers.
When we think of two hosts communicating, we imagine that the transceivers connect to each other and that there are ones and zeroes traveling across the cable:
I’m preparing a massive blog post on vPC in the context of VXLAN/EVPN and while doing so I accidentally broke my lab. What a great learning experience! I thought I would share it with you and how to perform troubleshooting of this scenario.
My topology looks like this:
Before I made any changes, there was full connectivity between these hosts, meaning that both bridging and routing were working. I then changed the loopback1 (NVE source interface) configuration of Leaf-1 and Leaf-2 to add a secondary IP. This was the initial configuration:
! Leaf-1
interface loopback1
  description VTEP
  ip address 203.0.113.1/32
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
! Leaf-2
interface loopback1
  description VTEP
  ip address 203.0.113.2/32
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
This then changed to:
! Leaf-1
interface loopback1
  description VTEP
  ip address 203.0.113.1/32
  ip address 203.0.113.12/32 secondary
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
! Leaf-2
interface loopback1
  description VTEP
  ip address 203.0.113.2/32
  ip address 203.0.113.12/32 secondary
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim
Continue reading
In a previous post, I walked through how a packet gets bridged in a VXLAN/EVPN network. In this post, I’ll go through how a packet gets routed, that is, how a packet gets from one VNI to another VNI. The following topology will be used:
The lab has the following characteristics:
Server-2 initiates a ping towards Server-4:
Frame 562: 98 bytes on wire (784 bits), 98 bytes captured (784 bits) on interface ens257, id 4
Ethernet II, Src: 00:50:56:ad:f4:8d, Dst: 00:01:00:01:00:01
Internet Protocol Version 4, Src: 10.0.0.22, Dst: 198.51.100.44
Internet Control Message Protocol
    Type: 8 (Echo (ping) request)
    Code: 0
    Checksum: 0xd745 [correct]
    [Checksum Status: Good]
    Identifier (BE): 17 (0x0011)
    Identifier (LE): 4352 (0x1100)
    Sequence Number (BE): 1 (0x0001)
    Sequence Number (LE): 256 (0x0100)
    [Response frame: 563]
    Timestamp from icmp data: Mar 3, 2024 08:38:35.804470000 Romance Standard Time
    [Timestamp from icmp data (relative): 0.000701509 seconds]
    Data (40 bytes)
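As a side note on reading the capture above, Wireshark shows the ICMP identifier and sequence number in both big-endian (BE) and little-endian (LE) form. The minimal Python sketch below assumes the raw bytes on the wire are 0x00 0x11 and 0x00 0x01, taken from the hex values in the capture. It shows how the same two bytes produce both numbers. The helper name `both_byte_orders` is made up for illustration.

```python
# Raw bytes as they appear on the wire, taken from the hex values in the capture.
RAW_IDENTIFIER = bytes([0x00, 0x11])
RAW_SEQUENCE = bytes([0x00, 0x01])


def both_byte_orders(raw: bytes):
    """Interpret the same bytes as big-endian and as little-endian integers."""
    big = int.from_bytes(raw, "big")
    little = int.from_bytes(raw, "little")
    return big, little


if __name__ == "__main__":
    print("Identifier (BE, LE):", both_byte_orders(RAW_IDENTIFIER))
    print("Sequence   (BE, LE):", both_byte_orders(RAW_SEQUENCE))
```

The big-endian values are the ones that follow network byte order. The little-endian values are only shown because some hosts write these fields in their native byte order.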
The destination MAC is 0001.0001.0001 which is the Anycast GW MAC configured on Leaf-2. As this MAC is used on SVI for VLAN 20 of Leaf-2, the Continue reading
It is well known that VXLAN supports bridging frames, that is, forwarding frames that belong to the same L2 segment. In the beginning, this is all that was supported. There was no VXLAN routing. In essence, the HW didn’t support taking a VXLAN encapsulated packet, decapsulating it, and then performing an L3 lookup. This meant that another device was needed to do the L3 lookup. Think of it as router on a stick where the VTEP would decapsulate the packet and forward it (based on L2 lookup) to a gateway. This gateway needed to have L3 interfaces for all the L2 VNIs that needed routing. Now, this is still applicable in a design where a FW should inspect traffic between all VNIs, but HW has for a long time supported VXLAN routing, that is, taking a packet from one VNI and routing it to another VNI. This is referred to as Integrated Routing and Bridging (IRB), as the device is capable of both bridging and routing packets. IRB is described in RFC 9135.
There are two types of IRB, asymmetric and symmetric. Asymmetric vs symmetric refers to how the lookup is performed to do routing. Let’s first take a Continue reading
Reading RFCs is a great source of information for understanding all the details of a protocol. Often they do require the reader to be quite technical and the terminology can be confusing if you aren’t used to the type of language and writing style used in RFCs. In this post, I go through some of the most important terminology in EVPN and VXLAN to help you build your understanding of the different forwarding constructs and how they interact.
The picture below shows some of the most important terminology in EVPN:
Let’s go through the terms used in the diagram and some additional ones:
In this post I walk you through all the steps and packets involved in two hosts communicating over a L2 VNI in a VXLAN/EVPN network. The topology below is the one we will be using:
The lab has the following characteristics:
Server-1 clears the ARP entry for Server-4 and initiates the ping:
sudo ip neighbor del 198.51.100.44 dev ens160
ping 198.51.100.44
PING 198.51.100.44 (198.51.100.44) 56(84) bytes of data.
64 bytes from 198.51.100.44: icmp_seq=1 ttl=64 time=6.38 ms
64 bytes from 198.51.100.44: icmp_seq=2 ttl=64 time=4.56 ms
64 bytes from 198.51.100.44: icmp_seq=3 ttl=64 time=4.60 ms
Below is a packet capture showing the ARP request from Server-1:
Frame 7854: 60 bytes on wire (480 bits), 60 bytes captured (480 bits) on interface ens257, id 4
Ethernet II, Src: 00:50:56:ad:85:06, Dst: ff:ff:ff:ff:ff:ff
Address Resolution Protocol (request)
    Hardware type: Ethernet (1)
    Protocol type:
Continue reading
There are many articles on BFD. It is well known that BFD has the following advantages over routing protocol hellos/keepalives:
What does light weight mean, though? Does it mean that the packets are smaller? Let’s compare a BFD packet to an OSPF Hello. Starting with the OSPF Hello:
Frame 269: 114 bytes on wire (912 bits), 114 bytes captured (912 bits) on interface ens192, id 1
Ethernet II, Src: 00:50:56:ad:8d:3c, Dst: 01:00:5e:00:00:05
Internet Protocol Version 4, Src: 203.0.113.0, Dst: 224.0.0.5
Open Shortest Path First
    OSPF Header
        Version: 2
        Message Type: Hello Packet (1)
        Packet Length: 48
        Source OSPF Router: 192.168.128.223
        Area ID: 0.0.0.0 (Backbone)
        Checksum: 0x7193 [correct]
        Auth Type: Null (0)
        Auth Data (none): 0000000000000000
    OSPF Hello Packet
    OSPF LLS Data Block
There’s 114 bytes on the wire consisting of:
Traditionally, Cisco has leveraged BFD to monitor tunnels and their performance and Application Aware Routing (AAR) to reroute traffic. BFD has been used to measure:
Additionally, BFD is used to verify liveness of the tunnels. This works well, but there are some drawbacks to using a separate protocol for measuring performance:
With the default BFD settings, BFD packets are sent every second. The default AAR configuration consists of six buckets that hold 10 minutes of data each. This means that with the default settings, AAR will react in 10-60 minutes depending on how poorly the transport is performing. The most aggressive AAR configuration recommended by Cisco was to have 5 buckets holding 2 minutes of data each. AAR would then react in 2-10 minutes which I Continue reading
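The reaction windows quoted above follow directly from the bucket arithmetic. This is a minimal Python sketch, assuming the fastest reaction takes one bucket interval and the slowest takes all buckets combined, as the text describes. The function name `aar_reaction_window` is made up for illustration and is not part of any Cisco tooling.

```python
def aar_reaction_window(buckets: int, minutes_per_bucket: int):
    """Return (fastest, slowest) reaction time in minutes.

    Assumption: AAR can react after one full bucket at the earliest,
    and after all buckets have been filled at the latest.
    """
    if buckets < 1 or minutes_per_bucket < 1:
        raise ValueError("buckets and minutes_per_bucket must be positive")
    return minutes_per_bucket, buckets * minutes_per_bucket


if __name__ == "__main__":
    # Values taken from the text: default and most aggressive recommended settings.
    for label, buckets, minutes in (("default", 6, 10), ("aggressive", 5, 2)):
        fastest, slowest = aar_reaction_window(buckets, minutes)
        print(f"{label}: reacts in {fastest}-{slowest} minutes")
```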
Catalyst SD-WAN has supported Role Based Access Control (RBAC) for a long time. It has been possible to use predefined roles or create custom roles and define what areas the user should have access to. However, before 20.13 it was not possible to define a scope. In large companies it’s quite common that one group manages one set of devices, for example all the sites in the EU, all the sites in the US, etc. There may also be multiple business units within the company which share some infrastructure but operate autonomously from each other, where a BU should only have access to its own set of devices. As of 20.13, it is now possible to define scope when using RBAC in Catalyst SD-WAN.
There is another feature, called Network Hierarchy that is somewhat related to RBAC. When onboarding devices, you assign a Site ID to the device. The site is then assigned a name in the format of SITE_SiteID, for example SITE_10 when using a Site ID of 10. By default all sites belong to the global node as can be seen below:
Note that it says Auto-Generated site. It is possible to edit the site Continue reading
In this post we will look at the forwarding constructs in NX-OS in the context of VXLAN and EVPN. Knowing the forwarding constructs helps both with understanding the protocols and with troubleshooting. BRKDCN-3040 from Cisco Live has a nice overview of the components involved:
There are components that are platform independent (PI) and platform dependent (PD). Below I’ll explain what each component does:
In a previous post, EVPN Deepdive Route Types 2 and 3, I covered route types 2 and 3. In this post I’ll cover route type 5 which is used for advertising IP prefixes. This route type is covered in RFC 9136.
There are two main use cases for advertising IP prefixes in EVPN route type 5:
The first scenario is pretty obvious. There are other places in the network, such as remote offices via a WAN, partners and external parties, as well as the internet. To route towards these destinations, a route type is needed and this is route type 5. Remember, route type 2 only provides host routing which poses the following problems for external connectivity:
The last bullet may be worth expanding a bit on. If the external prefixes aren’t advertised Continue reading
I’m working on a blog post explaining route type 5 in EVPN. To demonstrate a scenario with a silent host, I want to simulate this behavior. Normally, hosts can be quite chatty and ARP for their GW, for example. In this post I will show how arptables on Linux can be used to simulate a silent host.
Currently the leaf switch has an ARP entry for the host:
Leaf4# show ip arp vrf Tenant1

Flags: * - Adjacencies learnt on non-active FHRP router
       + - Adjacencies synced via CFSoE
       # - Adjacencies Throttled for Glean
       CP - Added via L2RIB, Control plane Adjacencies
       PS - Added via L2RIB, Peer Sync
       RO - Re-Originated Peer Sync Entry
       D - Static Adjacencies attached to down interface

IP ARP Table for context Tenant1
Total number of entries: 1
Address         Age       MAC Address     Interface       Flags
198.51.100.44   00:15:20  0050.56ad.7d68  Vlan10
It is possible to ping the host from the leaf switch:
Leaf4# ping 198.51.100.44 vrf Tenant1
PING 198.51.100.44 (198.51.100.44): 56 data bytes
64 bytes from 198.51.100.44: icmp_seq=0 ttl=63 time=1.355 ms
64 bytes from 198.51.100.44:
Continue reading
In a previous post, Advertising IPs In EVPN Route Type 2, I described use cases for advertising IP addresses in EVPN route type 2. I already covered host ARP and host mobility, so today we will focus on host routing.
To be able to show this scenario, I have added another server (SERVER-2) and will be using the topology below:
There is already existing configuration for VLAN 10 (L2 VNI) and for VLAN 100 (L3 VNI) which is shown below:
vrf context Tenant1
  vni 10001
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
!
interface Vlan10
  no shutdown
  vrf member Tenant1
  ip address 198.51.100.1/24
  fabric forwarding mode anycast-gateway
!
interface Vlan100
  no shutdown
  mtu 9216
  vrf member Tenant1
  ip forward
To get SERVER-2 connected the following is needed:
This is shown below:
vlan 20
  vn-segment 10002
!
interface nve1
  member vni 10002
    ingress-replication protocol bgp
!
interface Vlan20
  no shutdown
  vrf member Tenant1
  ip address 10.0.0.1/24
  fabric forwarding mode anycast-gateway
!
interface Ethernet1/3
Continue reading
In the previous post VXLAN/EVPN – Host ARP, I talked about how knowing the MAC/IP of endpoints allows for ARP suppression. In this post we’ll take a look at host mobility. The topology used is the same as in the previous post:
Currently SERVER-1 is connected to LEAF-1. What happens if SERVER-1 moves to LEAF-2? This would be a common scenario for a virtual infrastructure. First let’s take a look at LEAF-4 on what routes we have for SERVER-1:
Leaf4# show bgp l2vpn evpn 0050.56ad.8506
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.0.2.3:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.56ad.8506]:[0]:[0.0.0.0]/216, version 662
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW

  Path type: internal, path is valid, not best reason: Neighbor Address, no labeled nexthop
  AS-Path: NONE, path sourced internal to AS
    203.0.113.1 (metric 81) from 192.0.2.12 (192.0.2.2)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000
      Extcommunity: RT:65000:10000 ENCAP:8
      Originator: 192.0.2.3 Cluster list: 192.0.2.2

  Advertised
Continue reading
In the last post Advertising IPs In EVPN Route Type 2, I described how to get IPs advertised in EVPN route type 2, but why do we need it? There are three main scenarios where having the MAC/IP mapping is useful:
In this post I will cover the first use case and the topology below will be used:
When two hosts in the same subnet want to send Ethernet frames to each other, they will ARP to discover the MAC address of the other host. This is no different in a VXLAN/EVPN network. The ARP frame, which is broadcast, will have to be flooded to other VTEPs either using multicast in the underlay or by ingress replication. Because the frame is broadcast, it will have to go to all the VTEPs that have that VNI. The scenario with ingress replication is shown below:
In this scenario, SERVER-1 is sending an ARP request to get the MAC address of SERVER-4. As all leafs are participating in the L2 VNI, LEAF-1 will perform ingress replication and send it to all leafs. However, sending the ARP request to LEAF-2 and LEAF-3 is not needed Continue reading
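The ingress replication behavior described above can be sketched in a few lines of Python. This is a simplified model, not how NX-OS implements it. The VNI number 10000, the `VNI_MEMBERS` table, and the `ingress_replicate` function are assumptions made for illustration. The point is that the ingress VTEP sends one unicast copy of the broadcast frame to every other VTEP that has the VNI, whether or not the target host sits behind it.

```python
# Hypothetical membership table: which VTEPs participate in which L2 VNI.
VNI_MEMBERS = {
    10000: ["LEAF-1", "LEAF-2", "LEAF-3", "LEAF-4"],
}


def ingress_replicate(vni: int, ingress_vtep: str, members=VNI_MEMBERS):
    """Return the VTEPs that receive a unicast copy of a BUM frame.

    With ingress replication, the ingress VTEP makes one copy per remote
    VTEP in the VNI. It has no knowledge of where the target host lives.
    """
    vteps = members.get(vni, [])
    if ingress_vtep not in vteps:
        raise ValueError(f"{ingress_vtep} is not a member of VNI {vni}")
    return [vtep for vtep in vteps if vtep != ingress_vtep]


if __name__ == "__main__":
    # SERVER-1 behind LEAF-1 sends an ARP request for SERVER-4.
    for vtep in ingress_replicate(10000, "LEAF-1"):
        print(f"LEAF-1 sends a copy of the ARP request to {vtep}")
```

In this model, three copies leave LEAF-1 even though only one leaf has SERVER-4 attached.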
In my last post EVPN Deepdive Route Types 2 and 3, we took a deepdive into these two route types. I mentioned that the IP address of a host, a /32 or /128 address, could optionally be advertised. I also mentioned that this is mainly to facilitate features such as ARP suppression where a VTEP will be aware of the MAC/IP mapping and not have to flood BUM frames. However, in my last lab no IP addresses were advertised. Why is that? How do we get them advertised?
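As a quick aside on what a /32 or /128 host address means, here is a minimal Python sketch using the standard `ipaddress` module. The helper name `host_prefix` and the IPv6 address are assumptions for illustration. The IPv4 address is the one used for SERVER-4 elsewhere in this series.

```python
import ipaddress


def host_prefix(address: str):
    """Return the single-address prefix (/32 for IPv4, /128 for IPv6) for a host."""
    ip = ipaddress.ip_address(address)
    # max_prefixlen is 32 for IPv4 addresses and 128 for IPv6 addresses.
    return ipaddress.ip_network(f"{ip}/{ip.max_prefixlen}")


if __name__ == "__main__":
    for addr in ("198.51.100.44", "2001:db8::44"):
        prefix = host_prefix(addr)
        print(f"{addr} -> {prefix} ({prefix.num_addresses} address)")
```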
Currently, I have only set up an L2 VNI in the lab. This provides connectivity for the VLAN that my hosts are in, but it does not provide any L3 services. There is no SVI configured and there is also no L3 service configured that can route between different VNIs. The “standard” way of setting this up would be to configure anycast gateway on the leafs where every leaf that hosts the VNI has the same IP/MAC, but I consider this to be an optimization that I want to cover in a future post. I prefer to break things down into their components and focus on the configuration needed for each component Continue reading
In my last post on Configuring EVPN, we set up EVPN but configured no services. In this post we will configure a basic L2 service so we can dive into the different EVPN route types. This post will cover route types 2 and 3 together as you will commonly see these together. This post will cover:
The topology we will use for this post is shown below:
Before diving into configuration, let’s discuss something that is often overlooked, VTEP discovery.
Without EVPN, VXLAN uses flood and learn behavior for discovery of VTEPs. This means that any host sending VXLAN frames would be considered a trusted VTEP in the network. This is obviously not great from a security perspective. When using EVPN, adding VTEPs is based on BGP messages. A VTEP will learn about other VTEPs based on these BGP updates. It’s not a specific route type, but rather any type of EVPN message. This makes it more difficult to add a rogue Continue reading
Yesterday I posted a tricky question to Twitter. If you have a working VPNv4 environment and create a VRF with only a Route Distinguisher (RD) but without Route Targets (RT), will the route be exported? The answer may surprise you! The configuration supplied in the question was similar to the one below:
vrf definition QUIZ
 rd 198.51.100.1:100
 !
 address-family ipv4
 exit-address-family
!
interface GigabitEthernet2
 vrf forwarding QUIZ
 ip address 203.0.113.1 255.255.255.0
!
router bgp 65000
 !
 address-family ipv4 vrf QUIZ
  network 203.0.113.0
Notice how this VRF has an RD but no RT. Will this router, PE1, advertise the route into VPNv4? Most would say no, but the answer is yes. Let’s first check that we see the route locally on PE1 in VRF QUIZ:
PE1#show bgp vpnv4 uni vrf QUIZ 203.0.113.0
BGP routing table entry for 198.51.100.1:100:203.0.113.0/24, version 4
Paths: (1 available, best #1, table QUIZ)
  Advertised to update-groups:
     1
  Refresh Epoch 1
  Local
    0.0.0.0 (via vrf QUIZ) from 0.0.0.0 (198.51.100.1)
      Origin IGP, metric 0, localpref 100, weight 32768, valid, sourced, local, best
      mpls
Continue reading
In this post we will configure EVPN on NX-OS. We will reuse the VXLAN topology from my previous post. The following will describe the setup in this post:
The BGP topology is shown below:
I will cover all the details of configuring EVPN and establishing the BGP sessions. We will then cover the actual exchange of routes in detail in separate posts in the future.
Starting out, the following globals and features need to be configured:
Next, let’s configure BGP on the spines with the following settings:
Then let’s configure BGP on the leafs:
The devices will now advertise that they have AFI L2VPN and SAFI EVPN:
The BGP sessions are now up:
Leaf1# show bgp l2vpn evpn sum
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 192.0.2.3, local AS number 65000
BGP table version is 4, L2VPN EVPN config peers
Continue reading
In previous posts I described VXLAN using flood and learn behavior using multicast or ingress replication. The drawback to flood and learn is that frames need to be flooded/replicated for the VTEPs to learn of each other and for learning what MAC addresses are available through each VTEP. This isn’t very efficient. Isn’t there a better way of learning this information? This is where Ethernet VPN (EVPN) comes into play. What is it? As you know, BGP can carry all sorts of information and EVPN is just BGP with support to carry information about VTEPs, MAC addresses, IP addresses, VRFs, and some other stuff. What does EVPN provide us?
Note that the use of EVPN doesn’t entirely remove the need for flooding using multicast or ingress replication. Hosts still need to use ARP/ND to find the MAC address of each other, although ARP suppression could potentially help with that. There may also be protocols such as DHCP that leverage broadcast for some messages. In addition, there may be silent hosts in the fabric where the VTEP is not aware that the host is Continue reading