Toni Pasanen

Author Archives: Toni Pasanen

BGP EVPN Part IV: MAC-VRF L2RIB Update: Local MAC Address

In Figure 1-3 we have VLAN 10 mapped to EVI/MAC-VRF L2VNI10000. TS-A1 (IP: 192.168.11.12/MAC: 1000.0010.beef) is connected to VLAN10 via Attachment Circuit (AC) Ethernet 1/2, (ifindex: 1a000200). 

Figure 1-3: MAC-VRF: L2RIB Local Learning Process.


Example 1-1 shows the VLAN to L2VNI mapping information. 


Leaf-101# show vlan id 10 vn-segment
VLAN Segment-id
---- -----------
10   10000       

Example 1-1: VLAN to EVPN Instance Mapping Information.


Step-1 and 2: MAC Table Update 


During the startup process, TS-A1 sends a Gratuitous ARP (GARP) message to announce its presence on the network and validate the uniqueness of its IP address. It uses its IP address in the Target IP field (Example 1-2). If another host responds to this unsolicited ARP reply, it indicates a potential IP address conflict. 

Ethernet II, Src: 10:00:00:10:be:ef, Dst: Broadcast (ff:ff:ff:ff:ff:ff)
Address Resolution Protocol (reply/gratuitous ARP)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: reply (2)
    [Is gratuitous: True]
    Sender MAC address: 10:00:00:10:be:ef (10:00:00:10:be:ef)
    Sender IP address: 192.168.11.12
    Target MAC address: Broadcast Continue reading

BGP EVPN Part III: BGP EVPN Local Learning Fundamentals

Multi-Protocol BGP (MP-BGP) is a BGP-4 extension that enables BGP speakers to encode Network Layer Reachability Information (NLRI) of various address types, such as IPv4/6, VPNv4, and MAC addresses, into BGP Update messages. MP-BGP features an MP_REACH_NLRI Path-Attribute (PA), which utilizes an Address Family Identifier (AFI) to describe service categories. Subsequent Address Family Identifier (SAFI), in turn, defines the solution used for providing the service. For example, L2VPN (AFI 25) is a primary category for Layer-2 VPN services, and the Ethernet Virtual Private Network (EVPN: SAFI 70) provides the service. Another L2VPN service is Virtual Private LAN Service (VPLS: SAFI 65). The main differences between these two L2VPN services are that only EVPN supports active/active multihoming, has a control-plane-based MAC address learning mechanism, and operates over an IP-routed infrastructure.

EVPN utilizes various Route Types (EVPN RT) to describe the Network Layer Reachability Information (NLRI) associated with Unicast, BUM (Broadcast, Unknown unicast, and Multicast) traffic, as well as ESI Multihoming. The following sections explain how EVPN RT 2 (MAC Advertisement Route) is employed to distribute MAC and IP address information of Tenant Systems enabling the expansion of VLAN over routed infrastructure. 

The Tenant System refers to a host, virtual machine, Continue reading

BGP EVPN Part II: Network Virtualization Overlay with BGP EVPN and VXLAN – Introduction

In Figure 1-1, we have a routed 3-stage Clos Fabric, where all Inter-Switch links are routed point-to-point layer-3 connections. As explained in previous sections, a switched layer-2 network with an STP control plane allows only one active path per VLAN/Instance and VLAN-based traffic load sharing. Due to the Equal Cost Multi-Path (ECMP) supported by routing protocols, a routed Clos Fabric enables flow-based traffic load balancing using all links from the ingress leaf via the spine layer down to the egress leaf. The convergence time for routing protocols is faster and less disruptive than STP topology change. Besides, a routed Clos Fabric architecture allows horizontal bandwidth scaling. We can increase the overall bandwidth capacity between switches, by adding a new spine switch. Dynamic routing protocols allow standalone and virtualized devices lossless In-Service Software Update (ISSU) by advertising infinite metrics or withdrawing all advertised routes.

But how do we stretch layer-2 segments over layer-3 infrastructure in a Multipoint-to-Multipoint manner, allowing tenant isolation and routing between segments? The answer relies on the Network Virtualization Overlay (NVO3) framework. 

BGP EVPN, as an NVO3 control plane protocol, uses EVPN Route Types (RT) in update messages for identifying the type of advertised EVPN NLRIs (Network Continue reading

BGP EVPN Part-I: Challenges in Traditional Switched Datacenter Networks

Inefficient Link Utilization

The default Layer 2 Control Plane protocol in Cisco NX-OS is a Rapid Per-VLAN Spanning Tree Plus (Rapid PVST+), which runs 802.1w standard Rapid Spanning Tree Protocol (RSTP) instance per VLAN. Rapid PVST+ builds a VLAN-specific, loop-free Layer 2 data path from the STP root switch to all non-root switches. Spanning Tree Protocol, no matter which mode we use, allows only one active path at a time and blocks all redundant links. One general solution for activating all Inter-switch links is placing an STP root switch for odd and even VLANs into different switches. However, STP allows only a VLAN-based traffic load balancing.


CPU and Memory Usage

After building a loop-free data path, switches running Rapid PVST+ monitor the state of the network by using Spanning Tree instance-based Bridge Protocol Data Units (BPDU). By default, each switch sends instance-based BPDU messages from their designated port in two-second intervals. If we have 2000 VLANs, all switches must process 2000 BPDUs. To reduce CPU and Memory consumption caused by BPDU processing, we can use Multiple Spanning Tree – MSTP (802.1s), where VLANs are associated with Instances. For example, we can attach VLANs 1-999 to one instance and Continue reading

Cisco Intent-Based Networking: Part II – Cisco ISE and Catalyst Center Migration

Cisco Identity Service Engine (ISE) and Catalyst Center Integration

Before you can add Cisco ISE to Catalyst Center’s global network settings as an Authentication, Authorization, and Accounting server (AAA) for clients and manage the Group-Based access policy implemented in Cisco ISE, you must integrate them. 

This post starts by explaining how to activate the pxGrid service on ISE, which it uses for pushing policy changes to Catalyst Center (steps 1a-f). Next, it illustrates the procedure to enable  External RESTful API (ERS) read/write on Cisco ISE to allow external clients to Create, Read, Update, and Delete (CRUD) processes on ISE. Catalyst Center uses ERS for pushing configuration to ISE. After starting the pxGrid service and enabling ERS, this post discusses how to initiate the connection between ISE and Catalyst Center (steps 2a-h and 3a-b). The last part depicts the Group-Based Access Control migration processes (4a-b).

Step-1: Start pxGrid Service and Enabling ERS on ISE

Open the Administrator tab on the main view of Cisco ISE. Then, under the System tab, select the Deployment option. The Deployment Nodes section displays the Cisco ISE Node along with its personas. In Figure 1-3, a standalone ISE Node is comprised of three personas: Policy Continue reading

Cisco Intent-Based Networking: Part I – Introduction

 Introduction

This chapter introduces Cisco's approach to Intent-based Networking (IBN) through their Centralized SDN Controller, Cisco DNA Center, rebranded as Cisco Catalyst Center (from now on, I am using the abbreviation C3 for Cisco Catalyst Center). We focus on the network green field installation, showing workflows, configuration parameters, and relationships and dependencies between building blocks. The C3 workflow is divided into four main entities: 1) Design, 2) Policy, 3) Provision, and 4) Assurance, each having its own sub-processes. This chapter introduces the Design phase focusing on Network Hierarchy, Network Settings, and Network Profile with Configuration Templates. 

This post deprecates the previous post, "Cisco Intent-Based Networking: Part I, Overview."

Network Hierarchy

Network Hierarchy is a logical structure for organizing network devices. At the root of this hierarchy is the Global Area, where you establish your desired network structure. In our example, the hierarchy consists of four layers: Area (country - Finland), Sub-area (city - Joensuu), Building (JNS01), and Floor (JNS01-FLR01). Areas and Buildings indicate the location, while Floors provide environmental information relevant to wireless networks, such as floor type, measurements, and wall properties.


Network Settings

Network settings define device credentials (CLI, HTTP(S), SNMP, and NETCONF) required for accessing devices Continue reading

Cisco Intent Based Networking: Part I, Overview

This post introduces Cisco's approach to Intent-based Networking (IBN) through their Centralized SDN Controller, DNA Center, rebranded as Catalyst Center. We focus on the network green field installation, showing workflows, configuration parameters, and relationships and dependencies between building blocks.

Figure 1-1 is divided into three main areas: a) Onboard and Provisioning, b) Network Hierarchy and Global Network Settings, c) and Configuration Templates and Site Profiles. 

We start a green field network deployment by creating a Network Design. In this phase, we first build a Network Hierarchy for our sites. For example, a hierarchy can define Continent/Country/City/Building/Floor structure. Then, we configure global Network Settings. This phase includes both Network and Device Credentials configuration. AAA, DHCP, DNS serves, DNS name, and Time Zone, which are automatically inherited throughout the hierarchy, are part of the Network portion. Device Credentials, in turn, define CLI, SNMP read/write, HTTP(S) read/write username/password, and CLI enable password. The credentials are used later in the Discovery phase.

Next, we build a site and device type-specific configuration templates. As a first step, we create a Project, a folder for our templates. In Figure 1-1, we have a Composite template into which we attach two Regular templates. Regular templates include Continue reading

The Network Times 2023-08-27 09:59:00

 

Available at Leanpub and Amazon


About This Book

A modern application typically comprises several modules, each assigned specific roles and responsibilities within the system. Application architecture governs the interactions and communications between these modules and users. One prevalent architecture is the three-tier architecture, encompassing the Presentation, Application, and Data tiers. This book explains how you can build a secure and scalable networking environment for your applications running in Microsoft Azure. Besides a basic introduction to Microsoft Azure, the book explains various solutions for Virtual Machines Internet Access, connectivity, security, and scalability perspectives.


Azure Basics: You will learn the hierarchy of Microsoft Azure datacenters, i.e., how a group of physical datacenters forms an Availability Zone within the Azure Region. Besides, you learn how to create a Virtual Network (VNet), divide it into subnets, and deploy Virtual Machines (VM). You will also learn how the subnet in Azure differs from the subnet in traditional networks.


Internet Access: Depending on the role of the application, VMs have different Internet access requirements. Typically, front-end VMs in the presentation tier/DMZ are visible on the Internet, allowing external hosts to initiate connections. VMs in the Application and Data tiers are rarely accessible from Continue reading

NVA Part V: NVA Redundancy with Azure Internal Load Balancer – On-Prem Connec

 Introduction


In Chapter Five, we deployed an internal load balancer (ILB) in the vnet-hub. It was attached to the subnet 10.0.0.0/24, where it obtained the frontend IP (FIP) 10.0.1.6. Next, we created a backend pool and associated our NVAs with it. Finally, we bound the frontend IP 10.0.1.6 to the backend pool to complete the ILB setup.


Next, in vnet-spoke1, we created a route table called rt-spoke1. This route table contained a user-defined route (UDR) for 10.2.0.0/24 (vnet-spoke2) with the next-hop set as 10.0.1.6. We attached this route table to the subnet 10.1.0.0/24. Similarly, in vnet-spoke2, we implemented a user-defined route for 10.1.0.0/24 (vnet-spoke1). By configuring these UDRs, we ensured that the spoke-to-spoke traffic would pass through the ILB and one of the NVAs on vnet-hub. Note that in this design, the Virtual Network Gateway is not required for spoke-to-spoke traffic.


In this chapter, we will add a Virtual Network Gateway (VGW) into the topology and establish an IPsec VPN connection between the on-premises network edge router and VGW. Additionally, we will deploy a new route table called "rt-gw-snet" where we add routing entries to the spoke VNets with the next-hop IP address 10.0.1.6 (ILB's frontend IP). Besides, we will add a routing entry 10.3.0.0/16 > 10.0.1.6 into the existing route tables on vnet-spoke-1 and vnet-spoke-2 (not shown in figure 6-1). This configuration will ensure that the spoke to spoke and spoke to on-prem flows are directed through one of the Network Virtual Appliances (NVAs) via ILB. The NVAs use the default route table, where the VGW propagates all the routes learned from VPN peers. However, we do not propagate routes from the default route table into the "rt-gw-snet" and "rt-prod-1" route tables. To enable the spoke VNets to use the VGW on the hub VNet, we allow it in VNet peering configurations.


  1. The administrator of the mgmt-pc opens an SSH session to vm-prod-1. The connection initiation begins with the TCP three-way handshake. The TCP SYN message is transmitted over the VPN connection to the Virtual Gateway (VGW) located on the vnet-hub. Upon receiving the message, the VGW first decrypts it and performs a routing lookup. The destination IP address, 10.1.0.4, matches the highlighted routing entry in the route table rt-gw-snet.
  2. The VGW determines the location (the IP address of the hosting server) of 10.1.0.6, encapsulates the message with tunnel headers, and forwards it to an Internal Load Balancer (ILB) using the destination IP address 10.1.0.6 in the tunnel header.
  3. The Internal Load Balancer receives the TCP SYN message. As the destination IP address in the tunnel header matches one of its frontend IPs, the ILB decapsulates the packet. It then checks which backend pool (BEP) is associated with the frontend IP (FIP) 10.0.1.6 to determine to which VMs it can forward the TCP SYN message. Using a hash algorithm (in our example, the 5-tuple), the ILB selects a VM from the backend pool members, in this case, NVA2. The ILB performs a location lookup for the IP address 10.1.0.5, encapsulates the TCP SYN message with tunnel headers, and finally sends it to NVA2.
  4. The message reaches the hosting server of NVA2, which removes the encapsulation since the destination IP in the tunnel header belongs to itself. Based on the Syn flag set in the TCP header, the packet is identified as the first packet of the flow. Since this is the initial packet of the flow, there is no flow entry programmed into the Generic Flow Table (GFT) related to this connection. The parser component generates a metadata file from the L3 and L4 headers of the message, which then is processed by the Virtual Filtering Platform (VFP) layers associated with NVA2. Following the VFP processing, the TCP SYN message is passed to NVA2, and the GFT is updated with flow information and associated actions (Allow and Encapsulation instructions). Besides, the VFP process creates a corresponding entry for the return packets into the GFT (reversed source and destination IPs and ports). Please refer to the first chapter for more detailed information on VFP processes.
  5. We do not have any pre-routing or post-routing policies configured on either NVA. As a result, NVA2 simply routes the traffic out of the eth0 interface based on its routing table. The ingress TCP SYN message has already been processed by the VFP layers, and the GFT has been updated accordingly. Consequently, the egress packet can be forwarded based on the GFT without the need for additional processing by the VFP layers.
  6. Subsequently, the encapsulated TCP SYN message is transmitted over VNet peering to vm-prod-1, located on vnet-spoke-1. Upon reaching the hosting server of vm-prod-1, the packet is processed in a similar manner as we observed with NVA. The encapsulation is removed, and the packet undergoes the same VFP processing steps as before.


Figure 6-1: ILB Example Topology.

Continue reading

NVA Part IV: NVA Redundancy with Azure Internal Load Balancer

Introduction

To achieve active/active redundancy for a Network Virtual Appliance (NVA) in a Hub-and-Spoke VNet design, we can utilize an Internal Load Balancer (ILB) to enable Spoke-to-Spoke traffic.

Figure 5-1 illustrates our example topology, which consists of a vnet-hub and spoke VNets. The ILB is associated with the subnet 10.0.1.0/24, where we allocate a Frontend IP address (FIP) using dynamic or static methods. Unlike a public load balancer's inbound rules, we can choose the High-Availability (HA) ports option to load balance all TCP and UDP flows. The backend pool and health probe configurations remain the same as those used with a Public Load Balancer (PLB).

From the NVA perspective, the configuration is straightforward. We enable IP forwarding in the Linux kernel and virtual NIC but not pre-routing (destination NAT). We can use Post-routing policies (source NAT) if we want to hide real IP addresses or if symmetric traffic paths are required. To route egress traffic from spoke sites to the NVAs via the ILB, we create subnet-specific route tables in the spoke VNets. The reason why the "rt-spoke1" route table has an entry "10.2.0.0/24 > 10.0.1.6 (ILB)" is that vm-prod-1 has a public IP address used for external access. If we were to set the default route, as we have in the subnet 10.2.0.0/24 in "vnet-spoke2", the external connection would fail.

Figure 5-1: ILB Example Topology.

NVA Part III: NVA Redundancy – Connection from the Internet

This chapter is the first part of a series on Azure's highly available Network Virtual Appliance (NVA) solutions. It explains how we can use load balancers to achieve active/active NVA redundancy for connections initiated from the Internet.

In Figure 4-1, Virtual Machine (VM) vm-prod-1 uses the load balancer's Frontend IP address 20.240.9.27 to publish an application (SSH connection) to the Internet. Vm-prod-1 is located behind an active/active NVA FW cluster. Vm-prod-1 and NVAs have vNICs attached to the subnet 10.0.2.0/24.

Both NVAs have identical Pre- and Post-routing policies. If the ingress packet's destination IP address is 20.240.9.27 (load balancer's Frontend IP) and the transport layer protocol is TCP, the policy changes the destination IP address to 10.0.2.6 (vm-prod-1). Additionally, before routing the packet through the Ethernet 1 interface, the Post-routing policy replaces the original source IP with the IP address of the egress interface Eth1.

The second vNICs of the NVAs are connected to the subnet 10.0.1.0/24. We have associated these vNICs with the load balancer's backend pool. The Inbound rule binds the Frontend IP address to the Backend pool and defines the load-sharing policies. In our example, the packets of SSH connections from the remote host to the Frontend IP are distributed between NVA1 and NVA2. Moreover, an Inbound rule determines the Health Probe policy associated with the Inbound rule.

Note! Using a single VNet design eliminates the need to define static routes in the subnet-specific route table and the VM's Linux kernel. This solution is suitable for small-scale implementations. However, the Hub-and-Spoke VNet topology offers simplified network management, enhanced security, scalability, performance, and hybrid connectivity. I will explain how to achieve NVA redundancy in the Hub-and-Spoke VNet topology in upcoming chapters.



Figure 4-1: Example Diagram. 

NVA Part II – Internet Access with a single NVA

Introduction

In the previous chapter, you learned how to route east-west traffic through the Network Virtual Appliance (NVA) using subnet-specific route tables with User Defined Routes (UDR). This chapter introduces how to route north-south traffic between the Internet and your Azure Virtual Network through the NVA.

Figure 3-1 depicts our VNet setup, which includes DMZ and Web Tier zones. The NVA, vm-nva-fw, is connected to subnet snet-north (10.0.2.0/24) in the DMZ via a vNIC with Direct IP (DIP) 10.0.2.4. We've also assigned a public IP address, 51.12.90.63, to this vNIC. The second vNIC is connected to subnet snet-west (10.0.0.0/24) in the Web Tier, with DIP 10.0.0.5. We have enabled IP Forwarding in both vNICs and Linux kernel. We are using Network Security Groups (NSGs) for filtering north-south traffic.

Our web server, vm-west, has a vNIC with DIP 10.0.0.4 that is connected to the subnet snet-west in the Web Tier. We have associated the route table to the subnet with the UDR, which forwards traffic to destination IP 141.192.166.81 (remote host) to NVA. To publish the web server to the internet, we've used the public IP of NVA. 

On the NVA, we have configured a Destination NAT rule which rewrites the destination IP address to 10.0.0.4 to packets with the source IP address 141.192.166.81 and protocol ICMP. To simulate an http connection, we're using ICMP requests from a remote host.


Figure 3-1: Example Diagram and.

Continue reading

Routing in Azure Subnets

Introduction

Subnets, aka Virtual Local Area Networks (VLANs) in traditional networking, are Layer-2 broadcast domains that enable attached workloads to communicate without crossing a Layer-3 boundary, the subnet Gateway. Hosts sharing the same subnet resolve each other’s MAC-IP address binding using Address Resolution Protocol, which relays on Broadcast messages. That is why we often use the Failure domain definition with subnets. We can spread subnets between physical devices over Layer-2 links using VLAN tagging, defined in the IEEE 802.1Q standard. Besides, tunnel encapsulation solutions supporting tenant/context identifier enables us to extend subnets over Layer-3 infrastructure. Virtual eXtensible LAN (VXLAN) using VXLAN Network Identifier (VNI) and Network Virtualization using Generic Route Encapsulation (NVGRE) using Tenant Network ID (TNI) are examples of Network Virtualization Over Layer 3 (NVO) solutions. If you have to spread the subnet over MPLS enabled network, you can choose to implement Virtual Private LAN (VPLS) Service or Virtual Private Wire Service (VPWS), among the other solutions.  

In Azure, the concept of a subnet is different. You can think about it as a logical domain within a Virtual Network (VNet), where attached VMs share the same IP address space and use the same shared routing policies. Broadcast and Multicast traffic is not natively supported in Azure VNet. However, you can use a cloudSwXtch VM image from swXtch.io to build a Multicast-enabled overlay network within VNet. 

Default Routing in Virtual Network

This section demonstrates how the routing between subnets within the same Virtual Network (VNet) works by default. Figure 2-1 illustrates our example Azure VNet setup where we have deployed two subnets. The interface eth0 of vm-west and interface eth1 of vm-nva-fw are attached to subnet snet-west (10.0.0.0/24), while interface eth2 of vm-nva-fw and interface eth0 of vm-west is connected to subnet snet-east (10.0.1.0/24). All three VMs use the VNet default routing policy, which routes Intra-VNet data flows directly between the source and destination endpoint, regardless of which subnets they are connected to. Besides, the Network Security Groups (NSGs) associated with vNICs share the same default security policies, which allow inbound and outbound Intra-VNet data flows, InBound flows from the Load Balancer, and OutBound Internet connections. 

Now let’s look at what happens when vm-west (DIP: 10.0.0.4) pings vm-west (DIP: 10.0.1.4), recapping the operation of VFP. Note that Accelerated Networking (AccelNet) is enabled in neither VMs.

  1. The VM vm-west sends an ICMP Request message to vm-east. The packet arrives at the Virtual Filtering Platform (VFP) for processing. Since this is the first packet of the flow, the Flow Identifier and associated Actions are not in the Unified Flow Table (UFT). The Parser component extracts the 5-tuple header information (source IP, source port, destination IP, destination port, and transport protocol) as metadata from the original packet. The metadata is then processed in each VFP layer to generate a flow-based entry in the UFT.
  2. The destination IP address matches the Network Security Group's (NSG) default outbound rule, which allows Intra-VNet flows. Then the metadata is passed on to the routing process. Since we haven't yet deployed subnet-specific route tables, the result of the next-hop route lookup is 3.3.3.3, the Provider Address (PA) of Host-C.
  3. Intra-VNet connections use private IP addresses (DIP-Direct IP), and the VFP process bypasses the NAT layer. The VNet layer, responsible for encapsulation/decapsulation, constructs tunnel headers (IP/UDP/VXLAN). It creates the outer IP address with the source IP 1.1.1.1 (Host-A) and destination IP 3.3.3.3 (Host-C), resolved by the Routing layer. Besides, it adds Virtual Network Identifier (VNI) into the VXLAN header.
  4. After each layer has processed the metadata, the result is encoded to Unified Flow Table (UFT) with Flow-Id with push action (Encapsulation). 
  5. The Header Transposition engine (HT) modifies the original packet based on the UFT actions. It adds tunnel headers leaving all original header information intact. Finally, the modified packet is transmitted to the upstream switch. The subsequent packets are forwarded based on the UFT.
  6. The Azure switching infra forwards the packet based on the destination IP address on the outer IP header (tunnel header).
  7. The VFP on Host-C processes the ingress ICMP Request message in the same manner as VFP in Host-A but in reversed order starting with decapsulation in the VNet layer.

Figure 2-1: Example Topology Diagram.
Continue reading

Chapter 1: Azure VM networking – Virtual Filtering Platform and Accelerated Networking

 Note! This post is under the technical review

Introduction


Virtual Filtering Platform (VFP) is Microsoft’s cloud-scale software switch operating as a virtual forwarding extension within a Hyper-V basic vSwitch. The forwarding logic of the VFP uses a layered policy model based on policy rules on Match-Action Table (MAT). VFP works on a data plane, while complex control plane operations are handed over to centralized control systems. The VFP includes several layers, including VNET, NAT, ACL, and Metering layers, each with dedicated controllers that program policy rules to the MAT using southbound APIs. The first packet of the inbound/outbound data flow is processed by VFP. The process updates match-action table entries in each layer, which then are copied into the Unified Flow Table (UFT). Subsequent packets are then switched based on the flow-based action in UFT. However, if the Virtual Machine is not using Accelerated Networking (AccelNet), all packets are still forwarded over the software switch, which requires CPU cycles. Accelerated Networking reduces the host’s CPU burden and provides a higher packet rate with a more predictable jitter by switching the packet using hardware NIC yet still relaying to VFP from the traffic policy perspective.


Hyper-V Extensible Virtual Switch


Microsoft’s extensible vSwitch running on Hyper-V operates as a Networking Virtualization Service Provider (NetVSP) for Virtual Machine. VMs, in turn, are Network Virtualization Service Consumers (NetVSP). When a VM starts, it requests the Hyper-V virtualization stack to connect to the vSwitch. The virtualization stack creates a virtual Network Interface (vNIC) for the VM and associates it with the vSwitch. The vNIC is presented to the VM as a physical network adapter. The communication channel between VM and vSwitch uses a synthetic data path Virtual Machine Bus (VMBus), which provides a standardized interface for VMs to access physical resources on the host machine. It helps ensure that virtual machines have consistent performance and can access resources in a secure and isolated manner. 


Virtual Filtering Platform - VFP


A Virtual Filtering Platform (VFP) is Microsoft’s cloud-scale virtual switch operating as a virtual forwarding extension within a Hyper-V basic vSwitch. VFP sits in the data path between virtual ports facing the virtual machines and default vPort associated with physical NIC. VFP uses VM’s vPort-specific layers for filtering traffic to and from VM. A layer in the VFP is a Match-Action Table (MAT) containing policy rules programmed by independent, centralized controllers. The packet is processed through the VFP layers if it’s an exception packet, i.e., no Unified Flow entry (UF) in the Unified Flow Table (UFT), or if it’s the first packet of the flow (TCP SYN packet). When a Virtual Machine initiates a new connection, the first packet of the data flow is stored in the Received Queue (RxQ). The Parser component on VFP then takes the L2 (Ethernet), L3 (IP), and L4 (Protocol) header information as metadata, which is then processed through the layer policies in each VFP layer. The VFP layers involved in packet processing depend on the flow destination and the Azure services associated with the source/destination VM. 

VNET-to-Internet traffic from with VM using a Public IP


The metering layer measures traffic for billing. It is the first layer for VM’s outgoing traffic and the last layer for incoming traffic, i.e., it processes only the original ingress/egress packets ignoring tunnel headers and other header modifications (Azure does not charge you for overhead bytes caused by the tunnel encapsulation). Next, the ACL layer runs the metadata through the NSG policy statements. If the source/destination IP addresses (L3 header group) and protocol, source/destination ports (L4 header group) match one of the allowing policy rules, the traffic is permitted (action#1: Allow). After ACL layer processing, the routing process intercepts the metadata. Because the destination IP address in the L3 header group matches only with the default route (0.0.0.0/0, next-hop Internet), the metadata is handed over to Server Load Balancing/Network Address Translation (SLB/NAT) layer. In this example, a public IP is associated with VM’s vNIC, so the SLB/NAT layer translates the private source IP to the public IP (action#2: Source NAT). The VNet layer is bypassed if both source and destination IP addresses are from the public IP space. When the metadata is processed by each layer, the results are programmed into the Unified Flow Table (UFT). Each flow is identified with a unique Unified Flow Identifier (UFID) - hash value calculated from the flow-based 5-tuple (source/destination IP, Protocol, Source Port, Destination Port). The UFID is also associated with the actions Allow and Source NAT. The Header Transposition (HT) engine then takes the original packet from the RxQ and modifies its L2/L3/L4 header groups as described in the UFT. It changes the source private IP to public IP (Modify) and moves the packet to TxQ. The subsequent packets of the flow are modified by the HT engine based on the existing UFT entry without running related metadata through the VFP layers (slow-path to fast-path switchover). 

Besides the outbound flow entry, the VFP layer processes generate an inbound flow entry for the same connection but with reversed 5-tuple (source/destination addresses and protocol ports in reversed order) and actions (destination NAT instead of source NAT). These outbound and inbound flows are then paired and seen as a connection, enabling the Flow State Tracking process where inactive connections can be deleted from the UFT. For example, the Flow State Machine tracks the TCP RST flags. Let’s say that the destination endpoint sets the TCP RST flags to the L4 header. The TCP state machine notices it and removes the inbound flow together with its paired outbound flow from the UFT. Besides, the TCP state machine tracks the TCP FIN/FIN ACK flags and TIME_WAIT state (after TCP FIN. The connection is kept alive for max. 2 x Max Segment Lifetime to wait if there are delayed/retransmitted packets).


Intra-VNet traffic



The Metering and ACL layers on VFP process inbound/outbound flows for Intra-VNet connections in the same manner as VNet-Internet traffic. When the routing process notices that the destination Direct IP address (Customer Address space) is within the VNet CIDR range, the NAT layer is bypassed. The reason is that Intra-VNet flows use private Direct IP addresses as source and destination addresses. The Host Agent responsible for VNet layer operations, then examines the destination IP address from the L3 header group. Because this is the first packet of the flow, there is no information about the destination DIP-to-physical host mapping (location information) in the cache table. The VNet layer is responsible for providing tunnel headers to Intra-VNet traffic, so the Host Agent requests the location information from the centralized control plane. After getting the reply, it creates a MAT entry where the action part defines tunnel headers (push action). After the metadata is processed, the result is programmed into Unified Flow Table. As a result, the Header Transposition engine takes the original packet from the Received Queue, adds a tunnel header, and moves the packet to Transmit Queue.

Figure 1-1: Azure Host-Based SDN Building Blocks.

Continue reading

Azure Networking Fundamentals: Virtual WAN Part 2 – VNet Segmentation

VNets and VPN/ExpressRoute connections are associated with vHub’s Default Route Table, which allows both VNet-to-VNet and VNet-to-Remote Site IP connectivity. This chapter explains how we can isolate vnet-swe3 from vnet-swe1 and vnet-swe2 using VNet-specific vHub Route Tables (RT), still allowing VNet-to-VPN Site connection. As a first step, we create a Route Table rt-swe12 to which we associate VNets vnet-swe1 and vnet-swe2. Next, we deploy a Route Table rt-swe3 for vnet-swe3. Then we propagate routes from these RTs to Default RT but not from rt-swe12 to rt-swe3 and vice versa. Our VPN Gateway is associated with the Default RT, and the route to remote site subnet 10.11.11.0/24 is installed into the Default RT. To achieve bi-directional IP connectivity, we also propagate routes from the Default RT to rt-swe-12 and rt-swe3. As the last step, we verify both Control Plane operation and Data Plane connections. 


Figure 12-1: Virtual Network Segmentation.

Azure Networking Fundamentals: Virtual WAN Part 1 – S2S VPN and VNet Connections

 This chapter introduces Azure Virtual WAN (vWAN) service. It offers a single deployment, management, and monitoring pane for connectivity services such as Inter-VNet, Site-to-Site VPN, and Express Route. In this chapter, we are focusing on S2S VPN and VNet connections. The Site-to-Site VPN solutions in vWAN differ from the traditional model, where we create resources as an individual components. In this solution, we only deploy a vWAN resource and manage everything else through its management view. Figure 11-1 illustrates our example topology and deployment order. The first step is to implement a vWAN resource. Then we deploy a vHub. It is an Azure-managed VNet to which we assign a CIDR, just like we do with the traditional VNet. We can deploy a vHub as an empty VNet without associating any connection. A vHub deployment process launches a pair of redundant routers, which exchange reachability information with the VNet Gateway router and VGW instances using BGP. We intend to allow Inter-VNet data flows between vnet-swe1, vnet-swe2, and Branch-to-VNet traffic. For Site-to-Site VPN, we deploy VPN Gateway (VGW) into vHub. The VGW started in the vHub creates two instances, instance0, and instance1, in active/active mode. We don’t deploy a GatewaySubnet for VGW Continue reading

Azure Networking Fundamentals: VNET Peering

Comment: Here is a part of the introduction section of the eight chapter of my Azure Networking Fundamentals book. I will also publish other chapters' introduction sections soon so you can see if the book is for you. The book is available at Leanpub and Amazon (links on the right pane).

This chapter introduces an Azure VNet Peering solution. VNet peering creates bidirectional IP connections between peered VNets. VNet peering links can be established within and across Azure regions and between VNets under the different Azure subscriptions or tenants. The unencrypted data path over peer links stays within Azure's private infrastructure. Consider a software-level solution (or use VGW) if your security policy requires data path encryption. There is no bandwidth limitation in VNet Peering like in VGW, where BW is based on SKU. From the VM perspective, VNet peering gives seamless network performance (bandwidth, latency, delay, and jitter) for Inter-VNet and Intra-VNet traffic. Unlike the VGW solution, VNet peering is a non-transitive solution, the routing information learned from one VNet peer is not advertised to another VNet peer. However, we can permit peered VNets (Spokes) to use local VGW (Hub) and route Spoke-to-Spoke data by using a subnet-specific route table Continue reading

Azure Networking Fundamentals: Site-to-Site VPN

Comment: Here is a part of the introduction section of the fifth chapter of my Azure Networking Fundamentals book. I will also publish other chapters' introduction sections soon so you can see if the book is for you. The book is available at Leanpub and Amazon (links on the right pane).

A Hybrid Cloud is a model where we split application-specific workloads across the public and private clouds. This chapter introduces Azure's hybrid cloud solution using Site-to-Site (S2S) Active-Standby VPN connection between Azure and on-prem DC. Azure S2S A/S VPN service includes five Azure resources. The first one, Virtual Network Gateway (VGW), also called VPN Gateway, consists of two VMs, one in active mode and the other in standby mode. These VMs are our VPN connection termination points on the Azure side, which encrypt and decrypt data traffic. The active VM has a public IP address associated with its Internet side. If the active VM fails, the standby VM takes the active role, and the public IP is associated with it. Active and standby VMs are attached to the special subnet called Gateway Subnet. The name of the gateway subnet has to be GatewaySubnet. The Local Gateway (LGW) Continue reading

Azure Networking Fundamentals: Internet Access with VM-Specific Public IP

Comment: Here is a part of the introduction section of the Third chapter of my Azure Networking Fundamentals book. I will also publish other chapters' introduction sections soon so you can see if the book is for you. The book is available at Leanpub and Amazon (links on the right pane).

In chapter two, we created a VM vm-Bastion and associated a Public IP address to its attached NIC vm-bastion154. The Public IP addresses associated with VM’s NIC are called Instance Level Public IP (ILPIP). Then we added a security rule to the existing NSG vm-Bastion-nsg, which allows an inbound SSH connection from the external host. Besides, we created VMs vm-front-1 and vm-Back-1 without public IP address association. However, these two VMs have an egress Internet connection because Azure assigns Outbound Access IP (OPIP) addresses for VMs for which we haven’t allocated an ILPIP (vm-Front-1: 20.240.48.199 and vm-Back-1-20.240.41.145). The Azure portal does not list these IP addresses in the Azure portal VM view. Note that neither user-defined nor Azure-allocated Public IP addresses are not configured as NIC addresses. Instead, Azure adds them as a One-to-One entry to the NAT table (chapter 15 introduces a Continue reading

Azure Networking Fundamentals: Network Security Group (NSG)

Comment: Here is a part of the introduction section of the second chapter of my Azure Networking Fundamentals book. I will also publish other chapters' introduction sections soon so you can see if the book is for you. The book is available at Leanpub and Amazon (links on the right pane). 

This chapter introduces three NSG scenarios. The first example explains the NSG-NIC association. In this section, we create a VM that acts as a Bastion host*). Instead of using the Azure Bastion service, we deploy a custom-made vm-Bastion to snet-dmz and allow SSH connection from the external network. The second example describes the NSG-Subnet association. In this section, we launch vm-Front-1 in the front-end subnet. Then we deploy an NSG that allows SSH connection from the Bastion host IP address. The last part of the chapter introduces an Application Security Group (ASG), which we are using to form a logical VM group. We can then use the ASG as a destination in the security rule in NSG. There are two ASGs in figure 2-1. We can create a logical group of VMs by associating them with the same Application Security Group (ASG). The ASG can then be used Continue reading