During Developer Week in April 2024, we announced General Availability of Workers AI, and today, we are excited to announce that AI Gateway is Generally Available as well. Since its launch to beta in September 2023 during Birthday Week, we’ve proxied over 500 million requests and are now prepared for you to use it in production.
AI Gateway is an AI ops platform that offers a unified interface for managing and scaling your generative AI workloads. At its core, it acts as a proxy between your service and your inference provider(s), regardless of where your model runs. With a single line of code, you can unlock a set of powerful features focused on performance, security, reliability, and observability – think of it as your control plane for your AI ops. And this is just the beginning – we have a roadmap full of exciting features planned for the near future, making AI Gateway the tool for any organization looking to get more out of their AI workloads.
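As a rough illustration of the "single line of code" idea: in most client libraries that line is the base URL the client talks to. The sketch below assumes the gateway endpoint follows the pattern `https://gateway.ai.cloudflare.com/v1/<account>/<gateway>/<provider>`; the account and gateway names are hypothetical placeholders, and the exact URL should be taken from your own dashboard rather than from this example.

```python
# Assumed URL pattern for AI Gateway endpoints; verify against your dashboard.
GATEWAY_HOST = "https://gateway.ai.cloudflare.com/v1"


def gateway_base_url(account_id: str, gateway_id: str, provider: str) -> str:
    """Build the base URL that replaces the provider's own API base URL."""
    return f"{GATEWAY_HOST}/{account_id}/{gateway_id}/{provider}"


# Hypothetical identifiers, used only to show the shape of the result.
base_url = gateway_base_url("my-account-id", "my-gateway", "openai")
```

The rest of the client code stays unchanged; only the base URL passed to the provider's SDK is swapped for the one built here.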
The AI space moves fast, and it seems like every day there is a new model, provider, or framework. Given this high rate of Continue reading
In the previous two blog posts (Dealing with LAG Member Failures, LAG Member Failures in VXLAN Fabrics) we discovered that it’s almost trivial to deal with a LAG member failure in an MLAG cluster if we have a peer link between MLAG members. What about the holy grail of EVPN pundits: ESI-based MLAG with no peer link between MLAG members?
There are a lot of options when it comes to vPC. What enhancements should you consider? I’ll go through some of the options worth considering.
Peer Switch – The Peer Switch feature changes how vPC behaves with regard to STP. Without this enabled, you would configure different STP priorities on the primary and secondary switch. The secondary switch forwards BPDUs coming from vPC-connected switches towards the primary switch. The secondary switch doesn’t process these received BPDUs. Only the primary switch sends BPDUs to the vPC-connected switches. Note that the secondary switch can process and send BPDUs to switches that are only connected to the secondary switch. Without Peer Switch it looks like this:
Calico, the leading solution for container networking and security, unveils a host of new features this spring. With new security capabilities that simplify operations, enhanced visualization for faster troubleshooting, and major enhancements to its popular workload-centric distributed WAF, Calico is set to redefine how you manage and secure your containerized workloads.
This blog describes the new capabilities in Calico.
Runtime threat detection generates a large number of security events. However, managing and analyzing these events can be challenging, and users need a way to summarize and navigate through them to gain deeper insights and take appropriate actions. Let’s see how Calico simplifies runtime security operations.
We are excited to announce the introduction of the Security Event Dashboard in Calico. This dashboard provides a summary of the security events generated by the runtime threat detection engine. With the Security Event Dashboard, users can easily analyze and pivot around the data, enabling them to:
The Security Event Dashboard offers a visually appealing and user-friendly interface, presenting key summarizations of Continue reading
I once heard a quote that said, “The hardest part of being a butcher is knowing where to cut.” If you’ve ever eaten a cut of meat, you know that a tender steak and a piece of meat that needs hours of tenderizing can come from spots just inches apart. Butchers train for years to be able to make the right cuts in the right pieces of meat with speed and precision. There’s even an excellent Medium article about the dying art of butchering.
One thing that struck me in that article is how the art of butchering relates to AI. Yes, I know it’s a bit corny and not an easy segue into a technical topic but that transition is about as subtle as the way AI has come crashing through the door to take over every facet of our lives. It used to be that AI was some sci-fi term we used to describe intelligence emerging in computer systems. Now, AI is optimizing my PC searches and helping with image editing and creation. It’s easy, right?
Except some of those things that AI promises to excel at doing are things that professionals have spent years honing their Continue reading
Let’s open another juicy can of BGP worms: load balancing. In the first lab exercise, you’ll configure equal-cost load balancing across EBGP paths and tweak the “What is equal cost?” algorithm to consider just the AS path length, not the contents of the AS path.
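To make the "What is equal cost?" tweak concrete, here is a minimal sketch of the selection rule. It is a simplification, not any vendor's implementation: it ignores every other BGP attribute (local preference, MED, IGP metric, and so on), treats the first shortest path as the best path, and uses made-up next hops and AS numbers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Path:
    next_hop: str
    as_path: Tuple[int, ...]


def multipath_set(paths: List[Path], relax_as_path: bool = False) -> List[Path]:
    """Return the paths that would be used for load balancing.

    By default a path is "equal cost" only if its AS path matches the best
    path exactly. With relax_as_path=True only the AS path length has to match.
    """
    if not paths:
        return []
    shortest_len = min(len(p.as_path) for p in paths)
    shortest = [p for p in paths if len(p.as_path) == shortest_len]
    if relax_as_path:
        return shortest
    best = shortest[0]  # simplification: first shortest path is the best path
    return [p for p in shortest if p.as_path == best.as_path]


# Two EBGP paths of equal length through different upstream ASes.
paths = [
    Path("10.1.0.2", (65100, 65200)),
    Path("10.1.0.6", (65101, 65200)),
]
strict = multipath_set(paths)                       # only the best path
relaxed = multipath_set(paths, relax_as_path=True)  # both paths
```

With the strict rule the two paths differ in content, so only one is used; once only the length counts, both qualify.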
As I hinted in an earlier post, there are some failure scenarios you need to consider for vPC. We can’t really do much about the first scenario, but I’ll describe it anyway. The topology is the one below:
Server4 needs to send a packet to Server1. Leaf4 has the following routes for 198.51.100.11:
Leaf4# show bgp l2vpn evpn 198.51.100.11
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 192.0.2.3:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[0050.56ad.8506]:[32]:[198.51.100.11]/272, version 13677
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW

  Path type: internal, path is valid, not best reason: Neighbor Address, no labeled nexthop
  AS-Path: NONE, path sourced internal to AS
    203.0.113.12 (metric 81) from 192.0.2.12 (192.0.2.2)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10000 10001
      Extcommunity: RT:65000:10000 RT:65000:10001 SOO:203.0.113.12:0 ENCAP:8
          Router MAC:00ad.e688.1b08
      Originator: 192.0.2.3 Cluster list: 192.0.2.2

  Advertised path-id 1
  Path type: internal, path is valid, is best path, Continue reading
In this blog post, let's look at a very simple Network CI/CD pipeline that manages my Containerlab network topology and configurations. We'll start with the benefits of using CI/CD, cover some basic terminology, and then go through an example.
To give an overview, I use Containerlab to deploy my network labs and Nornir to deploy the configurations. Before CI/CD, my typical workflow involved using containerlab commands to manage the topology. Once the lab was up and running, I used Python to run the Nornir script. This worked well because I was the only one using it. However, I ideally want to put all the configurations into a Git repo to track my changes over time. I also want to test my code (to ensure there are no syntax errors, for example) and automatically push the updates to the devices.
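The "no syntax errors" test mentioned above can be as small as the following sketch. It is not taken from the project repo; the function name is hypothetical, and it only checks that every Python file under a directory compiles, which is the kind of cheap check a CI job can run on each push.

```python
from pathlib import Path
from typing import List


def find_syntax_errors(root: str) -> List[str]:
    """Compile every .py file under root and collect syntax error messages."""
    errors = []
    for path in sorted(Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        try:
            # compile() parses the file without executing it.
            compile(source, str(path), "exec")
        except SyntaxError as exc:
            errors.append(f"{path}:{exc.lineno}: {exc.msg}")
    return errors
```

A CI job could call `find_syntax_errors(".")` and fail the pipeline whenever the returned list is not empty.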
Here is the project repo if you want to clone it and follow along.
CI/CD stands for Continuous Integration and Continuous Delivery. In simple terms, it means automatically testing and delivering your code. With Continuous Integration (CI), every time you make a change to your code, it's tested automatically Continue reading
Every complex enough network automation solution has to introduce a high-level (user-manageable) data model that is eventually transformed into a low-level (device) data model.
High-level overview of the process
The transformation code (business logic) is one of the most complex pieces of a network automation solution, and there’s only one way to ensure it works properly: you test the heck out of it ;) Let me show you how we solved that challenge in netlab.
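Independent of how netlab does it, the general shape of such a test is easy to sketch: feed a small high-level model into the transformation and compare the result with a known-good low-level model. Everything below (the `expand_links` function, the field names, the topology) is hypothetical and only illustrates the pattern.

```python
def expand_links(topology: dict) -> dict:
    """Turn a high-level list of links into per-device interface lists."""
    devices: dict = {}
    for link_index, (left, right) in enumerate(topology["links"], start=1):
        for node, peer in ((left, right), (right, left)):
            interfaces = devices.setdefault(node, [])
            interfaces.append(
                {
                    "ifindex": len(interfaces) + 1,  # next free interface on the node
                    "neighbor": peer,
                    "linkindex": link_index,
                }
            )
    return devices


def test_expand_links() -> None:
    """Compare the transformation output with the expected device data."""
    topology = {"links": [("r1", "r2"), ("r2", "r3")]}
    expected = {
        "r1": [{"ifindex": 1, "neighbor": "r2", "linkindex": 1}],
        "r2": [
            {"ifindex": 1, "neighbor": "r1", "linkindex": 1},
            {"ifindex": 2, "neighbor": "r3", "linkindex": 2},
        ],
        "r3": [{"ifindex": 1, "neighbor": "r2", "linkindex": 2}],
    }
    assert expand_links(topology) == expected
```

The expected data acts as a snapshot: any change in the business logic that alters the device model makes the comparison fail.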
Welcome to the world of Ansible magic! In this blog post, we're going to uncover the secrets of accessing host_vars and group_vars directly from Python scripts. These variables hold the keys to customizing your automation scripts, empowering you to unlock new levels of flexibility and efficiency in your infrastructure management.
Let’s dive in!
To do this, we’ll use the Ansible API. The Ansible API is a powerful tool that allows you to interact with Ansible programmatically. The documentation for the Ansible API can be found here.
My Ansible project folder structure looks like this:
.
├── ansible.cfg
├── ansible_pyapi.py
├── group_vars
│   ├── all.yaml
│   └── host1_2.yaml
├── host_vars
│   ├── host1.yml
│   ├── host2.yml
│   └── host3.yml
└── inventory.ini
folder structure
My inventory file looks like this:
host1
host2
host3
[host1_2]
host1
host2
[all:vars]
username= "username"
password= "password"
inventory.ini
Contents of host_vars and group_vars files are as follows:
host_vars_location: from host_vars/host1.yml
host_vars/host1.yml
host_vars_location: from host_vars/host2.yml
host_vars/host2.yml
host_vars_location: from host_vars/host3.yml
host_vars/host3.yml
all_group_vars: from group_vars/all.yaml
group_vars/all.yaml
group_vars_location: from group_vars/host1_2.yaml
group_vars/host1_2.yaml
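With the files above in place, reading the merged variables for one host could look like the sketch below. It uses the DataLoader, InventoryManager, and VariableManager classes from Ansible's Python API; the helper name `load_host_vars` is hypothetical, and the imports are done inside the function only so that the module can still be imported on a machine without Ansible installed.

```python
def load_host_vars(inventory_path: str, host_name: str) -> dict:
    """Return the merged variables Ansible computes for a single host."""
    # Lazy imports: Ansible is only required when the function is called.
    from ansible.inventory.manager import InventoryManager
    from ansible.parsing.dataloader import DataLoader
    from ansible.vars.manager import VariableManager

    loader = DataLoader()
    inventory = InventoryManager(loader=loader, sources=[inventory_path])
    variable_manager = VariableManager(loader=loader, inventory=inventory)

    host = inventory.get_host(host_name)
    if host is None:
        raise KeyError(f"{host_name!r} is not in {inventory_path}")

    # get_vars() merges inventory variables with the group_vars and
    # host_vars files found next to the inventory file.
    return dict(variable_manager.get_vars(host=host))
```

Calling `load_host_vars("inventory.ini", "host1")` from the project folder should return a dictionary that includes keys such as `host_vars_location`, `group_vars_location`, and `all_group_vars` alongside Ansible's own built-in variables.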
The Median Isn’t the Message - Stephen Jay Gould
When we think of regression, the most common one, which we all know, is linear regression. It is a fairly popular and simple technique for estimating the mean of some variable conditional on the values of independent variables.
Now imagine you are a grocery delivery or ride-hailing service and want to show the customer the estimated delivery or wait time. If the distance is short, there will be less variability in the waiting time, but if the distance is long, many things can go wrong, so the estimated time can vary a lot. If we have to build a model to predict that, we may not want to apply linear regression, as it will only tell us the average time.
It’s important to note that one of the key assumptions of linear regression is constant variance (homoscedasticity). However, this is often not the case: the variability is not constant (heteroscedasticity), which violates that assumption (Linear Regression Notes).
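A small simulation makes the problem visible. The sketch below uses entirely made-up numbers: wait time grows linearly with distance, and the noise grows with distance too. After fitting an ordinary least-squares line, the spread of the residuals is compared for short and long trips; under constant variance the two would be roughly equal.

```python
import random
import statistics
from typing import List, Tuple


def simulate_wait_times(n: int, seed: int = 0) -> List[Tuple[float, float]]:
    """Generate (distance, time) pairs whose noise grows with distance."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        distance = rng.uniform(1.0, 20.0)
        noise = rng.gauss(0.0, 0.5 * distance)  # standard deviation scales with distance
        data.append((distance, 5.0 + 3.0 * distance + noise))
    return data


def fit_line(data: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = statistics.fmean(x for x, _ in data)
    mean_y = statistics.fmean(y for _, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_spread(data, slope: float, intercept: float, split: float):
    """Standard deviation of residuals below and above a distance split."""
    short = [y - (intercept + slope * x) for x, y in data if x < split]
    long_ = [y - (intercept + slope * x) for x, y in data if x >= split]
    return statistics.pstdev(short), statistics.pstdev(long_)


data = simulate_wait_times(2000)
slope, intercept = fit_line(data)
short_sd, long_sd = residual_spread(data, slope, intercept, split=10.0)
```

The fitted line still recovers the average trend, but `long_sd` comes out clearly larger than `short_sd`, which is exactly the non-constant variance a single average-time estimate hides.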
Let’s look at running data for distance vs. the time it takes to finish. We clearly know Continue reading
TL;DR
Calico 3.28 now enables VXLAN checksum offload by default in environments running kernel version 5.8 or above. In the past, offloading was disabled due to kernel bugs.
Please keep in mind that if you are upgrading to 3.28, this change takes effect after the nodes restart.
If you encounter unexpected performance issues, you can revert to the previous behavior with the following command:
kubectl patch felixconfiguration default --type="merge" -p='{"spec":{"FeatureDetectOverride":"ChecksumOffloadBroken=true"}}'
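Hand-typed JSON patches are easy to get wrong, so one option is to generate the command instead. The helper below is a hypothetical sketch: it builds the merge patch with `json.dumps` so the braces always balance and quotes it for the shell. The field name is copied as written in the command above; check its exact spelling against the FelixConfiguration reference for your Calico version.

```python
import json
import shlex


def checksum_offload_patch_command(broken: bool = True) -> str:
    """Build the kubectl command that overrides checksum offload detection."""
    patch = {
        "spec": {
            # Field name as written in the post; verify it for your version.
            "FeatureDetectOverride": f"ChecksumOffloadBroken={str(broken).lower()}"
        }
    }
    args = [
        "kubectl", "patch", "felixconfiguration", "default",
        "--type=merge",
        "-p", json.dumps(patch, separators=(",", ":")),
    ]
    # shlex.join quotes the JSON argument so it survives the shell intact.
    return shlex.join(args)


command = checksum_offload_patch_command()
```

Printing `command` gives a line that can be pasted into a shell, with the JSON argument already wrapped in single quotes.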
Please keep in mind that you can report any issues via GitHub tickets or Slack and include a detailed description of the environment (NIC hardware, kernel, distro, Continue reading