
Author Archives: Reza Ramezanpour

What’s New in Calico v3.31: eBPF, NFTables, and More

We’re excited to announce the release of Calico v3.31 🎉, which brings a wave of new features and improvements.

For a quick look, here are the key updates and improvements in this release:

How to Deploy Whisker and Goldmane in Manifest Only Calico Setups

Your Step-by-Step Guide to Observability Without the Operator

If you’re running Calico using manifests, you may have found that enabling the observability features introduced in version 3.30, including Whisker and Goldmane, requires a more hands-on approach. Earlier documentation focused on the Tigera operator, which automates key tasks such as certificate management and secure service configuration. In a manifest-based setup, these responsibilities shift to the user. While the process involves more manual steps, it provides greater transparency and control over each component. With the right guidance, setting up these observability tools is entirely achievable and offers valuable insight into the internal behavior of your Calico deployment.

We’ve heard from many of you in the Calico Slack community: you’re eager to try out Whisker and Goldmane but aren’t sure how to set them up without Helm or the operator. For anyone who’s up for a challenge, this blog post provides a step-by-step guide on how to get everything wired up the hard way.
However, even if you already use the operator, keep reading! We’re going to pull back the curtain on the magic it performs behind the scenes. Understanding these mechanics will help you troubleshoot, customize, and better appreciate a managed Continue reading
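
To make the certificate piece concrete, here is a rough, hedged sketch of how a TLS key pair can be supplied to a component such as Goldmane in a manifest-based setup. The secret name shown is an illustrative placeholder, not necessarily the name the Calico manifests reference; check the component’s volume mounts for the exact secret it expects.

# Generic kubernetes.io/tls Secret; the name is a hypothetical placeholder
apiVersion: v1
kind: Secret
metadata:
  name: goldmane-tls            # hypothetical; use the name your manifests mount
  namespace: calico-system
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate>
  tls.key: <base64-encoded private key>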

Kubernetes Observability: Your Q&A Guide to Calico Whisker

Calico Whisker vs. Traditional Observability: Why Context Matters in Kubernetes Networking

Are you tired of digging through cryptic logs to understand your Kubernetes network? In today’s fast-paced cloud environments, clear, real-time visibility isn’t a luxury, it’s a necessity. Traditional logging and metrics often fall short, leaving you without the context needed to troubleshoot effectively.

That’s precisely what Calico Whisker’s recent launch (with Calico v3.30) aims to solve. This tool provides clarity where logs alone fall short. In the sections below, you’ll get a practical overview of how it works and how it fits into modern Kubernetes networking and security workflows.

If you’re relying on logs for network observability, you’re not alone. While this approach can provide some insights, it’s often a manual, resource-intensive process that puts significant load on your distributed systems. It’s simply not a cloud-native solution for real-time insights.

So are we doomed? No. Calico Whisker transforms network observability from a chore into a superpower.

What is Calico Whisker?

Calico Whisker is a free, lightweight, Kubernetes-native observability user interface (UI) created by Tigera and introduced with Calico Open Source v3.30. It’s designed to give you a simple yet powerful window into your cluster’s network traffic, helping you understand network flows and evaluate policy behavior in real-time.

In Continue reading

How 1&1 Mail & Media Scaled Kubernetes Networking with eBPF and Calico

“We started in 2017 with Calico and never regretted it!”
—Stefan Fudeus, Product Owner/Lead Architect, 1&1 Mail & Media

Challenge

1&1 Mail & Media, part of the IONOS group, powers popular European internet brands including GMX and Web.de, serving more than 50% of Germany’s population with critical identity and email infrastructure. With roughly 45 to 50 million users, network reliability is non-negotiable. Any downtime could affect millions.

By 2022, the company had containerized 80% of its workloads on Kubernetes across three self-managed data centers. While the platform, backed by bare metal nodes and custom network layers, was highly scalable, network throughput bottlenecks began to emerge. Pods were limited to 2.5 Gbps of bandwidth due to IP encapsulation overhead, despite 10 Gbps network interfaces.

The team needed a solution that:

  • Improved pod-to-pod network performance
  • Maintained strong network policy isolation across up to 40 tenants per cluster
  • Scaled to millions of network connections and 1.4 million HTTP requests per second

Solution

1&1 Mail & Media had adopted Calico back in 2017, largely for its unique Kubernetes NetworkPolicy standard support. As their Kubernetes platform evolved, with clusters scaling to 300 bare metal nodes, 16,000 pods, and over 4 million Continue reading

Top 5 Kubernetes Network Issues You Can Catch Early with Calico Whisker

Kubernetes networking is deceptively simple on the surface, until it breaks, silently leaks data, or opens the door to a full-cluster compromise. As modern workloads become more distributed and ephemeral, traditional logging and metrics just can’t keep up with the complexity of cloud-native traffic flows.

That’s where Calico Whisker comes in. Whisker is a lightweight, Kubernetes-native observability tool created by Tigera. It offers deep insights into real-time traffic flow patterns without requiring you to deploy heavyweight service meshes or packet sniffers. And here’s something you won’t get anywhere else: Whisker is data plane-agnostic. Whether you run the Calico eBPF data plane, nftables, or iptables, you’ll get the same high-fidelity flow logs with consistent fields, format, and visibility. You don’t have to change your data plane; Whisker fits right in and shows you the truth, everywhere.

Let’s walk through 5 network issues Whisker helps you catch early, before they turn into outages or security incidents.

1. Policy Misconfigurations

Traditional observability tools often show whether a packet was forwarded, accepted or dropped, but not why. They lack visibility into which Kubernetes network policy was responsible or if one was even applied.

With Whisker, each network flow is paired with:

Kubernetes Is Powerful, But Not Secure (at least not by default)

Kubernetes has transformed how we deploy and manage applications. It gives us the ability to spin up a virtual data center in minutes, scaling infrastructure with ease. But with great power comes great complexity, and in the case of Kubernetes, that complexity is security.

By default, Kubernetes permits all traffic between workloads in a cluster. This “allow by default” stance is convenient during development and testing, but it’s dangerous in production. It’s up to DevOps, DevSecOps, and cloud platform teams to lock things down.

To improve the security posture of a Kubernetes cluster, we can use microsegmentation, a practice that limits each workload’s network reach so it can only talk to the specific resources it needs. This is an essential security method in today’s cloud-native environments.
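
As a minimal sketch of where microsegmentation usually starts, the following standard Kubernetes NetworkPolicy flips a namespace from “allow by default” to “deny by default”; every permitted flow then has to be added back explicitly (the namespace name is illustrative):

# Deny all ingress and egress for every pod in the namespace unless
# another policy explicitly allows the traffic.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-app             # illustrative namespace
spec:
  podSelector: {}               # empty selector = all pods in the namespace
  policyTypes:
    - Ingress
    - Egress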

Why Is Microsegmentation So Hard?

We all understand that network policies can achieve microsegmentation; in other words, they can divide our Kubernetes network model into isolated pieces. This is important since Kubernetes is usually used to provide multiple teams with their infrastructure needs or to host workloads for different tenants. With that in mind, you would think network policies are first-class citizens of clusters. However, when we dig into implementing them, three operational challenges Continue reading

Dry Run Your Kubernetes network policies with Calico staged network policies

Kubernetes Network Policies (KNP) are powerful resources that help secure and isolate workloads in a cluster. By defining what traffic is allowed to and from specific pods, KNPs provide the foundation for zero-trust networking and least-privilege access in cloud-native environments.

But there’s a problem: KNPs are risky, and applying them without a clear game plan can be disruptive.

Without deep insight into existing traffic flows, applying a restrictive policy can instantly break connectivity, killing live workloads, user sessions, or critical app dependencies. An even scarier scenario is when we implement policies that we think cover everything, and workloads do keep working, but after a restart or scaling operation we hit new problems. Kubernetes, with all of its features, has no built-in “dry run” mode for policies and no first-class observability to show what would be blocked or allowed, which is the right call, since Kubernetes is an orchestrator, not an implementer.

This forces platform teams into a difficult choice: deploy permissive (or no) policies and weaken security, or risk service disruption while debugging restrictive ones. As a result, many teams delay implementing network policies entirely, only to regret it after a zero-day exploit like Log4Shell, the XZ backdoor, or other vulnerabilities Continue reading
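
To show what staging looks like in practice, here is a minimal, hedged sketch of a Calico StagedNetworkPolicy. It uses the same schema as a regular Calico NetworkPolicy, but instead of enforcing the rules it only reports what would have been allowed or denied, so you can validate it against live traffic first (the labels, namespace, and port are illustrative):

# Staged policy: evaluated and logged, but not enforced.
apiVersion: projectcalico.org/v3
kind: StagedNetworkPolicy
metadata:
  name: stage-restrict-backend
  namespace: my-app                    # illustrative namespace
spec:
  selector: app == 'backend'           # pods the policy would apply to
  types:
    - Ingress
  ingress:
    - action: Allow
      protocol: TCP
      source:
        selector: app == 'frontend'    # only the frontend may connect
      destination:
        ports:
          - 8080

Once the flow logs confirm nothing unexpected would be denied, the resource can typically be promoted to an enforced policy by switching the kind while keeping the same spec.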

Securing Kubernetes Traffic with Calico Ingress Gateway

If you’ve managed traffic in Kubernetes, you’ve likely navigated the world of Ingress controllers. For years, Ingress has been the standard way of getting our HTTP/S services exposed. But let’s be honest, it often felt like a compromise. We wrestled with controller-specific annotations to unlock critical features, blurred the lines between infrastructure and application concerns, and sometimes wished for richer protocol support or a more standardized approach. This “pile of vendor annotations,” while functional, highlighted the limitations of a standard that struggled to keep pace with the complex demands of modern, multi-team environments. The Ingress model was stretched well beyond what it was originally designed for, and over time that led to portability issues, inconsistent behaviour, and real security vulnerabilities.

Ingress NGINX Retirement: Why This Matters Now

The Kubernetes Security Response Committee recently announced the retirement of Ingress NGINX, with support ending in March 2026. This decision reinforces the exact challenges the community has been raising for years. The same flexibility that made it popular early on, especially features like snippet-based configuration, became a major source of technical debt, vendor lock-in, and security exposure.

After the retirement date, Ingress NGINX will no longer receive security updates or bug fixes. Running Continue reading
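
To ground this, here is a hedged sketch of what exposing a service through a Gateway API gateway looks like. The gatewayClassName below is a hypothetical placeholder; use whatever class your Calico Ingress Gateway installation registers.

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: web-gateway
  namespace: default
spec:
  gatewayClassName: tigera-gateway-class   # hypothetical class name
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - name: web-tls                  # Secret holding the serving certificate

HTTPRoute resources can then attach to this Gateway, keeping infrastructure (the Gateway) and application routing (the routes) cleanly separated between teams.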

Is It Time to Migrate? A Practical Look at Kubernetes Ingress vs. Gateway API

If you’ve managed traffic in Kubernetes, you’ve likely worked with Ingress controllers. For years, Ingress has been the standard way to expose HTTP and HTTPS services. But in practice, it often came with trade-offs. Controller-specific annotations were required to unlock critical features, the line between infrastructure and application responsibilities was unclear, and configurations often became tied to the implementation rather than the intent.

Ingress NGINX Retirement Raises the Stakes

Recently, the Kubernetes community announced that Ingress NGINX will be formally retired, with only best-effort maintenance provided until March 2026. After that point, there will be no bug fixes, no security updates, and the project will move to read-only archival status. Any cluster still relying on Ingress NGINX after that date will be running an unsupported controller, which increases maintenance overhead and security risk.

For many organizations, now is the time to treat this as a high-priority project: inventory all clusters using Ingress NGINX, create a migration plan (test, convert, cut over), and avoid ending up in a reactive scramble as the March 2026 deadline approaches.

If the move from Ingress to the Gateway API once felt optional, this new timeline changes the situation. Depending on an aging data-plane component without Continue reading
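
To make the conversion concrete, here is a hedged sketch of a typical host-and-path Ingress rule expressed as a Gateway API HTTPRoute; the hostname, service name, and parent Gateway are illustrative:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: shop-route
  namespace: default
spec:
  parentRefs:
    - name: web-gateway          # the Gateway this route attaches to
  hostnames:
    - shop.example.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: shop-api         # backing Service
          port: 8080

The route’s intent (host, path, backend) carries over directly from the Ingress rule, while controller-specific annotations are replaced by typed fields or policy attachments.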

Recap: KubeCon + CloudNativeCon Europe 2025

When I got the assignment to attend KubeCon on the 1st of April, I thought it was an April Fools’ prank, but as the date got closer I realized this was for real: I would be on the ground in London for the tenth anniversary of cloud native computing. I’ve seen a lot of tech events during my years in the industry (while trying not to get replaced by AI), and I have to say this one stands out!

Image source: CNCF YouTube Channel

Here is my recap of KubeCon + CloudNativeCon Europe 2025.

CalicoCon 2025

CalicoCon is an event that happens twice every year, as a co-located event during KubeCon NA and EU. It’s a free event that allows you to learn about Tigera’s vision for the future of networking and security in the cloud. There’s also an after-party to celebrate our community and people like you who are on this journey with us!

This year our main focus was on Calico v3.30, our upcoming release that will add many anticipated features to Calico, unlocking things like observability, staged network policy, and the Gateway API. CalicoCon brought together cloud-native enthusiasts to explore the latest advancements in Calico and Kubernetes networking.

Continue reading

How to get started with Calico Observability features

Kubernetes, by default, adopts a permissive networking model where all pods can freely communicate unless explicitly restricted using network policies. While this simplifies application deployment, it introduces significant security risks. Unrestricted network traffic allows workloads to interact with unauthorized destinations, increasing the potential for cyberattacks such as Remote Code Execution (RCE), DNS spoofing, and privilege escalation.

To better understand these problems, let’s examine a sample Kubernetes application: ANP Demo App.

This application comprises a deployment that spawns pods and a service that exposes them to external users, much like any real-world workload you will encounter in your environment.

If you open the application service before implementing any policies, the application reports the following messages:

  1. Container can reach the Internet – Without network policies, an attacker can use our container as an entry point by exploiting it with a vulnerability. This could allow them to exfiltrate data or establish remote control over the workload by leveraging its Internet access.
  2. Container can reach CoreDNS Pods – Kubernetes relies heavily on DNS, with records served using CoreDNS Pods. While communication between your Pods and CoreDNS is essential and not inherently a vulnerability, pairing it with unrestricted access to Continue reading
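
One hedged way to address both findings is an egress policy that only lets the demo app resolve DNS through kube-dns and blocks all other outbound traffic; the namespace is illustrative, and the kube-dns labels assume a standard CoreDNS deployment:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-dns-only
  namespace: anp-demo                          # illustrative namespace
spec:
  podSelector: {}                              # all pods in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53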

Calico Open Source 3.30: Exploring the Goldmane API for custom Kubernetes Network Observability

Kubernetes is built on the foundation of APIs and abstraction, and Calico leverages its extensibility to deliver network security and observability in both its commercial and open source versions. APIs are the special sauce that help automate and operationalize your Kubernetes platforms as part of a CI/CD pipeline and other GitOps workflows.

Calico OSS 3.30 introduces numerous battle-tested observability and security tools from our commercial editions. This includes the following key features:

  • Goldmane – A gRPC-based API for accessing and capturing flow logs and policy evaluation metrics
  • Whisker – A web-based tool for viewing and filtering flow logs to troubleshoot connectivity issues and to author and maintain Calico network security policies
  • GlobalStagedNetworkPolicy and StagedNetworkPolicy – New custom resources that allow you to audit the behavior of a new policy before you actively enforce it
  • Calico Ingress Gateway – Our 100% upstream, enterprise-ready implementation of the Gateway API that is based on Envoy Gateway
  • Calico Cloud ready – Every OSS cluster includes the required components to connect to a stateless, read-only, and free version of Calico Cloud

You may know about the Calico REST API, which allows you to manage Calico resources, such as Calico network policy, Calico IPAM configurations Continue reading

Calico Whisker, Your New Ally in Network Observability

With the upcoming release of Calico v3.30 on the horizon, we are excited to introduce Calico Whisker, a simple yet powerful User Interface (UI) designed to enhance network observability and policy debugging. If you’ve ever struggled to make sense of network flow logs or troubleshoot policies in a complex Kubernetes cluster, Whisker is your friend!

Whisker is a three-part deployment: a UI, a backend, and a gRPC channel that communicates with Felix, the brain of Calico, to gather live flow information and present it in a human-readable, easy-to-understand way. But before we get started, let’s dive into why Whisker is a must-have for your Kubernetes environment, what problems it solves, and how it can streamline your policy management.

Navigating Network Flows is Difficult

In Kubernetes environments, network flows are the backbone of communication between workloads. As clusters scale, so does the complexity of managing these flows and their security. Without clear visibility and effective observability tools, teams often struggle with:

  • Diagnosing unexplained workload behavior and determining why certain applications aren’t working as expected.
  • Identifying the real reason why certain workload communications are permitted or denied, which stems from understanding which policies are affecting specific Continue reading

High-Performance Kubernetes Networking with Calico eBPF

Kubernetes has revolutionized cloud-native applications, but networking remains a crucial aspect of ensuring scalability, security, and performance. Default networking approaches, such as iptables-based packet filtering, often introduce performance bottlenecks due to inefficient packet processing and complex rule evaluations. This is where Calico eBPF comes into play, offering a powerful alternative that enhances networking efficiency and security at scale.

Understanding Kubernetes Networking

Kubernetes networking consists of two primary components:

  1. Physical Network Infrastructure – Connects cloud resources to external networks, ensuring communication between nodes and the broader internet.
  2. Cluster Network Infrastructure – Manages internal workload communication within the Kubernetes cluster, including service-to-service traffic and pod-to-pod interactions.

Choosing the right data plane is critical for optimal performance. Factors such as cluster size, throughput, and security requirements should guide this choice. Poor networking choices can lead to congestion, excessive latency, and resource starvation.

Data Plane Options in Kubernetes Networking

Networking in Kubernetes is an abstraction. While Kubernetes lays the foundation, your Container Network Interface (CNI) plugin is in charge of the actual networking. To better understand networking, we usually divide it into two sections: a control plane and a data plane.
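
For clusters managed by the Tigera operator, switching to the eBPF data plane is a single field on the Installation resource; a minimal sketch is shown below (additional steps, such as telling Calico how to reach the API server directly so kube-proxy can be disabled, are typically required):

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    linuxDataplane: BPF      # switch from the iptables data plane to eBPF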

Native Kubernetes cluster mesh with Calico

As Kubernetes continues to gain traction in the cloud-native ecosystem, the need for robust, scalable, and highly available cluster deployments has become more noticeable.

While a Kubernetes cluster can easily expand via additional nodes, the downside of such an approach is that you might have to spend a lot of time troubleshooting the underlying networking or managing and updating resources between clusters. On top of that, a multi-regional or hybrid-cloud scenario might be off-limits, depending on the restrictions that a cloud provider or your Kubernetes distro imposes on your environment.

Calico Enterprise cluster mesh is a suite of features native to Kubernetes with a multi-layer design that connects two or more Kubernetes clusters and seamlessly shares resources between them. This post will explore cluster mesh, its benefits, and how it can enhance your Kubernetes environment.

Projects that provide cluster mesh

Multiple projects offer cluster mesh, and while they are all similar in basic principles, each has a different take on implementing this solution in an environment.

The following table is a brief overview of notable projects that offer cluster mesh:

Calico Open Source | Calico Enterprise | Cilium | Submariner
Encapsulation: IPIP | Direct | Continue reading

Kubernetes network policies: 4 pain points and how to address them

Kubernetes is used everywhere, from test environments to the most critical production foundations that we use daily, making it undoubtedly a de facto standard in cloud computing. While this is great news for everyone who works with, administers, and expands Kubernetes, the downside is that it makes Kubernetes a favorable target for malicious actors.

Malicious actors typically exploit flaws in the system to gain access to a portion of the environment. They then chain these flaws together to move laterally within the environment, ultimately seeking root access or access to critical information.

While the best way to fix security flaws in any software is to patch it with appropriate fixes that the project maintainers publish, there are certain security practices that you can adopt to fortify your environment, like using network policies. However, most people find network policies complex and overwhelming, which discourages them from implementing policies in their environment.

In this blog post, we will examine four pain points that people face when they want to implement network policies and provide solutions to help you effectively secure your Kubernetes environment.

What is a network policy and why should I use it?

In Kubernetes, a network policy (KNP) resource is the Continue reading
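
A small example often makes the resource less intimidating; here is a hedged sketch of a KNP that only allows the frontend pods to reach the backend on its service port (labels, namespace, and port are illustrative):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
  namespace: my-app                  # illustrative namespace
spec:
  podSelector:
    matchLabels:
      app: backend                   # policy applies to the backend pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend          # only frontend pods may connect
      ports:
        - protocol: TCP
          port: 8080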

What is new in Calico 3.28

TL;DR

  • A new Grafana dashboard that helps you monitor Calico Typha’s performance and troubleshoot issues.
  • Calico eBPF dataplane IPv6 is now GA. It supports true IPv6-only clusters as well as dual-stack clusters. 🐝
  • Optional Pod startup delay to ensure networking is up in high-churn scenarios.
  • Tigera operator now supports multiple IP pools, IP pool modification, affinity for operator pods, priorityClassName, and more!
  • Improved policy performance in both eBPF and iptables.
  • Calico now ships with a pprof server. Activate the profiling server for real-time views into the Typha and Felix components and live debugging.

🚨 Important changes 🚨

Calico 3.28 enables VXLAN checksum offload by default in environments running kernel version 5.8 or above. In the past, offloading was disabled due to kernel bugs.

Please keep in mind that if you are upgrading to 3.28, this change will take effect after node restarts.

If you encounter unexpected performance issues, you can revert to the previous behavior with the following command:

kubectl patch felixconfiguration default --type="merge" -p='{"spec":{"featureDetectOverride":"ChecksumOffloadBroken=true"}}'
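
The same override can also be set declaratively; a hedged sketch of the equivalent FelixConfiguration manifest:

apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  featureDetectOverride: "ChecksumOffloadBroken=true"   # treat checksum offload as broken and disable it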

Please keep in mind that you can report any issues via GitHub tickets or Slack and include a detailed description of the environment (NIC hardware, kernel, distro, Continue reading

Amazon EKS networking options

When setting up a Kubernetes environment with Amazon Elastic Kubernetes Service (EKS), it is crucial to understand your available networking options. EKS offers a range of networking choices that allow you to build a highly available and scalable cloud environment for your workloads.

In this blog post, we will explore the networking and policy enforcement options provided by AWS for Amazon EKS. By the end, you will have a clear understanding of the different networking options and network policy enforcement engines, and other features that can help you create a functional and secure platform for your Kubernetes workloads and services.

Amazon Elastic Kubernetes Service (EKS)

Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes service that simplifies routine operations, such as cluster deployment and maintenance, by automating tasks like patching and updating operating systems and their underlying components. EKS enhances scalability through AWS Auto Scaling groups and other AWS service integrations and offers a highly available control plane to manage your cluster.

Amazon EKS in the cloud has two options:

  • Managed
  • Self-managed

Managed clusters rely on the AWS control plane node, which AWS hosts and controls separately from your cluster. This node operates in isolation and cannot be directly Continue reading

Exploring AKS networking options

At KubeCon 2023 in Amsterdam, Azure made several exciting announcements and introduced a range of updates and new options to Azure CNI (Azure Container Networking Interface). These changes will help Azure Kubernetes Service (AKS) users solve some of the pain points they used to face in previous iterations of Azure CNI, such as IP exhaustion and large cluster deployments with custom IP address management (IPAM). On top of that, with this announcement Microsoft officially added an additional data plane to the Azure platform.

The big picture

Worker nodes in an AKS (Azure Kubernetes Service) cluster are Azure VMs pre-configured with a version of Kubernetes that has been tested and certified by Azure. These clusters communicate with other Azure resources and external sources (including the internet) via the Azure virtual network (VNet).

Now, let’s delve into the role of the data plane within this context. Data plane operations take place within each Kubernetes node; the data plane is responsible for handling the communication between your workloads and cluster resources. By default, an AKS cluster is configured to utilize the Azure data plane, which Continue reading
