Istio 1.5 Brings Advanced Automation for Secure Performance

Istio has emerged as one of the most widely used service mesh technologies for securing and controlling network traffic in containerized and Kubernetes environments. Its feature set helps solve a number of real issues users regularly encounter when running microservices. Arriving after the standard three-month period since the release of Istio 1.4, Istio 1.5 introduces a large number of improvements that increase automation and provide tooling to help further operationalize the platform. With major architectural changes and several API updates under the hood, Istio 1.5 adds capabilities that improve both the user experience and the functionality of the platform. The following highlights will help organizations optimize Istio for configuration management, architecture support, and overall performance.

Karen Bruner is a Principal DevOps Engineer for StackRox, where she drives automation and advocates for operationalizing the product. Previously, Karen held DevOps and site reliability engineering roles at Clari, Ooyala, LinkedIn, and Yahoo. She started her career in Hollywood in the digital effects industry and has a film credit in “Babe” for Internet Bandit. She spends her spare time rendering puns in yarn, learning obscure fiber crafts, and tripping over cats.

Configuration Management

Istioctl

Istio 1. …

Building BGP Route Reflector Configuration with Ansible/Jinja2

One of our subscribers sent me this email when trying to use ideas from the Ansible for Networking Engineers webinar to build a BGP route reflector configuration:

I’m currently learning Ansible/Jinja2 and trying to create a BGP route reflector configuration from a Jinja2 template using an Ansible playbook. In a group_vars YAML file, I want to list the IP addresses of all route reflector clients. When I have 50+ neighbors, the YAML file gets quite unreadable and it’s hard to see the data model anymore.

Whenever you hit a roadblock like this one, you should start with the bigger picture and maybe redefine the problem.
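One way to apply that advice, offered as an illustrative assumption rather than as the approach this post goes on to recommend: describe each router once in an inventory and derive the route reflector client list from it, instead of hand-maintaining 50+ IP addresses in group_vars. The sketch uses plain Python string formatting as a stand-in for the Jinja2 template (in Jinja2 the same loop would be a `{% for %}` block); the host names, addresses, AS number, and simplified IOS-style command syntax are all hypothetical.

```python
import ipaddress
from typing import Dict, List

# Hypothetical inventory: each router is described once, together with its role.
# In Ansible this data could live in host_vars; here it is a plain dict.
INVENTORY: Dict[str, Dict[str, str]] = {
    "rr1": {"loopback": "10.0.0.1", "role": "route-reflector"},
    "pe1": {"loopback": "10.0.0.11", "role": "client"},
    "pe2": {"loopback": "10.0.0.12", "role": "client"},
    "pe3": {"loopback": "10.0.0.9", "role": "client"},
}


def rr_clients(inventory: Dict[str, Dict[str, str]]) -> List[str]:
    """Derive the route reflector client list instead of typing it by hand."""
    loopbacks = [h["loopback"] for h in inventory.values() if h["role"] == "client"]
    # Sort numerically by address (10.0.0.9 before 10.0.0.11) for stable output.
    return sorted(loopbacks, key=ipaddress.ip_address)


def render_rr_config(asn: int, clients: List[str]) -> str:
    """Render simplified IOS-style iBGP neighbor lines for every client."""
    lines = [f"router bgp {asn}"]
    for ip in clients:
        lines.append(f" neighbor {ip} remote-as {asn}")
        lines.append(f" neighbor {ip} route-reflector-client")
    return "\n".join(lines)


print(render_rr_config(65000, rr_clients(INVENTORY)))
```

With this data model, adding a client means adding one inventory entry; the neighbor list and the rendered statements follow automatically, and the YAML stays readable no matter how many neighbors there are.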

Why use Typha in your Calico Kubernetes Deployments?

Calico is an open source networking and network security solution for containers, virtual machines, and native host-based workloads. Calico supports a broad range of platforms including Kubernetes, OpenShift, Docker EE, OpenStack, and bare metal. In this blog, we will focus on Kubernetes pod networking and network security using Calico.

Calico uses etcd as its back-end datastore. When you run Calico on Kubernetes, you can instead use the same etcd datastore that Kubernetes uses, accessed through the Kubernetes API server. Calico calls this the Kubernetes API datastore (KDD). The following diagram shows a block-level architecture of Calico.

Calico-node runs as a DaemonSet and interacts with the Kubernetes API server quite a lot. You can profile that interaction by enabling API server audit logging for calico-node. For example, in my kubeadm cluster, I used the following audit configuration:

 

To set the context, this is my cluster configuration.
As we are already running Typha, let’s profile the API calls made by both the calico-node and Typha components. I used the following commands to extract the unique API calls for each.

 

If you ignore the license key API calls from calico-node, you will see that the API calls …
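For readers who want to reproduce this kind of profile, here is a minimal Python sketch of one way to extract unique API calls per component. It is not the command set used in this post. It assumes the API server writes one JSON audit event per line; the `user.username`, `verb`, `objectRef.resource`, and `requestURI` fields are standard Kubernetes audit event fields, and service account usernames follow the `system:serviceaccount:<namespace>:<name>` convention, but the names `calico-node` and `calico-typha` are assumptions that vary by install.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def unique_api_calls(
    lines: Iterable[str], user_substrings: Iterable[str]
) -> Dict[str, Set[Tuple[str, str]]]:
    """Group unique (verb, resource) pairs by which user substring matched.

    Each non-empty line is assumed to be one JSON-encoded audit event.
    """
    subs = list(user_substrings)
    calls: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        username = event.get("user", {}).get("username", "")
        obj = event.get("objectRef", {})
        # Non-resource requests (e.g. /healthz) have no objectRef; fall back to the URI.
        resource = obj.get("resource", event.get("requestURI", ""))
        for sub in subs:
            if sub in username:
                calls[sub].add((event.get("verb", ""), resource))
    return dict(calls)
```

Pass it an open audit log file (its location depends on the API server's audit log path setting) and the substrings to match, for example `unique_api_calls(f, ["calico-node", "typha"])`, then print the sorted pairs for each component to compare them side by side.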

Juniper QFX10K IPFIX Gotchas

IPFIX is problematic on the Juniper QFX10K switches. Documentation is sparse and doesn’t include a complete configuration, and behavior changes between versions in undocumented ways. Here are a couple of things I noticed when upgrading from Junos 17.3 to 17.4. They also apply if you are running 18.4 code. I hit more problems with 18.4 and ended up rolling back to 17.4.

Big Changes in Reported Throughput

Here’s a graph showing total reported throughput for a QFX10K I upgraded:

ipfix traffic report

There are a few things going on here. First, the reported traffic drops to zero after the upgrade. It starts coming back after I fixed the first problem, but then it stays flat, at a lower level than it should be. It starts climbing again after I made the second fix.

First Problem: Chassis Sample Instance

The first configuration change I needed to add was this: set chassis fpc 0 sampling-instance sample-border, where sample-border is the name of the sampling instance I have configured under forwarding-options. This was not required with 17.3. If you don’t do it with 17.4, you won’t get any data.

Second Problem: DDoS-Protection

Some Juniper platforms implement …
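As general background, and not as the author's diagnosis: a collector estimates throughput from 1-in-N sampled packets by scaling each sample back up by N. If something between the forwarding plane and the exporter caps how many samples per second get through, the estimate stops tracking real traffic and plots as a flat line below the true value, much like the curve described above. The sketch below models that; the sampling rate, packet size, and cap are hypothetical numbers.

```python
from typing import Optional


def estimated_bps(
    actual_pps: float,
    sampling_rate: int,
    avg_packet_bytes: float,
    sample_cap_pps: Optional[float] = None,
) -> float:
    """Throughput a collector would report from 1-in-N packet sampling.

    sample_cap_pps models a hypothetical limit on how many samples per
    second reach the exporter (None means no limit).
    """
    samples_pps = actual_pps / sampling_rate
    if sample_cap_pps is not None:
        samples_pps = min(samples_pps, sample_cap_pps)
    # The collector scales each sample back up by the sampling rate.
    return samples_pps * sampling_rate * avg_packet_bytes * 8


for pps in (1_000_000, 2_000_000, 4_000_000):
    uncapped = estimated_bps(pps, 1000, 500)
    capped = estimated_bps(pps, 1000, 500, sample_cap_pps=500)
    print(pps, uncapped, capped)
```

With the cap in place, every row reports the same 2 Gbit/s regardless of real traffic, which is why a rate limit on samples shows up as a plateau rather than as missing data.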

April Customer Newsletter

Welcome to the April 2020 edition of the Tigera Calicommunication newsletter! In the March edition, we discussed context-aware flow logs. This edition covers the next logging component: audit logs.

Using Calico Enterprise Audit Logs to Improve Visibility, Security, and Compliance

Watch this short video to see how you can benefit from using Calico Enterprise Audit Logs.

What problems are we solving?

Kubernetes is an API-driven platform: every action happens through an API call to the Kubernetes API server. Consequently, recording and monitoring API activity is very important. While most deployments end up sending these logs to a remote destination for compliance purposes, the logs are often not easily accessible when needed. Moreover, different roles (platform, network, security) have different requirements, and many people in those roles may not even have access to the logs. Some relevant log-analysis use cases are as follows:

  • A policy change resulted in a sudden outage of a service. How do you find out which policies have changed in the last 24 hours? [network, security]
  • You are maintaining a critical namespace and want to monitor every pod that comes up in that namespace. Can you get an alert if a pod is created in that …
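The first use case above ("which policies have changed in the last 24 hours?") can be answered from raw Kubernetes audit events. A minimal Python sketch, not the Calico Enterprise implementation, assuming JSON-lines audit output: `verb`, `stage`, `objectRef`, `user.username`, and `requestReceivedTimestamp` are standard audit event fields, while treating `networkpolicies` plus Calico's `globalnetworkpolicies` as the set of "policy" resources is an assumption.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple

# Assumed resource names: Kubernetes NetworkPolicy plus Calico GlobalNetworkPolicy.
POLICY_RESOURCES = {"networkpolicies", "globalnetworkpolicies"}
MUTATING_VERBS = {"create", "update", "patch", "delete"}

Change = Tuple[datetime, str, str, str, str]


def parse_ts(ts: str) -> datetime:
    """Parse an RFC 3339 audit timestamp such as 2020-04-01T12:00:00.123456Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def recent_policy_changes(
    lines: Iterable[str],
    now: Optional[datetime] = None,
    window: timedelta = timedelta(hours=24),
) -> List[Change]:
    """Return (time, verb, namespace, name, user) for policy changes within window."""
    cutoff = (now or datetime.now(timezone.utc)) - window
    changes: List[Change] = []
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        # Count each request once; the API server can log several stages per request.
        if event.get("stage", "ResponseComplete") != "ResponseComplete":
            continue
        ref = event.get("objectRef", {})
        if ref.get("resource") not in POLICY_RESOURCES:
            continue
        if event.get("verb") not in MUTATING_VERBS:
            continue
        ts = parse_ts(event["requestReceivedTimestamp"])
        if ts >= cutoff:
            user = event.get("user", {}).get("username", "")
            changes.append((ts, event["verb"], ref.get("namespace", ""), ref.get("name", ""), user))
    return sorted(changes)
```

Read-only verbs such as `get` and `list` are skipped, so the result lists only who created, modified, or deleted a policy and when.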

Lenovo intros an edge platform that runs Azure Stack

Lenovo is boosting its ties to Microsoft with an edge-to-cloud platform that runs Microsoft’s Azure Stack on hyperconverged infrastructure (HCI), putting HCI at the edge of the network rather than in a data center.

The Lenovo ThinkAgile MX1021 server analyzes data at the edge, near where it is gathered, a change in direction from the usual edge strategy. In earlier edge schemes, data collected at an edge endpoint is merely sorted, and only the relevant data is sent up to the main data center, where it is analyzed.

The ThinkAgile MX1021 platform is a ruggedized, half-width, short-depth, 1U compact server that can be installed almost anywhere: hung on a wall, stacked on a shelf, or mounted in a rack. For connectivity, it supports Wi-Fi, 4G and 5G.

Daily Roundup: Is AT&T Readying More Job Cuts?

The carrier is “sizing our operations to economic activity”; VMware is helping Vodafone cut...

© SDxCentral, LLC. Use of this feed is limited to personal, non-commercial use and is governed by SDxCentral's Terms of Use (https://www.sdxcentral.com/legal/terms-of-service/). Publishing this feed for public or commercial use and/or misrepresentation by a third party is prohibited.

How to Protect Your Virtual Meetings from Zoombombing

Imagine, if you will, you’re participating in a … Zoom CEO Eric Yuan has put a freeze on feature updates in order to address the security issues. Zoom promised to address the problem within the next 90 days; as Yuan said, “Over the next 90 days, we are committed to dedicating the resources needed to better identify, address, and fix issues proactively. We are also committed to being transparent throughout this process. We want to do what it takes to maintain your trust.” Another writer for The New Stack, Jennifer Riggins …

Cumulus content roundup: March 2020

Spring has sprung! With the change of the seasons, we’ve kept busy pumping out new content and useful resources. If you’re looking for a quick mental vacation, get ready to cozy up with this month’s edition of the Cumulus Content Roundup. We’ve got exciting announcements, fresh podcast episodes for your listening enjoyment, as well as blog posts packed with open networking and data center goodness.

From Cumulus Networks
Cumulus Networks launches the industry’s first open source and fully packaged automation solution — making open networking easier to deploy and manage and enabling infrastructure-as-code models: Cumulus Networks announces the release of its production-ready automation solution for organizations moving towards fully automated networks! Read all about how we are taking the next step in network automation in this blog post.

Production-ready automation (the how and why): So we’ve announced the industry’s first open source and fully packaged automation solution, but how exactly did we get there? This blog post dives into the challenges customers were facing and why we wanted to help.

A new era for Cumulus in the Cloud: Can you believe that Cumulus in the Cloud was launched over two years ago? Yeah, we’re also …

AT&T Hints at COVID-19 Related Job Cuts

It expects "sizing our operations to economic activity" along with its ongoing business execution...

Versa Targets SMBs, Pens SD-WAN Deal With Nuvias

Versa Titan promises to simplify the deployment and management of branch offices and make it easier...

Vodafone Cut Costs 50% With VMware Telco Cloud

VMware’s network virtualization infrastructure supports voice core, data core, and service...

Announcing the Compose Specification

Docker is pleased to announce that we have created a new open community to develop the Compose Specification. The community will be run under open governance, with input from all interested parties, so that together we can create a new standard for defining multi-container apps that can run anywhere from the desktop to the cloud.

Docker is working with Amazon Web Services (AWS), Microsoft and others in the open source community to extend the Compose Specification to more flexibly support cloud-native platforms like Kubernetes and Amazon Elastic Container Service (Amazon ECS) in addition to the existing Compose platforms. Opening the specification will allow innovation to flourish and deliver more choices to developers, accelerating how development teams build and ship applications.

Currently used by millions of developers, with over 650,000 Compose files on GitHub, Compose has been widely embraced because it is a simple, cloud- and platform-agnostic way of defining multi-container applications. Compose dramatically simplifies the code-to-cloud process and toolchain by letting developers define a complex stack in a single file and run it with a single command. This eliminates the need to build and start every container manually, saving development …
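One thing that single `docker-compose up` command handles for you is start order: Compose starts a service after the services it lists under `depends_on`. The sketch below is not Compose's own code; it models just that ordering step with Kahn's topological sort over a hypothetical service map (service names are illustrative), and rejects undefined dependencies and cycles.

```python
from collections import deque
from typing import Dict, List


def start_order(services: Dict[str, List[str]]) -> List[str]:
    """Order services so each one starts after everything it depends on."""
    indegree = {name: 0 for name in services}
    dependents: Dict[str, List[str]] = {name: [] for name in services}
    for name, deps in services.items():
        for dep in deps:
            if dep not in services:
                raise ValueError(f"{name} depends on undefined service {dep}")
            indegree[name] += 1
            dependents[dep].append(name)
    # Services with no unmet dependencies can start immediately.
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order: List[str] = []
    while ready:
        svc = ready.popleft()
        order.append(svc)
        for child in sorted(dependents[svc]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(services):
        raise ValueError("dependency cycle detected")
    return order


print(start_order({"web": ["api"], "api": ["db", "cache"], "db": [], "cache": []}))
```

Sorting the ready queue only makes the output deterministic; any order that respects the dependency edges would be valid.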

Changing Conditions for Neural Network Processing

Over the last few years, the idea of “conditional computation” has been key to making neural network processing more efficient, even though much of the hardware ecosystem has focused on general-purpose approaches that rely on matrix math operations to brute-force the problem instead of selectively operating on only the required pieces.
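To make the contrast concrete, here is a toy Python sketch of one common form of conditional computation, mixture-of-experts-style top-1 gating. It is an illustration chosen for this excerpt, not a technique taken from the article: a cheap gating step scores every expert, then only the winning expert's matrix-vector product runs. The weights and sizes are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    """Dense matrix-vector product: len(w) * len(x) multiply-adds."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def top1_route(gate_w: Matrix, experts: Sequence[Matrix], x: Sequence[float]) -> Tuple[int, List[float]]:
    """Score every expert cheaply, then evaluate only the highest-scoring one."""
    scores = matvec(gate_w, x)  # one scalar score per expert
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, matvec(experts[best], x)  # the other experts are skipped entirely


def mac_counts(num_experts: int, d_in: int, d_out: int) -> Tuple[int, int]:
    """Multiply-adds for running every expert vs. the gate plus one expert."""
    dense = num_experts * d_out * d_in
    conditional = num_experts * d_in + d_out * d_in
    return dense, conditional


print(mac_counts(8, 512, 512))
```

With 8 experts of size 512 by 512, the brute-force approach performs 2,097,152 multiply-adds while the gated version performs 266,240, roughly an 8x reduction; the catch is that the data-dependent choice of expert maps less naturally onto hardware built for large, regular matrix operations.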

Changing Conditions for Neural Network Processing was written by Nicole Hemsoth at The Next Platform.

Project Crossbow: Lessons from Refactoring a Large-Scale Internal Tool

Cloudflare’s global network currently spans 200 cities in more than 90 countries. Engineers working in product, technical support and operations often need to be able to debug network issues from particular locations or individual servers.

Crossbow is the internal tool for exactly this: it allows Cloudflare’s Technical Support Engineers to perform diagnostic activities ranging from running commands (like traceroutes, cURL requests and DNS queries) to debugging product features and performance using bespoke tools.
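A tool that runs commands on servers on behalf of engineers has to keep arbitrary input away from a shell. The following is a minimal, hypothetical Python sketch of one common pattern for that, not a description of Crossbow's actual design: each diagnostic type maps to a fixed argv prefix, the target is validated as an IP address or DNS hostname, and the result is an argument list suitable for `subprocess.run(argv)` with no shell involved. The allowlist entries and flags are illustrative choices.

```python
import ipaddress
import re
from typing import Dict, List

# Hypothetical allowlist: diagnostic name -> fixed argv prefix.
ALLOWED: Dict[str, List[str]] = {
    "traceroute": ["traceroute"],
    "dns": ["dig", "+short"],
    "http": ["curl", "-sS", "-o", "/dev/null", "-w", "%{http_code}"],
}
# One DNS label: alphanumeric at both ends, hyphens allowed inside, max 63 chars.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def valid_target(target: str) -> bool:
    """Accept an IP address or a syntactically valid DNS hostname."""
    try:
        ipaddress.ip_address(target)
        return True
    except ValueError:
        pass
    if not target or len(target) > 253:
        return False
    return all(LABEL.fullmatch(label) for label in target.rstrip(".").split("."))


def build_command(kind: str, target: str) -> List[str]:
    """Build an argv list for the requested diagnostic, rejecting anything else."""
    if kind not in ALLOWED:
        raise ValueError(f"unsupported diagnostic: {kind}")
    if not valid_target(target):
        raise ValueError(f"invalid target: {target!r}")
    argv = list(ALLOWED[kind])
    if kind == "http":
        host = f"[{target}]" if ":" in target else target  # bracket IPv6 literals
        argv.append(f"https://{host}/")
    else:
        argv.append(target)
    return argv
```

For example, `build_command("dns", "example.com")` yields `["dig", "+short", "example.com"]`, while a target such as `"example.com; rm -rf /"` is rejected before any process is started.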

In September last year, an Engineering Manager at Cloudflare asked for Crossbow to be transitioned from a Product Engineering team to the Support Operations team. The tool had been a secondary focus and had passed through multiple engineering teams without any of them building up subject-matter knowledge.

The Support Operations team at Cloudflare is closely aligned with Cloudflare’s Technical Support Engineers and develops diagnostic tooling and Natural Language Processing technology to drive efficiency. Given this alignment, Support Operations was judged the best team to own the tool.

Learning from Sisyphus

Whilst seeking advice on the transition process, an SRE Engineering Manager at Cloudflare suggested reading “A Case Study in Community-Driven Software Adoption”. This book proved a truly invaluable read for anyone thinking of doing internal tool development …

When We Come Together, We Are Richer for It

These are unsettling and unprecedented times.

The speed at which coronavirus has taken hold around the world, and the dramatic changes to our lives that it has brought, would have been difficult for many of us to contemplate just a few short weeks ago.

Social (and physical) distancing measures that were merely a suggestion then have suddenly become a strange reality for millions of people, with entire countries going into complete lockdown, borders and schools closing, planes no longer flying, and normal social activity placed on hold.

The vital role the Internet is playing is clear for all to see. It allows us to work together while we are socially apart, quietly and quickly providing many of us with a way to continue our lives. It has allowed us to maintain at least some sense of the ordinary during an extraordinary time.

We are asking a lot of the Internet, but it is ready for the challenge. It is enabling companies to keep working, schoolchildren to continue learning, and families and friends to stay connected. Even virtual birthday parties and weddings have become a hit!

The Internet means that self-isolation may be a physical reality, but it need not be …

Can We Trust BGP Next Hops (Part 1)?

Aldrin sent me an interesting question as a comment to one of my EVPN blog posts:

How does the network know that a VTEP is actually alive? (1) from the point of view of the control plane and (2) from the point of view of the data plane? And how do you ensure that control and data plane liveness monitoring has the same view? BFD for BGP is a possible solution for (1) but it’s not meant for 3rd party next hops, i.e. it doesn’t address (2).

Let’s stop right there (or you’ll stop reading in the next 10 milliseconds). I will also try to rephrase the question in more generic terms, hoping Aldrin won’t mind a slight detour… we’ll get back to the original question in another blog post.
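For readers less familiar with the BFD mechanism the question mentions, here is a simplified Python model of how an asynchronous-mode BFD session decides that its peer is down, following the detection-time rule in RFC 5880 (intervals are in milliseconds here for readability; the RFC specifies them in microseconds). It only tracks the liveness of the BFD peer itself, which, as the question points out, is not necessarily the third-party next hop.

```python
def detection_time_ms(
    remote_detect_mult: int,
    local_required_min_rx_ms: int,
    remote_desired_min_tx_ms: int,
) -> int:
    """Asynchronous-mode detection time: remote Detect Mult times the agreed interval."""
    # The agreed interval is the slower of what we can receive and what the peer wants to send.
    agreed_interval = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval


def session_up(last_rx_ms: int, now_ms: int, detect_ms: int) -> bool:
    """The peer is considered down once a full detection time passes with no packet."""
    return now_ms - last_rx_ms < detect_ms


print(detection_time_ms(3, 300, 100))
```

With a detect multiplier of 3 and a 300 ms agreed interval, the session is declared down 900 ms after the last received packet.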