Tanzu Service Mesh Acceleration using eBPF

Performance and Security Optimizations on Intel Xeon Scalable Processors – Part 1

Contributors

Manish Chugtu — VMware

Ramesh Masavarapu, Saidulu Aldas, Sakari Poussa, Tarun Viswanathan  — Intel

Introduction

VMware Tanzu Service Mesh built on open source Istio, provides advanced, end-to-end connectivity, security, and insights for modern applications—across application end-users, microservices, APIs, and data—enabling compliance with Service Level Objectives (SLOs) and data protection and privacy regulations.

Service Mesh architecture pattern solves many problems, which are well known and extensively documented – so we won’t be talking about those in this blog. But it also comes with its own challenges and some of the top focus areas that we will discuss in this series of blogs are around:

  1. Performance
  2. Security

Intel and VMware have been working together to optimize and accelerate the microservices middleware and infrastructure with software and hardware to ensure developers have the best-in-class performance and low latency experience when building distributed workloads with a focus on improving the performance, crypto accelerations, and making it more secure.

In Part 1 of this blog series, we will talk about one such performance challenge (with respect to service mesh data path performance) and discuss our solution around that.

The current implementation Continue reading

Quick and easy vulnerability management with Calico Cloud

As more enterprises adopt containers, microservices, and Kubernetes for their cloud-native applications, they need to be aware of the vulnerabilities in container images during build and runtime that can be exploited. In this blog, I will demonstrate how you can implement vulnerability management in CI/CD pipelines, perform image assurance during build time, and enforce runtime threat defense to protect your workloads from security threats.

Image scanning and automatic blocking of high-risk images

The majority of images in CI/CD pipelines have vulnerabilities, misconfigurations, or both. An active cloud-native application protection platform (CNAPP) should scan, identify, and list vulnerabilities in container images based on databases such as NIST and NVD. The active CNAPP should then help teams build security policies to determine which images should be deployed or blocked based on several factors such as severity, last scan timestamp, and organizational exceptions. Given the sheer amount of vulnerabilities that appear daily, users will be easily overwhelmed if they have to address all existing vulnerabilities. Security teams will have to build a deploy/block criteria to prioritize vulnerabilities that they will address first—a workflow that is easy to start but difficult to manage and operate long-term. Hence, security teams should look for a security Continue reading

The mechanics of a sophisticated phishing scam and how we stopped it

The mechanics of a sophisticated phishing scam and how we stopped it
The mechanics of a sophisticated phishing scam and how we stopped it

Yesterday, August 8, 2022, Twilio shared that they’d been compromised by a targeted phishing attack. Around the same time as Twilio was attacked, we saw an attack with very similar characteristics also targeting Cloudflare’s employees. While individual employees did fall for the phishing messages, we were able to thwart the attack through our own use of Cloudflare One products, and physical security keys issued to every employee that are required to access all our applications.

We have confirmed that no Cloudflare systems were compromised. Our Cloudforce One threat intelligence team was able to perform additional analysis to further dissect the mechanism of the attack and gather critical evidence to assist in tracking down the attacker.

This was a sophisticated attack targeting employees and systems in such a way that we believe most organizations would be likely to be breached. Given that the attacker is targeting multiple organizations, we wanted to share here a rundown of exactly what we saw in order to help other companies recognize and mitigate this attack.

Targeted Text Messages

On July 20, 2022, the Cloudflare Security team received reports of employees receiving legitimate-looking text messages pointing to what appeared to be a Cloudflare Okta login Continue reading

DDoS detection with advanced real-time flow analytics

The diagram shows two high bandwidth flows of traffic to the Customer Network, the first (shown in blue) is a bulk transfer of data to a big data application, and the second (shown in red) is a distributed denial of service (DDoS) attack in which large numbers of compromised hosts attempt to flood the link connecting the Customer Network to the upstream Transit Provider. Industry standard sFlow telemetry from the customer router streams to an instance of the sFlow-RT real-time analytics engine which is programmed to detect (and mitigate) the DDoS attack.

This article builds on the Docker testbed to demonstrate how advanced flow analytics can be used to separate the two types of traffic and detect the DDoS attack.

docker run --rm -d -e "COLLECTOR=host.docker.internal" -e "SAMPLING=100" \
--net=host -v /var/run/docker.sock:/var/run/docker.sock:ro \
--name=host-sflow sflow/host-sflow
First, start a Host sFlow agent using the pre-built sflow/host-sflow image to generate the sFlow telemetry that would stream from the switches and routers in a production deployment. 
setFlow('ddos_amplification', {
keys:'ipdestination,udpsourceport',
value: 'frames',
values: ['count:ipsource']
});
setThreshold('ddos_amplification', {
metric:'ddos_amplification',
value: 10000,
byFlow:true,
timeout: 2
});
setEventHandler(function(event) {
var [ipdestination,udpsourceport] = event.flowKey.split(',');
var [sourcecount] = event.values;
Continue reading

Revisiting X.509 Certificates in Kubeconfig Files

In 2018, I wrote an article on examining X.509 certificates embedded in Kubeconfig files. In that article, I showed one way of extracting client certificate data from a Kubeconfig file and looking at the properties of the client certificate data. While there’s nothing technically wrong with that article, since then I’ve found another tool that makes the process a tad easier. In this post, I’ll revisit the topic of examining embedded X.509v3 certificates in Kubeconfig files.

The tool that I’ve found is yq, which is an incredibly useful tool when it comes to parsing YAML (much in the same way that jq is an incredibly useful tool when it comes to parsing JSON). I should probably write some sort of introductory post on yq.

In any case, you can use yq to replace the grep plus awk combo outlined in my earlier article on examining certificate data in Kubeconfig files. Instead, to pull out only the client certificate data, just use this yq command (you did know that Kubeconfig files are YAML, right?):

yq '.users[0].user.client-certificate-data' < ~./kube/config

(Of course, this command assumes your Kubeconfig file is named config in the ~/.kube Continue reading

Introducing new Cloudflare for SaaS documentation

Introducing new Cloudflare for SaaS documentation
Introducing new Cloudflare for SaaS documentation

As a SaaS provider, you’re juggling many challenges while building your application, whether it’s custom domain support, protection from attacks, or maintaining an origin server. In 2021, we were proud to announce Cloudflare for SaaS for Everyone, which allows anyone to use Cloudflare to cover those challenges, so they can focus on other aspects of their business. This product has a variety of potential implementations; now, we are excited to announce a new section in our Developer Docs specifically devoted to Cloudflare for SaaS documentation to allow you take full advantage of its product suite.

Cloudflare for SaaS solution

You may remember, from our October 2021 blog post, all the ways that Cloudflare provides solutions for SaaS providers:

  • Set up an origin server
  • Encrypt your customers’ traffic
  • Keep your customers online
  • Boost the performance of global customers
  • Support custom domains
  • Protect against attacks and bots
  • Scale for growth
  • Provide insights and analytics
Introducing new Cloudflare for SaaS documentation

However, we received feedback from customers indicating confusion around actually using the capabilities of Cloudflare for SaaS because there are so many features! With the existing documentation, it wasn’t 100% clear how to enhance security and performance, or how to support custom domains. Now, we want Continue reading

How I Migrated from MediaWiki to Notion

I've written before about how I use MediaWiki for taking notes and as one of my study tools. This has worked well for many years. But a problem started to develop: while I wrote my technical notes in MediaWiki, I wrote my day-to-day notes (books I want to read, notes from podcasts I listen to, and even my weekly planner) in Notion. This meant I had to use different apps for reading/writing in each tool, remember two different markup languages, and couldn't (cleanly) link pieces of content between the two. The final straw was realizing how much more effort I had to expend to maintain my MediaWiki instance; I just didn't have the time or will to keep up with new releases not to mention maintain the server itself.

For these reasons, I decided to move all of my MediaWiki content to Notion and unify all of my notes. But this revealed a new problem: there was no tooling to automate this. So I created my own. Here's how it works.

Read the rest of this post.

Report: Cloud services can be made more resilient but at a premium

Enterprises have options for making cloud services more resilient but at a price premium of up to 111% over the base price of services that offer no protection, according to a study by Uptime Institute.The extra cost can mean faster recovery times, better compensation from service-level agreements when there are outages, and improved “implied reliability,” according to Uptime’s report “Public cloud costs versus resiliency: stateless applications”. [ Get regularly scheduled insights by signing up for Network World newsletters. ] The institute modeled three scenarios for improving the resiliency of a simple WordPress website that was, at peak, required to deliver webpages within three seconds of requests. The researchers generated a Python simulation that varied bandwidth and virtual-machine demands to analyze their effects on costs.To read this article in full, please click here

Tech Bytes: Aruba Networks AIOps Get More Features and Functions

Aruba Networks is announcing new capabilities in its Aruba Central platform that leverage machine learning to do things like provide insights into clients on the network, recommend firmware for the best AP performance, and enable natural language queries in languages other than English.

The post Tech Bytes: Aruba Networks AIOps Get More Features and Functions appeared first on Packet Pushers.

RFC9199: Lessons in Large-scale Service Deployment

While RFC9199 (are we really in the 9000’s?) is targeted at large-scale DNS deployments–specifically root zone operators–so it might seem the average operator won’t find a lot of value here.

This is, however, far from the truth. Every lesson we’ve learned in deploying large-scale DNS root servers applies to any other large-scale user-facing service. Internally deployed DNS recursive servers are an obvious instance, but the lessons here might well apply to a scheduling, banking, or any other multi-user application accessed from a lot of places by a lot of different users. There are some unique points in DNS, such as the relatively slower pace of database synchronization across nodes, but the network-side lessons can still be useful for a lot of applications.

What are those lessons?

First, using anycast dramatically improves performance for these kinds of services. For those who aren’t familiar with the concept, anycase turns an IP address into a service identifier. Any host with a copy (or instance) or a given service advertises the same address, causing the routing table to choose the (topologically) closest instance of the service. If you’re using anycast, traffic destined to your service will automatically be forwarded to the closest server Continue reading

Posts from the Past, August 2022

I thought I might start highlighting some older posts here on the site through a semi-regular “Posts from the Past” series. I’ll start with posts published in the month of August through the years. Here’s hoping you find something that is useful (or perhaps entertaining, at least)!

August 2021

Last year, I had a couple of posts that I think are still relevant today. First, I talked about using Pulumi with Go to create a VPC Peering relationship on AWS. Second, I showed readers how to have Wireguard interfaces start automatically (using launchd) on macOS.

August 2020

I didn’t write too much in August 2020; my wife and I took a big road trip around the US to visit family and such. However, I did publish a post on some behavior changes in version 0.5.5 of the Cluster API tool clusterawsadm.

August 2019

This was a busy month for the blog! In addition to two Technology Short Takes, I also published posts on converting Kubernetes to an HA control plane, reconstructing the kubeadm join command (in the event you didn’t write down the output of kubeadm init), and one introducing Cluster API.

August 2018

In Continue reading

pygnmi 15. Overview of nornir_pygnmi

Hello my friend,

Are you looking for building network automation at scale leveraging the future-proof model-driven network automation? Besides attending our zero-to-hero network automation training and network automation with nornir, we suggest you to take a look at nornir_pygnmi, the new plugin we have created for Nornir to simplify management of network devices with gNMI.


1
2
3
4
5
No part of this blogpost could be reproduced, stored in a
retrieval system, or transmitted in any form or by any
means, electronic, mechanical or photocopying, recording,
or otherwise, for commercial purposes without the
prior permission of the author.

Is GNMI a Good Interface for Network Automation?

Yes, it is. GNMI is one of the most recent interfaces created for the management plane, which allows you to manage the network devices (i.e., retrieve configuration and operational data, modify configuration) and collect the streaming or event-driven telemetry. Sounds like one-size-fits-all, isn’t it? On top of that, GNMI supports also different transport channels (i.e., encrypted and non-encrypted), which makes it suitable both for lab testing and for production environment. You may feel that we are biased to gNMI, and you are right. Actually, that is a Continue reading