NetworkingNexus.net

Adopting OpenTelemetry for our logging pipeline

Cloudflare’s logging pipeline is one of the largest data pipelines that Cloudflare has, serving millions of log events per second globally, from every server we run. Recently, we undertook a project to migrate the underlying systems of our logging pipeline from syslog-ng to OpenTelemetry Collector and in this post we want to share how we managed to swap out such a significant piece of our infrastructure, why we did it, what went well, what went wrong, and how we plan to improve the pipeline even more going forward.

Background

A full breakdown of our existing infrastructure can be found in our previous post An overview of Cloudflare's logging pipeline, but to quickly summarize here:

We run a syslog-ng daemon on every server, reading from the local systemd-journald journal, and a set of named pipes.
We forward those logs to a set of centralized “log-x receivers”, in one of our core data centers.
We have a dead letter queue destination in another core data center, which receives messages that could not be sent to the primary receiver, and which get mirrored across to the primary receivers when possible.

The goal of this project was to replace those syslog-ng instances as Continue reading

Public Videos: Network Automation 101

You can access the Network Automation 101 videos without registration. Hope you’ll find them useful if you’re just starting your network automation journey.

Explore

AMD Previews “Turin” Epyc CPUs, Expands Instinct GPU Roadmap

Computex, the annual conference in Taiwan to showcase the island nation’s vast technology business, has been transformed into what amounts to a half-time show for the datacenter IT year. …

AMD Previews “Turin” Epyc CPUs, Expands Instinct GPU Roadmap was written by Timothy Prickett Morgan at The Next Platform.

Nvidia Unfolds GPU, Interconnect Roadmaps Out To 2027

There are many things that are unique about Nvidia at this point in the history of computing, networking, and graphics. …

Nvidia Unfolds GPU, Interconnect Roadmaps Out To 2027 was written by Timothy Prickett Morgan at The Next Platform.

Configuring AAA on Arista EOS Devices Using TACACS+ and ISE

In this blog post, let's look at how to configure TACACS+ AAA authentication on Arista EOS devices using Cisco ISE. When someone tries to log in to the device, we want the Arista device to authenticate and authorize the user against Cisco ISE. We'll go through the necessary configurations and steps to set up this integration between Arista EOS and Cisco ISE.

Local Authorization vs ISE Authorization

You could configure this in two ways with a slight difference. With the first method, ISE authenticates the user and tells Arista which role to apply. Arista devices come with two predefined roles, network-admin and network-operator. For example, if we have two different groups of users, network engineers who need full access and NOC engineers who only need read-only access. When the users log in, depending on the policy, ISE will send TACACS+ attributes that tell the switch which role to apply. With this method, the authorization happens locally at the switch.

For the second method, we will not use these two predefined roles. Each command the user enters on the CLI will be authorized by Cisco ISE. For example, we can allow all commands for network engineers and prevent NOC engineers from Continue reading

NetBox in the Cloud, for Free

Yes, you read that right. NetBox Labs is now offering a generous free plan for their SaaS version of NetBox. This change is a big win for many of us who no longer need to worry about managing our own NetBox instances. With this free plan, you can take advantage of all the powerful features of NetBox without the hassle of maintenance and updates.

Why This Matters to Me?

As a blogger, I create a lot of labs and practice a lot of automation. I rely on NetBox for IP Address Management (IPAM) and other network-related tasks. Before this, I had my NetBox running as a Docker container on one of my VMs. However, there were times when I wanted to access NetBox and found out the VM was powered off. This free plan is music to my ears. There is a 100-device limit, but that's more than enough for my needs.

How to Get it?

Getting started with the free plan is as simple as going to their website and signing up for a free plan. I was up and running within a few minutes. The free plan includes up to 100 devices, 500 IP addresses, and 10k API Continue reading

Fat Pipe Transition Notification – June 2024

Hey, everyone. Ethan here with a behind-the-scenes administrative request. Several thousand of you subscribe to the Packet Pushers’ Fat Pipe. In the Fat Pipe, we’ve been stuffing every single podcast we produce. The problem is that we produce way too many shows–one almost every weekday–for the average podcast client to absorb them all. We can... Read more »

Pi-hole on Raspberry Pi 3B with Debian

This guide helps you to install Pi-hole on your Raspberry Pi 3B running Debian 12, […]

The post Pi-hole on Raspberry Pi 3B with Debian first appeared on Brezular's Blog.

How Much Can Dell Profit From The AI Wave?

For most of the generative AI revolution thus far, the big original equipment manufacturers, or OEMs, have been sidelined as Nvidia and now AMD have done direct allocations of their GPU compute engines to hyperscalers, cloud builders, and other lighthouse customers. …

How Much Can Dell Profit From The AI Wave? was written by Timothy Prickett Morgan at The Next Platform.

Network observability in Kubernetes clusters for better security and faster troubleshooting

For DevOps and platform teams working with containers and Kubernetes, reducing downtime and improving security posture is crucial. A clear understanding of network topology, service interactions, and workload dependencies is required in cloud-native applications. This is essential for securing and optimizing the Kubernetes deployment and minimizing response time in the event of failure.

Network observability can highlight gaps in network policies for applications that require network policy controls to reduce the risk of attack from unsecured egress access or lateral movement of threats within the Kubernetes cluster. However, visualizing workload communication, service dependencies, and active and inactive network security policies presents significant challenges due to the distributed and dynamic nature of Kubernetes workloads.

Why is network observability difficult with Kubernetes workloads?

Kubernetes scales up and scales out pods and creates and destroys services depending on real-time business requirements, resulting in dynamic network connections for each workload instance. Network access policies defined for each workload further impact these connections.

In such a scenario, capturing an accurate and up-to-date representation of network traffic, service dependencies, and network policies is difficult. The default Kubernetes implementation provides limited network traffic visibility and policy information, making it challenging for teams to troubleshoot connectivity issues, improve Continue reading

Weekend Reads 053124

In this episode of the RIPE Labs podcast, three Internet pioneers talk about how they helped grow the Internet out of its early infancy, back when its purpose – and much of the excitement around its development – lay in the promise of connecting researchers from around the world.

Meta, parent company of Facebook and Instagram, also now is in the AI-focused processor game. The company recently unveiled the next generation of custom-made chips to help power AI-driven rankings and recommendation ads on social media platforms.

Phishing threats have reached unprecedented levels of sophistication in the past year, driven by the proliferation of generative AI tools.

In recent news, more than 13,000 subdomains of brands were hijacked for a large spam campaign that “leverages the trust associated with these domains to circulate spam and malicious phishing emails by the millions each day, cunningly using their credibility and stolen resources to slip past security measures.”

Tenable Research has discovered a critical memory corruption vulnerability dubbed Linguistic Lumberjack in Fluent Bit, a core component in the monitoring infrastructure of many cloud services.

Pew Research Center conducted the analysis to examine how often online content that once existed becomes inaccessible. One part Continue reading

HN736: Understanding VMware by Broadcom (Sponsored)

Today on Heavy Networking, sponsored by Broadcom, we talk about VMware’s transition under Broadcom’s ownership. The acquisition has led to big changes that rolled out very quickly, including how VMware sells products and services – subscription only licensing, bundles of products, a hard stop on sales of existing licenses, overhaul of license issuance, and more.... Read more »

Hedge 228: Five Execs Misunderstandings about Tech

Miscommunication between techies and business leaders are often caused by misunderstanding. Listen in as Eyvonne, Tom, and Russ discuss these misunderstandings and how we can address them.

download

Extending local traffic management load balancing to Layer 4 with Spectrum

In 2023, Cloudflare introduced a new load balancing solution, supporting Local Traffic Management (LTM). This gives organizations a way to balance HTTP(S) traffic between private or internal servers within a region-specific data center. Today, we are thrilled to be able to extend those same LTM capabilities to non-HTTP(S) traffic. This new feature is enabled by the integration of Cloudflare Spectrum, Cloudflare Tunnels, and Cloudflare load balancers and is available to enterprise customers. Our customers can now use Cloudflare load balancers for all TCP and UDP traffic destined for private IP addresses, eliminating the need for expensive on-premise load balancers.

A quick primer

In this blog post, we will be referring to load balancers at either layer 4 or layer 7. This is, of course, referring to layers of the OSI model but more specifically, the ingress path that is being used to reach the load balancer. Layer 7, also known as the Application Layer, is where the HTTP(S) protocol exists. Cloudflare is well known for our layer 7 capabilities, which are built around speeding up and protecting websites which run over HTTP(S). When we refer to layer 7 load balancers, we are referring to HTTP(S)-based services. Our layer Continue reading

Technology Short Take 178

Welcome to Technology Short Take #178! This one is notably shorter than many of the Technology Short Takes I publish; I’m still trying to fine-tune my collection of RSS feeds (such a useful technology that seems to have fallen out of favor), removing inactive feeds and looking for new feeds to replace them. Regardless, I have managed to collect a few links for your reading pleasure this weekend. Enjoy!

Networking

Marc Brooker discusses TCP_NODELAY in modern systems.
Because…why not?

Security

Matt Moore, CTO of Chainguard, goes into some detail on how Chainguard intends to honor the principles behind the CISA’s Secure by Design pledge.
Ars Technica examines TunnelVision, a vulnerability that has existed since 2002 and has the potential to render VPN apps useless. From my reading of the article, the greatest concern lies with untrusted networks where an attacker could manipulate things in their favor. Join that Wi-Fi network at the coffee shop at your own risk!
Here’s a slightly older post (March 2023) on using AppArmor to restrict app permissions, with a particular focus on containers (including Kubernetes). It’s a bit basic, but it does (in my opinion) provide some useful information.
Nick Frichette shares some Continue reading

Worth Reading: ChatGPT Does Not Summarize

I mostly gave up on LLMs being any help (apart from generating copious amounts of bullshit), but I still thought that generating summaries might be an interesting use case. I was wrong.

As Gerben Wierda explains in his recent “When ChatGPT summarises, it actually does nothing of the kind” blog post, you have to understand a text if you want to generate a useful summary, and that’s not what LLMs do. They can generate a shorter version of the text, which might not focus on the significant bits.

Worth Reading: ChatGPT Does Not Summarize

I mostly gave up on LLMs being any help (apart from generating copious amounts of bullshit), but I still thought that generating summaries might be an interesting use case. I was wrong.

Endpoint Selectors and Kubernetes Namespaces in CiliumNetworkPolicies

While performing some testing with CiliumNetworkPolicies, I came across a behavior that was unintuitive and unexpected to me. The behavior centers around how an endpoint selector behaves in a CiliumNetworkPolicies when Kubernetes namespaces are involved. (If you didn’t understand a bit of what I just said, I’ll provide some additional explanation shortly—stay with me!) After chatting through the behavior with a few folks, I realized the behavior is essentially “correct” and expected. However, if I was confused by the behavior then there’s a good chance others might be confused by the behavior as well, so I thought a quick blog post might be a good idea. Keep reading to get more details on the interaction between endpoint selectors and Kubernetes namespaces in CiliumNetworkPolicies.

Before digging into the behavior, let me first provide some definitions or explanations of the various things involved here:

Kubernetes namespaces are a way to logically isolate groups of resources in a cluster. For example, you might install the software that drives your point-of-sale (PoS) devices in the “retail-pos” namespace while the application that handles inventory is in the “inventory” namespace. You can read more about namespaces in the Kubernetes documentation.
CiliumNetworkPolicies are Cilium-specific network policies Continue reading

NAN064: Get to Know NAF Co-Founder: Scott Robohn

Scott Robohn is responsible for so much of the current buzz and awareness of network automation. Today, we sit down with the co-founder of Network Automation Forum to learn about his own journey. We chat about his education and the question of if college degrees are necessary. We also talk about his experience at big... Read more »

Disrupting FlyingYeti’s campaign targeting Ukraine

Cloudforce One is publishing the results of our investigation and real-time effort to detect, deny, degrade, disrupt, and delay threat activity by the Russia-aligned threat actor FlyingYeti during their latest phishing campaign targeting Ukraine. At the onset of Russia’s invasion of Ukraine on February 24, 2022, Ukraine introduced a moratorium on evictions and termination of utility services for unpaid debt. The moratorium ended in January 2024, resulting in significant debt liability and increased financial stress for Ukrainian citizens. The FlyingYeti campaign capitalized on anxiety over the potential loss of access to housing and utilities by enticing targets to open malicious files via debt-themed lures. If opened, the files would result in infection with the PowerShell malware known as COOKBOX, allowing FlyingYeti to support follow-on objectives, such as installation of additional payloads and control over the victim’s system.

Since April 26, 2024, Cloudforce One has taken measures to prevent FlyingYeti from launching their phishing campaign – a campaign involving the use of Cloudflare Workers and GitHub, as well as exploitation of the WinRAR vulnerability CVE-2023-38831. Our countermeasures included internal actions, such as detections and code takedowns, as well as external collaboration with third parties to remove the actor’s cloud-hosted malware. Continue reading

« Previous 1 … 88 89 90 91 92 … 3,793 Next »