Archive

Category Archives for "Networking"

A steam locomotive from 1993 broke my yarn test

So the story begins with a pair programming session I had with my colleague, which I desperately needed because my node skill tree is still at level 1, and I needed to get started with React because I'll be working on our internal backstage instance.

We worked together on a small feature, tested it locally, and it worked. Great. Now it's time to make My Very First React Commit. So I ran the usual git add and git commit, which hooked into yarn test, to automatically run unit tests for backstage, and that's when everything got derailed. For all the React tutorials I have followed, I have never actually run a yarn test on my machine. And the first time I tried yarn test, it hung, and after a long time, the command eventually failed:

Determining test suites to run...

  ● Test suite failed to run

thrown: [Error]

error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
🌈  backstage  ⚡

I could tell it was obviously unhappy about something, and then it threw some [Error]. I have very little actual JavaScript experience, but this looks suspiciously like someone had neglected to Continue reading

Comparing IP and CLNP: Local (Node) Multihoming

Another area where CLNP is a clear winner when compared to the TCP/IP stack is multi-homed nodes (nodes with multiple interfaces, not site multi-homing, where whole networks are connected to two upstream providers).

Multi-homed TCP/IP nodes must have multiple IP addresses because IP uses address interfaces. There is no well-defined procedure in TCP/IP for how a multi-homed node should behave. In the early days of TCP/IP, they tried to address that in RFC 1122 (Host Requirements RFC), but even then, there were two ideas about dealing with multiple interfaces: the strong and weak end system models (more details).

HS099: From CLI to CFO: Translating Complex Network Data into Clear Strategic and Financial Insights (Sponsored)

IT and network leaders need more than uptime—they need to know what their networks cost, what they deliver, and how future changes will impact the business. That’s where Netos comes in. CEO and founder Richard Foster joins Johna and John in a lively discussion to explore how Netos turns complex operational data into clear financial... Read more »

Comparing AI / ML activity from two production networks

AI Metrics describes how to deploy the open source ai-metrics application. The application provides performance metrics for AI/ML RoCEv2 network traffic, for example, large scale CUDA compute tasks using NVIDIA Collective Communication Library (NCCL) operations for inter-GPU communications: AllReduce, Broadcast, Reduce, AllGather, and ReduceScatter. The screen capture from the article (above) shows results from a simulated 48,000 GPU cluster.

This article goes beyond simulation to demonstrate the AI Metrics dashboard by comparing live traffic seen in two production AI clusters.

Cluster 1

This cluster consists of 250 GPUs connected via 100G ports to single large switch. The results are pretty consistent with simulation from the original article. In this case there is no Core Link Traffic because the cluster consists of a single switch. The Discards chart shows a burst of Out (egress) discards and the Drop Reasons chart gives the reason as ingress_vlan_filter. The Total Traffic, Operations, Edge Link Traffic, and RDMA Operations charts all show a transient drop in throughput coincident with the discard spike. Further details of the dropped packets, such as source/destination address, operation, ingress / egress port, QP pair, etc. can be extracted from the sFlow Dropped Packet Notifications that are populating Continue reading

“You get Instant Purge, and you get Instant Purge!” — all purge methods now available to all customers

There's a tradition at Cloudflare of launching real products on April 1, instead of the usual joke product announcements circulating online today. In previous years, we've introduced impactful products like 1.1.1.1 and 1.1.1.1 for Families. Today, we're excited to continue this tradition by making every purge method available to all customers, regardless of plan type.

During Birthday Week 2024, we announced our intention to bring the full suite of purge methods — including purge by URL, purge by hostname, purge by tag, purge by prefix, and purge everything — to all Cloudflare plans. Historically, methods other than "purge by URL" and "purge everything" were exclusive to Enterprise customers. However, we've been openly rebuilding our purge pipeline over the past few years (hopefully you’ve read some of our blog series), and we're thrilled to share the results more broadly. We've spent recent months ensuring the new Instant Purge pipeline performs consistently under 150 ms, even during increased load scenarios, making it ready for every customer.  

But that's not all — we're also significantly raising the default purge rate limits for Enterprise customers, allowing even greater purge throughput thanks to the efficiency of our Continue reading

NB520: When Good LLMs Do Bad Things, Dell’s Workforce Downsizes and Quantum Key Distribution From Space

Grab a virtual doughnut to blaze through this week’s IT news with Johna Johnson and John Burke as Drew Conry-Murray is enjoying his glazed, filled and sprinkled vacation donuts.  Today, we’re going to talk about getting good LLMs to do bad things, Dell’s workforce downsizing, Cloudflare’s recent outage, some developments in space networking, and more.... Read more »

TNO022: Secure Automation at Enterprise Scale for the Public Sector with Red Hat Ansible (Sponsored)

There are both benefits and challenges when adopting automation in the public sector, but Red Hat Ansible enhances efficiency, security and service delivery. With the right tooling, network operators can integrate automation into existing environments and improve network security.  Providing insights into adopting automation in the public sector are Tony Dubiel, Principal Specialist Solution Architect... Read more »

HN774: Who Put These OT Risks In My IT Ops? Fortinet Has Answers (Sponsored)

IT and infosec professionals are used to operating and protecting mission-critical infrastructure; servers, databases, load balancers, and so on. But what about valves that control the flow of gas or oil in a refinery? Temperature and vibration sensors that monitor industrial manufacturing processes? If you’re thinking “That’s not my problem” think again. There’s a whole... Read more »

Passive BGP Sessions

The Dynamic BGP Peers lab exercise gave you the opportunity to build a large-scale environment in which routers having an approved source IP addresses (usually matching an ACL/prefix list) can connect to a BGP route reflector or route server.

In a more controlled environment, you’d want to define BGP neighbors on the BGP RR/RS but not waste CPU cycles trying to establish BGP sessions with unreachable neighbors. Welcome to the world of passive BGP sessions.

Click here to start the lab in your browser using GitHub Codespaces (or set up your own lab infrastructure). After starting the lab environment, change the directory to session/8-passive and execute netlab up.

N4N019: Howdy, Neighbor! And Other Routing Stuff

In today’s episode, we continue the discussion about routing and routing protocols by focusing on commonalities rather than differences among  protocols such as OSPF, RIP, EIGRP, or BGP. We explain how, in general, routing protocols discover each other, communicate, maintain relationships, and exchange routing information. Next, we explore the topics of selecting best paths in... Read more »

Response: Any-to-Any Connectivity in the Internet

Bob left a lengthy comment arguing with the (somewhat black-and-white) claims I made in the Rise of NAT podcast. Let’s start with the any-to-any connectivity:

From my young millennial point of view, the logic is reversed: it is because of NATs and firewalls that the internet became so asymmetrical (client/server) just like the Minitel was designed (yes, I am French), whereas the Internet (and later the web, although a client/server protocol, was meant for everyone to be a client and a server) was designed to be more balanced.

Let’s start with the early Internet. It had no peer-to-peer applications. It connected a few large computers (mainframes) that could act as servers but also allowed terminal-based user access and thus ran per-user clients.

Three chapters at Cloudflare: Programmer to CTO to Board of Directors

Today, after more than 13 years at the company, I am joining Cloudflare’s board of directors and retiring from my full-time position as CTO.

Back in 2012 I wrote a short post on my personal site simply titled: Programmer. The post announced that I’d recently joined a company called CloudFlare (still sporting that capital “F”) with the job title Programmer. I’d chosen that title in part because it was the very first title I’d ever had, and because it would reflect what I’d be doing at Cloudflare.

I had spent a lot of time working at startups—in technical and then management roles—and wanted to go back to the really technical part that I loved most. Cloudflare gave me that opportunity, and I worked on a lot of systems that make up the Cloudflare that so many people around the world use today.

Looking back on my time at the company it’s really, really hard to pick my top highlights. In 2019 I wrote 6,000 words on the experience of helping build Cloudflare. But here are five that stand out:

Always be shipping

The night we finished the preparation to launch Universal SSL sticks in my memory. We set out to Continue reading

Introducing Calico 3.30: A New Era of Open Source Network Security and Observability for Kubernetes

When we first launched Project Calico in 2016, we set out to make Kubernetes networking easy, reliable, and scalable for all organizations. Our goal was to abstract away the complexity and performance overheads of other CNI plugins while simultaneously extending Kubernetes network policy to make it easier to secure your Kubernetes workloads.

Over the last 9 years, we’ve seen our community grow alongside Calico Open Source, which has become the most widely adopted Kubernetes networking tool that now powers over 8 million nodes across more than 166 countries. We’ve seen the challenges our community has faced as more organizations adopt Kubernetes, and as the scale and complexity of these Kubernetes deployments has increased. Through our commercial offerings, we’ve helped solve networking and network security challenges for some of the world’s largest Kubernetes deployments, from financial institutions to telcos.

Calico OSS 3.30

With the release of Calico OSS 3.30 in May, we are open sourcing our battle-tested observability and security tools from our commercial editions. This includes the following key features:

  • Goldmane – A gRPC-based API for accessing and capturing flow logs and policy evaluation metrics
  • Whisker – A web-based tool for viewing and filtering flow logs to troubleshoot Continue reading

NAN088: See Something, Improve Something – An Iterative Approach to Automation Success

On today’s Network Automation Nerds, industry veteran Michael Bushong talks about lessons learned from failure. As the network industry grapples with automation and network engineers confront yet another cycle of upskilling and grinding out new certs, he warns against executives and practice leads aiming for the biggest, shiniest project. His advice? Find something that matters... Read more »