NetworkingNexus.net

Linux packet sampling using eBPF

Linux 6.11+ kernels provide TCX attachment points for eBPF programs to efficiently examine packets as they ingress and egress the host. The latest version of the open source Host sFlow agent includes support for TCX packet sampling to stream industry standard sFlow telemetry to a central collector for network-wide visibility. For example, Deploy real-time network dashboards using Docker compose describes how to quickly set up a Prometheus database and use Grafana to build network dashboards.

static __always_inline void sample_packet(struct __sk_buff *skb, __u8 direction) {
    __u32 key = skb->ifindex;
    __u32 *rate = bpf_map_lookup_elem(&sampling, &key);
    if (!rate || (*rate > 0 && bpf_get_prandom_u32() % *rate != 0))
        return;

    struct packet_event_t pkt = {};
    pkt.timestamp = bpf_ktime_get_ns();
    pkt.ifindex = skb->ifindex;
    pkt.sampling_rate = *rate;
    pkt.ingress_ifindex = skb->ingress_ifindex;
    pkt.routed_ifindex = direction ? 0 : get_route(skb);
    pkt.pkt_len = skb->len;
    pkt.direction = direction;

    __u32 hdr_len = skb->len < MAX_PKT_HDR_LEN ? skb->len : MAX_PKT_HDR_LEN;
    if (hdr_len > 0 && bpf_skb_load_bytes(skb, 0, pkt.hdr, hdr_len) < 0)
        return;
    bpf_perf_event_output(skb, &events, BPF_F_CURRENT_CPU, &pkt, sizeof(pkt));
}

SEC("tcx/ingress")
int tcx_ingress(struct __sk_buff *skb) {
    sample_packet(skb, 0);

    return TCX_NEXT;
}

SEC("tcx/egress")
int tcx_egress(struct __sk_buff *skb) {
    sample_packet(skb, 1);

    return TCX_NEXT;
}

The sample.bpf.c file Continue reading
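The sampling test at the top of sample_packet() can be exercised outside the kernel. The sketch below is a minimal user-space illustration, not part of Host sFlow: should_sample() and count_sampled() are hypothetical names, and the random value is passed in as a parameter (standing in for bpf_get_prandom_u32()) so the decision is deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Mirrors the condition used in sample_packet(). The names here are
 * illustrative assumptions, not Host sFlow identifiers.
 *
 *   rate == NULL : no map entry for the interface, so never sample
 *   *rate == 0   : the modulo test is skipped, so every packet is sampled
 *   *rate == N   : sample when rnd % N == 0, roughly 1 packet in N
 */
static int should_sample(const uint32_t *rate, uint32_t rnd) {
    if (!rate || (*rate > 0 && rnd % *rate != 0))
        return 0;
    return 1;
}

/* Counts how many of the supplied random draws would trigger a sample. */
static size_t count_sampled(const uint32_t *rate, const uint32_t *rnd, size_t n) {
    assert(rnd != NULL || n == 0);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (size_t)should_sample(rate, rnd[i]);
    return hits;
}
```

With a rate of 4 and the draws 0 through 7, only 0 and 4 pass the modulo test, so two of the eight packets would be sampled.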

Fun Reading: Who is LLM?

Is an LLM a stubborn donkey, a genie, or a slot machine (and why)? Find out in the Who is LLM? article by Martin Fowler.

Worth Reading 072725

We sketch out the enabling technologies for AI. They include search, reasoning, neural networks, natural language processing, signal processing and computer graphics, programming and conventional software engineering, human-computer interaction, communications, and specialized hardware that provides supercomputing power.

For decades, thanks to the low latency enabled by Remote Direct Memory Access, or RDMA, a method of allowing CPUs and then GPUs and finally other kinds of XPUs to directly access the main memory of each other without having to go through the entire network software stack, InfiniBand found a niche and was one of the reasons why Nvidia shelled out $6.9 billion to acquire Mellanox Technologies more than five years ago.

Shipments of tape storage media increased again in 2024, according to HPE, IBM, and Quantum – the three companies that back the Linear Tape-Open (LTO) Format.

In this episode of PING, APNIC’s Chief Scientist, Geoff Huston, discusses a day in the life of Border Gateway Protocol (BGP). Not an extraordinary day, not a special day, just a regular day.

Dumb phones represent the laziest possible solution to a complex behavioral problem. They’re the dietary equivalent of having your jaw wired shut.

Intel Puts The Process Horse Back In Front Of The Foundry Cart

It is beginning to look like Intel plans to milk the impending 18A manufacturing process for a long time. …

Intel Puts The Process Horse Back In Front Of The Foundry Cart was written by Timothy Prickett Morgan at The Next Platform.

ArubaCX: When BGP Soft Reconfiguration Becomes a No-Op

Changing an existing BGP routing policy is always tricky on platforms that apply line-by-line changes to device configurations (Cisco IOS and most other platforms claiming to have industry-standard CLI, with the notable exception of Arista EOS). The safest approach seems to be:

Do not panic when the user makes changes to route maps and underlying filters (prefix lists, AS-path access lists, or community lists).
Let the user decide when they’re done and process the BGP table with the new routing policy at that time.

The White House AI Action Plan: a new chapter in U.S. AI policy

On July 23, 2025, the White House unveiled its AI Action Plan (Plan), a significant policy document outlining the current administration's priorities and deliverables in Artificial Intelligence. This plan emerged after the White House received over 10,000 public comments in response to a February 2025 Request for Information (RFI). Cloudflare’s comments urged the White House to foster conditions for U.S. leadership in AI and support open-source AI, among other recommendations.

There is a lot packed into the three-pillar, 28-page Plan.

Pillar I: Accelerate AI Innovation. Focuses on removing regulations, enabling AI adoption and development, and ensuring the availability of open-source and open-weight AI models.
Pillar II: Build American AI Infrastructure. Prioritizes the construction of high-security data centers, bolstering critical infrastructure cybersecurity, and promoting Secure-by-Design AI technologies.
Pillar III: Lead in International AI Diplomacy and Security. Centers on providing America’s allies and partners with access to AI, as well as strengthening AI compute export control enforcement.

Each of these pillars outlines policy recommendations for various federal agencies to advance the plan’s overarching goals. There’s much that the Plan gets right. Below we cover a few parts of the Plan that we think are particularly important. Continue reading

Testing Arista AVD with GNS3 and EOS

Arista AVD (Architect, Validate, Deploy) – https://avd.arista.com – is a powerful tool that brings network architecture into the world of Infrastructure-as-Code. I wanted to try it out in a lab setting and see how it works in a non-standard environment. Since my go-to lab tool is GNS3 with Arista cEOS images — while the AVD […]

The post Testing Arista AVD with GNS3 and EOS first appeared on IPNET.

Kubernetes Is Powerful, But Not Secure (at least not by default)

Kubernetes has transformed how we deploy and manage applications. It gives us the ability to spin up a virtual data center in minutes, scaling infrastructure with ease. But with great power comes great complexity, and in the case of Kubernetes, that complexity is security.

By default, Kubernetes permits all traffic between workloads in a cluster. This “allow by default” stance is convenient during development and testing, but it’s dangerous in production. It’s up to DevOps, DevSecOps, and cloud platform teams to lock things down.

To improve the security posture of a Kubernetes cluster, we can use microsegmentation, a practice that limits each workload’s network reach so it can only talk to the specific resources it needs. This is an essential security method in today’s cloud-native environments.

Why Is Microsegmentation So Hard?

We all understand that network policies can achieve microsegmentation; in other words, they can divide our Kubernetes network model into isolated pieces. This is important since Kubernetes is usually used to provide multiple teams with their infrastructural needs or host multiple workloads for different tenants. With that, you would think network policies are first-class citizens of clusters. However, when we dig into implementing them, three operational challenges Continue reading

Financial Services Firms Will Bank On Homegrown AI Training

Every company in every industry in every geography on Earth is trying to figure out how they are going to train AI models and tune them to help with their particular workloads. …

Financial Services Firms Will Bank On Homegrown AI Training was written by Timothy Prickett Morgan at The Next Platform.

For Now, AI Helps IBM’s Bottom Line More Than Its Top Line

While the hyperscalers and clouds and their AI model builder customers are setting the pace in compute, networking, and storage during the GenAI revolution, that does not mean that they will necessarily provide the only systems that will be used by the largest enterprises in the world. …

For Now, AI Helps IBM’s Bottom Line More Than Its Top Line was written by Timothy Prickett Morgan at The Next Platform.

Google’s Open Lakehouse: The Foundation For Enterprise AI Data

Businesses have always relied on data, but they never were able to get full value out of them when they were siloed by structure, system, or storage. …

Google’s Open Lakehouse: The Foundation For Enterprise AI Data was written by Timothy Prickett Morgan at The Next Platform.

Hedge 275: Jevon’s Paradox

What is Jevon’s Paradox? Tom, Eyvonne, and Russ discuss how this famous paradox impacts network engineering.


Serverless Statusphere: a walk through building serverless ATProto applications on Cloudflare’s Developer Platform

Social media users are tired of losing their identity and data every time a platform shuts down or pivots. In the ATProto ecosystem — short for Authenticated Transfer Protocol — users own their data and identities. Everything they publish becomes part of a global, cryptographically signed shared social web. Bluesky is the first big example, but a new wave of decentralized social networks is just beginning. In this post I’ll show you how to get started, by building and deploying a fully serverless ATProto application on Cloudflare’s Developer Platform.

Why serverless? The overhead of managing VMs, scaling databases, maintaining CI pipelines, distributing data across availability zones, and securing APIs against DDoS attacks pulls focus away from actually building.

That’s where Cloudflare comes in. You can take advantage of our Developer Platform to build applications that run on our global network: Workers deploy code globally in milliseconds, KV provides fast, globally distributed caching, D1 offers a distributed relational database, and Durable Objects manage WebSockets and handle real-time coordination. Best of all, everything you need to build your serverless ATProto application is available on our free tier, so you can get started without spending a cent. You can find the code in Continue reading

Worth Reading 072425

They call themselves Scattered Spider. They’re probably younger than your college freshman. They live in suburban bedrooms across America and Britain, and they’ve just brought industries to their knees.

The RPKI makes use of RSA signatures. These “traditional” digital signatures are expected to be vulnerable to attacks with powerful quantum computers. While no quantum computer currently exists that can break traditional cryptography, the development of quantum computers is progressing rapidly, and it is expected that they will be able to break RSA and other traditional cryptographic algorithms, be it in several years or several decades.

Analysing Transmission Control Protocol (TCP) SYN segments, the initial step in the TCP three-way handshake, can reveal patterns and anomalies in network traffic, providing insights into potential threats.

One way to establish if a QUIC connection is viable without paying a time penalty is for the server to signal the capability to use QUIC to the client in the first (TCP/TLS) connection, allowing the client to initiate a QUIC session on the second and subsequent connections.

These are not bugs but are inherent limitations of the technology. The same limitations make it unlikely that LLM machines will ever be capable of performing all human tasks Continue reading

Building Jetflow: a framework for flexible, performant data pipelines at Cloudflare

The Cloudflare Business Intelligence team manages a petabyte-scale data lake and ingests thousands of tables every day from many different sources. These include internal databases such as Postgres and ClickHouse, as well as external SaaS applications such as Salesforce. These tasks are often complex and tables may have hundreds of millions or billions of rows of new data each day. They are also business-critical for product decisions, growth planning, and internal monitoring. In total, about 141 billion rows are ingested every day.

As Cloudflare has grown, the data has become ever larger and more complex. Our existing Extract Load Transform (ELT) solution could no longer meet our technical and business requirements. After evaluating other common ELT solutions, we concluded that their performance generally did not surpass our current system, either.

It became clear that we needed to build our own framework to cope with our unique requirements — and so Jetflow was born.

What we achieved

Over 100x efficiency improvement in GB-s:

Our longest running job with 19 billion rows was taking 48 hours using 300 GB of memory, and now completes in 5.5 hours using 4 GB of memory
We estimate that ingestion of Continue reading
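The figures quoted for the longest running job can be turned into a memory-time comparison. The sketch below assumes that "GB-s" means memory held multiplied by wall-clock time (that reading is an assumption, as are the function names), and it works in GB-hours since the ratio is the same in either unit.

```c
#include <assert.h>

/*
 * Memory-time product in GB-hours: memory used multiplied by run time.
 * Interpreting the post's "GB-s" metric this way is an assumption.
 */
static double memory_time_gb_hours(double memory_gb, double hours) {
    assert(memory_gb >= 0.0 && hours >= 0.0);
    return memory_gb * hours;
}

/* Ratio of the old cost to the new cost; larger means more efficient. */
static double improvement_factor(double before, double after) {
    assert(after > 0.0);
    return before / after;
}
```

For the quoted job, memory_time_gb_hours(300, 48) is 14,400 GB-hours and memory_time_gb_hours(4, 5.5) is 22 GB-hours, a ratio of roughly 650x for that single job under this interpretation.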

Ultra Ethernet: Reinventing X.25

One should never trust the technical details published by the industry press, but assuming the Tomahawk Ultra puff piece isn’t too far off the mark, the new Broadcom ASIC (supposedly loosely based on emerging Ultra Ethernet specs):

1. Uses Optimized Ethernet Header, replacing IP/UDP header with a 10-byte something (let’s call it session identifier)
2. Makes Ethernet lossless with hop-by-hop retransmission/error recovery
3. Uses credit-based flow control (the receiver continuously updates the sender about the amount of available space)

If you’re ancient enough, you might recognize #3 as part of Fibre Channel, #2 and #3 as part of IEEE 802.1 LLC2 (used by IBM to implement SNA over Token Ring and Ethernet), and all three as the fundamental ideas of X.25 that Broadcom obviously reinvented at 800 Gbps speeds, proving (yet again) RFC 1925 Rule 11.

Don’t Let AI Make You Circuit City

I have a little confession. Sometimes I like to go into Best Buy and just listen. I pretend to be shopping for modem bearings or a left-handed torque wrench. What I’m really doing is hearing how people sell computers. I remember when 8x CD burners were all the rage. I recall picking one particular machine because it had an integrated Sound Blaster card. Today, I just marvel at how the associates rattle off a long string of impressive sounding nonsense that consumers will either buy hook, line, and sinker or refute based on some YouTube reviewer recommendation. Every once in a while, though, I hear someone that actually does understand the lingo and it is wonderful. They listen and understand the challenges and don’t sell a $3,000 gaming computer to a grandmother just to play Candy Crush and look up grandkid photos on Facebook.

The Experience Matters

What does that story have to do with the title of this post? Well, dear young readers, you may not remember the time when Best Buy Blue was locked in mortal competition with Circuit City Red. In a time before Amazon was ascendant you had to pick between the two giants of Continue reading

Cloudflare protects against critical SharePoint vulnerability, CVE-2025-53770

On July 19, 2025, Microsoft disclosed CVE-2025-53770, a critical zero-day Remote Code Execution (RCE) vulnerability. Assigned a CVSS 3.1 base score of 9.8 (Critical), the vulnerability affects SharePoint Server 2016, 2019, and the Subscription Edition, along with unsupported 2010 and 2013 versions. Cloudflare’s WAF Managed Rules now includes 2 emergency releases that mitigate these vulnerabilities for WAF customers.

Unpacking CVE-2025-53770

The vulnerability's root cause is improper deserialization of untrusted data, which allows a remote, unauthenticated attacker to execute arbitrary code over the network without any user interaction. Moreover, what makes CVE-2025-53770 uniquely threatening is its methodology – the exploit chain, labeled "ToolShell." ToolShell is engineered to play the long game: attackers are not only gaining temporary access, but also taking the server's cryptographic machine keys, specifically the ValidationKey and DecryptionKey. Possessing these keys allows threat actors to independently forge authentication tokens and __VIEWSTATE payloads, granting them persistent access that can survive standard mitigation strategies such as a server reboot or removing web shells.

In response to the active nature of these attacks, the U.S. Cybersecurity and Infrastructure Security Agency (CISA) added CVE-2025-53770 to its Known Exploited Vulnerabilities (KEV) catalog with an emergency remediation deadline. Continue reading

Shutdown season: the Q2 2025 Internet disruption summary

Cloudflare’s network currently spans more than 330 cities in over 125 countries, and we interconnect with over 13,000 network providers in order to provide a broad range of services to millions of customers. The breadth of both our network and our customer base provides us with a unique perspective on Internet resilience, enabling us to observe the impact of Internet disruptions at both a local and national level, as well as at a network level.

As we have noted in the past, this post is intended as a summary overview of observed and confirmed disruptions, and is not an exhaustive or complete list of issues that have occurred during the quarter. A larger list of detected traffic anomalies is available in the Cloudflare Radar Outage Center. Note that both bytes-based and request-based traffic graphs are used within the post to illustrate the impact of the observed disruptions — the choice of metric was generally made based on which better illustrated the impact of the disruption.

In our Q1 2025 summary post, we noted that we had not observed any government-directed Internet shutdowns during the quarter. Unfortunately, that forward progress was short-lived — in the second quarter of 2025, we Continue reading

Bell Labs Takes A Topological Approach To Quantum 2.0

Momentum is building for quantum computing and some observers say that a usable, fault-tolerant quantum system could appear in the next few years. …

Bell Labs Takes A Topological Approach To Quantum 2.0 was written by Jeffrey Burt at The Next Platform.
