It has been clear for some time that Japan wants to have a certain amount of economic and technical independence when it comes to cloud computing in the Land of the Rising Sun. …
Today’s episode is all about high-performance memory in switches. We dig into the differences among TCAM, SRAM, DRAM, and HBM, and all the complex tradeoffs that go into allocating memory resources to networking functions. If you’ve ever had to select a Switching Database Manager template or done similar operations on a switch, this is your...
For over two decades, we've built real-time communication on the Internet using a patchwork of specialized tools. RTMP gave us ingest. HLS and DASH gave us scale. WebRTC gave us interactivity. Each solved a specific problem for its time, and together they power the global streaming ecosystem we rely on today.
But using them together in 2025 feels like building a modern application with tools from different eras. The seams are starting to show—in complexity, in latency, and in the flexibility needed for the next generation of applications, from sub-second live auctions to massive interactive events. We're often forced to make painful trade-offs between latency, scale, and operational complexity.
Today Cloudflare is launching the first Media over QUIC (MoQ) relay network, running on every Cloudflare server in datacenters in 330+ cities. MoQ is an open protocol being developed at the IETF by engineers from across the industry, not a proprietary Cloudflare technology. MoQ combines the low-latency interactivity of WebRTC, the scalability of HLS/DASH, and the simplicity of a single architecture, all built on a modern transport layer. We're joining Meta, Google, Cisco, and others in building implementations that work seamlessly together, creating a shared foundation for the next generation of real-time …
On today’s Total Network Operations we talk through the adoption of AI in network operations with John Capobianco, Head of DevRel at Selector. Selector is the sponsor of today’s episode. John walks us through his career journey as a network engineer, and describes the moment where he realized that AI was going to change how...
From time to time, I like to dive into the archive and find a show that’s worth repeating. Forthwith, Derick Winkworth and automation.
Network automation efforts tend to focus on building and maintaining configurations–but is this the right place to be putting our automation efforts? Derick Winkworth joins Tom Ammon and Russ White at the Hedge for a conversation about what engineers really do, and what this means for automation.
When I was cleaning the “set BGP MED” integration test, I decided that once a BGP prefix is in the BGP table of the BGP peer, there’s no need for a further wait before checking its MED value. After all:
We configure an outbound routing policy to change MED;
We execute do clear bgp * soft out at the end of most BGP policy configuration templates.
The device under test should thus immediately (re)send the expected BGP prefix with the target MED.
That approach failed miserably with ArubaCX; it was time to investigate the details.
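The excerpt does not say how the test was eventually changed, so the following is only a minimal sketch of one common alternative to checking the MED immediately: polling with a bounded timeout. The function name `wait_for_med` and the `get_med` callable are hypothetical and are not part of netlab or any device API; `get_med` stands in for whatever code reads the MED of the prefix from the peer's BGP table.

```python
import time
from typing import Callable, Optional


def wait_for_med(
    get_med: Callable[[], Optional[int]],   # hypothetical: returns the current MED, or None if the prefix is missing
    expected: int,
    timeout: float = 10.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_med() until it returns `expected` or `timeout` seconds elapse."""
    deadline = clock() + timeout
    while True:
        # Check first, so a device that already sent the update passes at once.
        if get_med() == expected:
            return True
        # Give up once the deadline has passed; the caller reports the failure.
        if clock() >= deadline:
            return False
        sleep(interval)
```

A device that resends the prefix immediately passes on the first check, while a slower one gets up to `timeout` seconds before the test is declared failed.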
On August 21, 2025, an influx of traffic directed toward clients hosted in the Amazon Web Services (AWS) us-east-1 facility caused severe congestion on links between Cloudflare and AWS us-east-1. This impacted many users who were connecting to or receiving connections from Cloudflare via servers in AWS us-east-1 in the form of high latency, packet loss, and failures to origins.
Customers with origins in AWS us-east-1 began experiencing impact at 16:27 UTC. The impact was substantially reduced by 19:38 UTC, with intermittent latency increases continuing until 20:18 UTC.
This was a regional problem between Cloudflare and AWS us-east-1, and global Cloudflare services were not affected. The degradation in performance was limited to traffic between Cloudflare and AWS us-east-1. The incident was a result of a surge of traffic from a single customer that overloaded Cloudflare's links with AWS us-east-1. It was a network congestion event, not an attack or a BGP hijack.
We’re very sorry for this incident. In this post, we explain what the failure was, why it occurred, and what we’re doing to make sure this doesn’t happen again.
Background
Cloudflare helps anyone to build, connect, protect, and accelerate their websites on the Internet. Most customers host their …
On July 31, 2025, just as Portugal entered the peak of another intense wildfire season, João Pina, also known as Tomahock, received an automated alert from Cloudflare. His volunteer-run project, fogos.pt, now a trusted source of real-time wildfire information for millions across Portugal, was under attack.
One of the several alerts fogos.pt received related to the DDoS attack
What started in 2015 as a late-night side project with friends around a dinner table in Aveiro has grown into a critical public resource. During wildfires, the site is where firefighters, journalists, citizens, and even government agencies go to understand what’s happening on the ground. Over the years, fogos.pt has evolved from parsing PDFs into visual maps to a full-featured app and website with historical data, weather overlays, and more. It’s also part of Project Galileo, Cloudflare’s initiative to protect vulnerable but important public interest sites at no cost.
Wildfires are not just a Portuguese challenge. They are frequent across southern Europe (Spain and Greece are currently also under alert), California, Australia, and Canada, which in 2023 faced record-setting fires. In all these cases, reliable information can be crucial, sometimes life-saving. Other organizations offering similar public services can …
Doubling the transistor count every two years and therefore cutting the price of a transistor in half because you can cram twice as many on a given area transformed computing and drove it during the CMOS chip era. …
SPONSORED POST: As organizations race to harness the potential of AI, many are discovering that their existing data architectures are struggling to keep up. …
A large number of vendors claim to use industry-standard CLI, which means “something that looks like Cisco IOS, but we can’t say that in public.” The implementations of that “standard” are full of quirks; as I was making fun of Cisco IOS last week, it’s only fair to look at how others deal with BGP community propagation.
netlab has BGP configuration templates for 14 different platforms, including these implementations that look like Cisco IOS from a distance if you squint just right: Arista EOS, Aruba CX, and FRRouting. You can check the configuration templates if you wish; here’s the TC&DB overview:
Democratizing the learning environment is a passion for Deepak Ahuja. So much so, he founded CloudMyLab, a company that provides hands-on, cloud-based labs and networking environments. His goal is to offer an affordable lab-as-a-service for two groups of people: network engineers seeking certifications, and network engineers and automators that need a place to safely test...
During Developer Week 2024, we introduced AI face cropping in private beta. This feature automatically crops images around detected faces, and marks the first release in our upcoming suite of AI image manipulation capabilities.
AI face cropping is now available in Images for everyone. To bring this feature to general availability, we moved our CPU-based prototype to a GPU-based implementation in Workers AI, enabling us to address a number of technical challenges, including memory leaks that could hamper large-scale use.
We developed face cropping with two particular use cases in mind:
Social media platforms and AI chatbots. We observed a lot of traffic from customers who use Images to turn unedited images of people into smaller profile pictures in neat, fixed shapes.
E-commerce platforms. The same product photo might appear in a grid of thumbnails on a gallery page, then again on an individual product page with a larger view. The following example illustrates how cropping can change the emphasis from the model’s shirt to their sunglasses.
In this unplanned and unfiltered conversation, we dive deep into network automation realities with Ivan Pepelnjak, networking’s long-standing and independent voice from ipSpace.net. We explore why automation projects fail, dissect the tooling landscape (Ansible vs. Terraform vs. Python), and discuss the cultural barriers preventing enterprises from modernizing their networks. Ivan delivers hard truths about...
The SwiNOG 40 event started with an interesting presentation on Building Trustworthy Network Automation (video) by Damien Garros (now CEO @ OpsMill) who discussed the principles one can use to build a trustworthy network automation solution, including idempotency, dry runs, and transactional changes. He also covered the crucial roles of the declarative approach, version control, and testing.
If you have ever watched any of my network automation materials, you won’t be surprised by anything he said, but if you’re just starting your network automation journey, you MUST watch this presentation to get your bearings straight.
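To make three of those principles concrete (idempotency, dry runs, and transactional changes), here is a minimal sketch. It is not taken from the presentation or from any OpsMill tooling; the flat key/value `Config` model and the function names `plan_changes` and `apply_changes` are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: a device configuration modelled as a flat key/value mapping.
Config = Dict[str, str]
Change = Tuple[str, str, str]  # (action, key, value)


def plan_changes(current: Config, desired: Config) -> List[Change]:
    """Declarative diff: list only the changes needed to reach `desired`."""
    changes: List[Change] = []
    for key, value in sorted(desired.items()):
        if current.get(key) != value:
            changes.append(("set", key, value))
    for key in sorted(current.keys() - desired.keys()):
        changes.append(("delete", key, current[key]))
    return changes


def apply_changes(
    current: Config, desired: Config, dry_run: bool = True
) -> Tuple[Config, List[Change]]:
    """Return (resulting config, planned changes) without mutating `current`."""
    changes = plan_changes(current, desired)
    if dry_run:
        # Dry run: report the plan, leave the configuration as it is.
        return dict(current), changes
    # Transactional: build a candidate copy and return it only when every
    # change has been applied; an exception leaves `current` untouched.
    candidate = dict(current)
    for action, key, value in changes:
        if action == "set":
            candidate[key] = value
        else:
            del candidate[key]
    return candidate, changes
```

Running `apply_changes` a second time against its own result yields an empty change list, which is the idempotency property in practice: the outcome depends on the desired state, not on how many times the automation runs.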
Years before Amazon Web Services launched in March 2006, there were a slew of grid computing startups and incumbent system makers – and a few of them with deep supercomputing experience – that were hawking remotely accessible, utility-style computing on demand. …
Microsoft is rethinking allowing endpoint security software to run in the Windows kernel (including third-party and Microsoft’s own endpoint security software). While there are benefits to running security software in the kernel, there are also serious downsides (see the CrowdStrike outage). Dan Massameno joins JJ and Drew on Packet Protector to talk about the role...
PARTNER CONTENT: For years, data science and engineering teams have faced a familiar challenge: turning vast, messy datasets into timely, reliable insights. …
Google now estimates that the specs for a Cryptographically Relevant Quantum Computer (CRQC), which can break conventional public key encryption in a useful amount of time, are lower than they had previously estimated…by 95%. Given the breadth and pace of advancement in quantum computing, this makes the advent of the CRQC likely to happen years...