We've been relying on ML and AI for our core services like Web Application Firewall (WAF) since the early days of Cloudflare. Through this journey, we've learned many lessons about running AI deployments at scale, and all the tooling and processes necessary. We recently launched Workers AI to help abstract a lot of that away for inference, giving developers an easy way to leverage powerful models with just a few lines of code. In this post, we’re going to explore some of the lessons we’ve learned on the other side of the ML equation: training.
Cloudflare has extensive experience training models and using them to improve our products. A constantly-evolving ML model drives the WAF attack score that helps protect our customers from malicious payloads. Another evolving model power bot management product to catch and prevent bot attacks on our customers. Our customer support is augmented by data science. We build machine learning to identify threats with our global network. To top it all off, Cloudflare is delivering machine learning at unprecedented scale across our network.
Each of these products, along with many others, has elevated ML models — including experimentation, training, and deployment — to a crucial position within Continue reading
On November 22nd, 2023, AMS-IX, one of the largest Internet exchanges in Europe, experienced a significant performance drop lasting more than four hours. While its peak performance is around 10 Tbps, it dropped to about 2.1 Tbps during the outage.
AMS-IX published a very sanitized and diplomatic post-mortem incident summary in which they explained the outage was caused by LACP leakage. That phrase should be a red flag, but let’s dig deeper into the details.
SPONSORED POST: Edge security is a growing headache. The attack surface is expanding as more operational functions migrate out of centralized locations and into distributed sites and devices. …
The post Locking down the edge first appeared on The Next Platform.
Locking down the edge was written by Martin Courtney at The Next Platform.
There is nothing quite like great hardware to motivate people to create and tune software to take full advantage of it during a boom time. …
The post AMD Is The Undisputed Datacenter GPU Performance Champ – For Now first appeared on The Next Platform.
AMD Is The Undisputed Datacenter GPU Performance Champ – For Now was written by Timothy Prickett Morgan at The Next Platform.
Azure Boost is a hardware offload of Azure virtual machines designed to improve VM performance. On today's Day Two Cloud we dig into how it works. We also talk about how to implement security in Virtual Network Manager, as well as how to optimize your Azure observability--meaning, how not to blow up your budget with unnecessary logging.
The post D2C223: Accelerating VM Performance With Azure Boost appeared first on Packet Pushers.
Note: We will be updating this story with more information once our contacts in China are awake:
China is using a domestic processor as the backbone for double the performance of the Tianhe-2 system, which topped the Top 500 starting in 2013 and running through late 2015 before being overshadowed by the Sunway system in recent years. …
The post What’s Inside China’s New Homegrown “Tianhe Xingyi” Supercomputer? first appeared on The Next Platform.
What’s Inside China’s New Homegrown “Tianhe Xingyi” Supercomputer? was written by Nicole Hemsoth Prickett at The Next Platform.
SPONSORED FEATURE: So, you’re finally ready to jump on the AI bandwagon. …
The post Accelerate time to insight for AI and HPC first appeared on The Next Platform.
Accelerate time to insight for AI and HPC was written by Martin Courtney at The Next Platform.
Cloudflare recently announced Workers AI, giving developers the ability to run serverless GPU-powered AI inference on Cloudflare’s global network. One key area of focus in enabling this across our network was updating our Baseboard Management Controllers (BMCs). The BMC is an embedded microprocessor that sits on most servers and is responsible for remote power management, sensors, serial console, and other features such as virtual media.
To efficiently manage our BMCs, Cloudflare leverages OpenBMC, an open-source firmware stack from the Open Compute Project (OCP). For Cloudflare, OpenBMC provides transparent, auditable firmware. Below describes some of what Cloudflare has been able to do so far with OpenBMC with respect to our GPU-equipped servers.
For this project, we needed a way to adjust our BMC firmware to accommodate new GPUs, while maintaining the operational efficiency with respect to thermals and power consumption. OpenBMC was a powerful tool in meeting this objective.
OpenBMC allows us to change the hardware of our existing servers without the dependency of our Original Design Manufacturers (ODMs), consequently allowing our product teams to get started on products quickly. To physically support this effort, our servers need to be able to supply enough power and keep Continue reading
Most organizations are terribly bad at interviewing people. They overcomplicate things by holding too many interviews (more than 2-3) and often focus their interview on trivia and memorization rather than walking through a scenario. Every interview should have some form of a scenario and a whiteboard if you are hiring a Network Engineer. Rather than overcomplicating things, here’s how you can interview someone using a single scenario that you can expand on and go to different depths at different stages depending on the focus of the role.
Scenario:
You are an employee working in a large campus network. Your computer has just started up and has not previously communicated with anything before you open your browser and type in microsoft.com.
Before any communication can take place, you need an IP address. What IP protocols are there? What are the main differences between the two?
Things to look for: IPv4 vs IPv6. ARP vs ND. DHCP vs RA. Broadcast vs multicast.
What methods are there of configuring an IP address?
Things to look for: Static IP vs DHCP vs RA.
When I need to communicate to something external, traffic goes through a gateway. What type of device would Continue reading
In the previous BGP labs, we built a network with two adjacent BGP routers and a larger transit network using IBGP. Now let’s make our transit network scalable with BGP route reflectors, this time using a slightly larger network:
Fortinet turns its on-prem and cloud security devices into a sensor network that collects threat intelligence across the globe. That intelligence then feeds those devices and services with new updates and the latest protections. In today's sponsored Heavy Networking, we talk with Fortinet about its Fortiguard Security Services, how they work, and how customers can take advantage of them.
The post HN712: FortiGuard Security Services: Invisible Operations, Tangible Results (Sponsored) appeared first on Packet Pushers.
This week Nvidia shared details about upcoming updates to its platform for building, tuning, and deploying generative AI models. …
The post Finding NeMo Features for Fresh LLM Building Boost first appeared on The Next Platform.
Finding NeMo Features for Fresh LLM Building Boost was written by Nicole Hemsoth Prickett at The Next Platform.
Powering data centres is big deal in current decade. Massive increases in consumption and scaling of off-prem clouds has exceeded the capacity of civilian power grids while cloud operators are reluctant to sign thirty year supply agreements so that more power plants can be built. Enter power micro-generation where large DCs needs too small power supply.
The post HS060 Power Micro-Generation for Data Center appeared first on Packet Pushers.