How we used OpenBMC to support AI inference on GPUs around the world

Cloudflare recently announced Workers AI, giving developers the ability to run serverless GPU-powered AI inference on Cloudflare’s global network. One key area of focus in enabling this across our network was updating our Baseboard Management Controllers (BMCs). The BMC is an embedded microprocessor that sits on most servers and is responsible for remote power management, sensors, serial console, and other features such as virtual media.

To efficiently manage our BMCs, Cloudflare leverages OpenBMC, an open-source firmware stack from the Open Compute Project (OCP). For Cloudflare, OpenBMC provides transparent, auditable firmware. The sections below describe some of what Cloudflare has been able to do so far with OpenBMC on our GPU-equipped servers.

Ouch! That’s HOT!

For this project, we needed a way to adjust our BMC firmware to accommodate new GPUs while maintaining operational efficiency with respect to thermals and power consumption. OpenBMC was a powerful tool in meeting this objective.

OpenBMC allows us to change the hardware of our existing servers without depending on our Original Design Manufacturers (ODMs), which in turn lets our product teams get started on products quickly. To physically support this effort, our servers need to be able to supply enough power and keep ...

How to Interview a Network Engineer Using a Single Scenario

Most organizations are terribly bad at interviewing people. They overcomplicate things by holding too many interviews (more than 2-3) and often focus on trivia and memorization rather than walking through a scenario. If you are hiring a Network Engineer, every interview should include some form of scenario and a whiteboard. Rather than overcomplicating things, here’s how you can interview someone using a single scenario that you can expand on, going to different depths at different stages depending on the focus of the role.

Scenario:

You are an employee working in a large campus network. Your computer has just started up and has not communicated with anything before you open your browser and type in microsoft.com.

Before any communication can take place, you need an IP address. What IP protocols are there? What are the main differences between the two?

Things to look for: IPv4 vs IPv6. ARP vs ND. DHCP vs RA. Broadcast vs multicast.

What methods are there of configuring an IP address?

Things to look for: Static IP vs DHCP vs RA.
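
The contrasts listed above can be made concrete with a minimal Python sketch built on the standard `ipaddress` module. The helper name `describe` and the documentation-range addresses (192.0.2.10, 2001:db8::10) are illustrative assumptions, not part of the original scenario.

```python
import ipaddress


def describe(addr: str) -> dict:
    """Summarize the IPv4 vs IPv6 differences an interviewer might probe."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return {
            "version": 4,
            "address_bits": ip.max_prefixlen,  # 32-bit addresses
            "neighbor_resolution": "ARP (broadcast)",
            "configuration": ["static", "DHCP"],
        }
    return {
        "version": 6,
        "address_bits": ip.max_prefixlen,  # 128-bit addresses
        "neighbor_resolution": "ND (multicast)",
        "configuration": ["static", "DHCPv6", "RA"],
    }


# An IPv4 subnet has a broadcast address; IPv6 has no broadcast and uses
# multicast groups instead, such as the all-nodes address ff02::1.
v4_net = ipaddress.ip_network("192.0.2.0/24")
all_nodes = ipaddress.ip_address("ff02::1")

print(describe("192.0.2.10"))
print(describe("2001:db8::10"))
print(v4_net.broadcast_address, all_nodes.is_multicast)
```

A candidate who can explain each field in this output has covered most of the "things to look for" in the first two questions.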

When I need to communicate with something external, traffic goes through a gateway. What type of device would ...

HN712: FortiGuard Security Services: Invisible Operations, Tangible Results (Sponsored)

Today we’re talking security, but security you don’t always see. Fortinet, today’s sponsor, has millions of devices in the field. These are real-world devices seeing real-world traffic, all day, every day. While those devices have a primary protection role, they can also serve as sensors that collect threat signals and feed threat intelligence services that can, ...

Fortinet turns its on-prem and cloud security devices into a sensor network that collects threat intelligence across the globe. That intelligence then feeds those devices and services with new updates and the latest protections. In today's sponsored Heavy Networking, we talk with Fortinet about its FortiGuard Security Services, how they work, and how customers can take advantage of them.

The post HN712: FortiGuard Security Services: Invisible Operations, Tangible Results (Sponsored) appeared first on Packet Pushers.

HS060 Power Micro-Generation for Data Center

Powering data centres is a big deal in the current decade. Massive increases in consumption and the scaling of off-prem clouds have exceeded the capacity of civilian power grids, while cloud operators are reluctant to sign thirty-year supply agreements so that more power plants can be built. Enter power micro-generation, where large DCs need small power supplies too.

The post HS060 Power Micro-Generation for Data Center appeared first on Packet Pushers.

A Platform For Securely Scaling Operations At The Edge

COMMISSIONED: Innovation at the edge is happening at light speed. Everywhere you turn, organizations are seeking to shift their center of data processing gravity from central locations like head offices and datacenters to the outer limits of the operation – to factory floors, hospital wards, truck fleets and smart cities.

The post A Platform For Securely Scaling Operations At The Edge first appeared on The Next Platform.

A Platform For Securely Scaling Operations At The Edge was written by Martin Courtney at The Next Platform.

Latest copyright decision in Germany rejects blocking through global DNS resolvers

A recent decision from the Higher Regional Court of Cologne in Germany marked important progress for Cloudflare and the Internet in pushing back against misguided attempts to address online copyright infringement through the DNS system. In early November, the Court in Universal v. Cloudflare issued its decision rejecting a request to require public DNS resolvers like Cloudflare’s 1.1.1.1 to block websites based on allegations of online copyright infringement. That’s a position we’ve long advocated, because blocking through public resolvers is ineffective and disproportionate, and it does not allow for much-needed transparency as to what is blocked and why.

What is a DNS resolver?

To see why the Universal decision matters, it’s important to understand what a public DNS resolver is, and why it’s not a good place to try to moderate content on the Internet.

The DNS system translates website names to IP addresses, so that Internet requests can be routed to the correct location. At a high level, the DNS system consists of two parts. On one side sit a series of nameservers (Root, TLD, and Authoritative) that together store information mapping domain names to IP addresses; on the other ...
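
The nameserver side of that split can be illustrated with a toy Python model. Every name here (`ROOT`, `TLD`, `AUTHORITATIVE`, `resolve`, `www.example.com`, and the 192.0.2.80 address) is hypothetical, and the in-memory dictionaries stand in for real servers. It is a sketch of the lookup order under those assumptions, not a description of how any production resolver is implemented.

```python
# Hypothetical in-memory stand-ins for the three nameserver tiers.
ROOT = {"com": "tld-com"}                                    # root: TLD -> TLD server
TLD = {"tld-com": {"example.com": "auth-example"}}           # TLD: zone -> authoritative server
AUTHORITATIVE = {"auth-example": {"www.example.com": "192.0.2.80"}}


def resolve(name: str, cache: dict) -> tuple:
    """Walk root -> TLD -> authoritative, caching the final answer."""
    name = name.rstrip(".")
    if name in cache:
        return cache[name], ["cache"]

    labels = name.split(".")
    tld = labels[-1]                   # e.g. "com"
    zone = ".".join(labels[-2:])       # e.g. "example.com"

    steps = []
    tld_server = ROOT[tld]
    steps.append("root")
    auth_server = TLD[tld_server][zone]
    steps.append("tld")
    address = AUTHORITATIVE[auth_server][name]
    steps.append("authoritative")

    cache[name] = address
    return address, steps


cache = {}
print(resolve("www.example.com", cache))   # first lookup walks all three tiers
print(resolve("www.example.com", cache))   # second lookup is answered from cache
```

The model ignores record types, TTLs, and error handling; it only shows that the name-to-address mapping is spread across the three tiers named above.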

netlab 1.7.0: Lab Validation, Fabrics, BGP Nerd Knobs

It’s been a while since the last netlab release. Most of that time was spent refactoring stuff that you don’t care about, but you might like these features:

As always, we also improved the platform support:

The Bespoke Supercomputing Architecture That Stood the Test of Time

In the history of computing, there has been an endless push and pull between the need for general-purpose versus fine-tuned custom systems and software.

The post The Bespoke Supercomputing Architecture That Stood the Test of Time first appeared on The Next Platform.

The Bespoke Supercomputing Architecture That Stood the Test of Time was written by Nicole Hemsoth Prickett at The Next Platform.

Tech Bytes: Pliant Combines APIs, Low Code Approach For Network Automation (Sponsored)

Network automation takes a variety of forms, from individual scripts that handle specific tasks, to workflows that have to be orchestrated across multiple devices and systems.

Today on the Tech Bytes podcast we talk with sponsor Pliant about its automation platform. Pliant helps you orchestrate across devices and domains with a low-code approach that uses APIs to automate and orchestrate across your infrastructure.

The post Tech Bytes: Pliant Combines APIs, Low Code Approach For Network Automation (Sponsored) appeared first on Packet Pushers.

How AWS Can Undercut Nvidia With Homegrown AI Compute Engines

Amazon Web Services may not be the first of the hyperscalers and cloud builders to create its own custom compute engines, but it has been hot on the heels of Google, which started using its homegrown TPU accelerators for AI workloads in 2015.

The post How AWS Can Undercut Nvidia With Homegrown AI Compute Engines first appeared on The Next Platform.

How AWS Can Undercut Nvidia With Homegrown AI Compute Engines was written by Timothy Prickett Morgan at The Next Platform.