Embedded function calling in Workers AI: easier, smarter, faster

Introducing embedded function calling and a new ai-utils package

Today, we’re excited to announce a novel way to do function calling that co-locates LLM inference with function execution, and a new ai-utils package that upgrades the developer experience for function calling.

This is a follow-up to our mid-June announcement for traditional function calling, which allows you to leverage a Large Language Model (LLM) to intelligently generate structured outputs and pass them to an API call. Function calling has been largely adopted and standardized in the industry as a way for AI models to help perform actions on behalf of a user.

Our goal is to make building with AI as easy as possible, which is why we’re introducing a new @cloudflare/ai-utils npm package that allows developers to get started quickly with embedded function calling. These helper tools drastically simplify your workflow by actually executing your function code and dynamically generating tools from OpenAPI specs. We’ve also open-sourced our ai-utils package, which you can find on GitHub. With both embedded function calling and our ai-utils, you’re one step closer to creating intelligent AI agents, and from there, the possibilities are endless.

Why Cloudflare’s AI platform?

OpenAI has been the gold Continue reading

NVIDIA Air Infrastructure Simulation Platform

I recently tested the NVIDIA Air Infrastructure Simulation Platform and would like to share my first experiences with you. What is NVIDIA Air Infrastructure Simulation Platform or NVIDIA Air? In a nutshell, NVIDIA Air is a cloud-hosted, data center simulation platform. Where you can: Test and validate network configurations, features, and automation code. Build your own data center topology or choose from an impressive list of pre-built topologies. You can use Cumulus Linux or SONiC as network operating system, add Ubuntu nodes, and more. Import / Export lab topologies. Share the…

The post NVIDIA Air Infrastructure Simulation Platform appeared first on AboutNetworks.net.

AI/ML Networking Part I: RDMA Basics

Remote Direct Memory Access - RDMA Basics


Introduction

Remote Direct Memory Access (RDMA) architecture enables efficient data transfer between Compute Nodes (CN) in a High-Performance Computing (HPC) environment. RDMA over Converged Ethernet version 2 (RoCEv2) utilizes a routed IP Fabric as a transport network for RDMA messages. Due to the nature of RDMA packet flow, the transport network must provide lossless, low-latency packet transmission. The RoCEv2 solution uses UDP in the transport layer, which does not handle packet losses caused by network congestion (buffer overflow on switches or on a receiving Compute Node). To avoid buffer overflow issues, Priority Flow Control (PFC) and Explicit Congestion Notification (ECN) are used as signaling mechanisms to react to buffer threshold violations by requesting a lower packet transfer rate.

Before moving to RDMA processes, let’s take a brief look at our example Compute Nodes. Figure 1-1 illustrates our example Compute Nodes (CN). Both Client and Server CNs are equipped with one Graphical Processing Unit (GPU). The GPU has a Network Interface Card (NIC) with one interface. Additionally, the GPU has Device Memory Units to which it has a direct connection, bypassing the CPU. In real life, a CN may have several GPUs, each with multiple memory units. Intra-GPU communication within the CN happens over high-speed NVLinks. The connection to remote CNs occurs over the NIC, which has at least one high-speed uplink port/interface.

Figure 1-1 also shows the basic idea of a stacked Fine-Grained 3D DRAM (FG-DRAM) solution. In our example, there are four vertically interconnected DRAM dies, each divided into eight Banks. Each Bank contains four memory arrays, each consisting of rows and columns that contain memory units (transistors whose charge indicates whether a bit is set to 1 or 0). FG-DRAM enables cross-DRAM grouping into Ranks, increasing memory capacity and bandwidth.

The upcoming sections introduce the required processes and operations when the Client Compute Node wants to write data from its device memory to the Server Compute Node’s device memory. I will discuss the design models and requirements for lossless IP Fabric in later chapters.



Figure 1-1: Fine-Grained DRAM High-Level Architecture.
Continue reading

DNS Evolution

The DNS is a crucial part of the Internet's architecture. However, the DNS is not a rigid and unchanging technology. It has changed considerably over the lifetime of the Internet and here I’d like to look at what’s changed and what’s remained the same.

Explore and Fix BGP Wedgies

RFC 4264 defines BGP wedgies as “a class of BGP configurations for which there is more than one potential outcome, and where forwarding states other than the intended state are equally stable.” Even worse, “the stable state where BGP converges may be selected by BGP in a non-deterministic manner.

Want to know more? You can explore a real-life BGP wedgie and fix it in the latest BGP lab exercise.

Automatically replacing polyfill.io links with Cloudflare’s mirror for a safer Internet

polyfill.io, a popular JavaScript library service, can no longer be trusted and should be removed from websites.

Multiple reports, corroborated with data seen by our own client-side security system, Page Shield, have shown that the polyfill service was being used, and could be used again, to inject malicious JavaScript code into users’ browsers. This is a real threat to the Internet at large given the popularity of this library.

We have, over the last 24 hours, released an automatic JavaScript URL rewriting service that will rewrite any link to polyfill.io found in a website proxied by Cloudflare to a link to our mirror under cdnjs. This will avoid breaking site functionality while mitigating the risk of a supply chain attack.

Any website on the free plan has this feature automatically activated now. Websites on any paid plan can turn on this feature with a single click.

You can find this new feature under Security ⇒ Settings on any zone using Cloudflare.

Contrary to what is stated on the polyfill.io website, Cloudflare has never recommended the polyfill.io service or authorized their use of Cloudflare’s name on their website. We have asked them to remove the Continue reading

Cloudflare incident on June 20, 2024

On Thursday, June 20, 2024, two independent events caused an increase in latency and error rates for Internet properties and Cloudflare services that lasted 114 minutes. During the 30-minute peak of the impact, we saw that 1.4 - 2.1% of HTTP requests to our CDN received a generic error page, and observed a 3x increase for the 99th percentile Time To First Byte (TTFB) latency.

These events occurred because:

  1. Automated network monitoring detected performance degradation, re-routing traffic suboptimally and causing backbone congestion between 17:33 and 17:50 UTC
  2. A new Distributed Denial-of-Service (DDoS) mitigation mechanism deployed between 14:14 and 17:06 UTC triggered a latent bug in our rate limiting system that allowed a specific form of HTTP request to cause a process handling it to enter an infinite loop between 17:47 and 19:27 UTC

Impact from these events were observed in many Cloudflare data centers around the world.

With respect to the backbone congestion event, we were already working on expanding backbone capacity in the affected data centers, and improving our network mitigations to use more information about the available capacity on alternative network paths when taking action. In the remainder of this blog post, we will go into Continue reading

Looking for a Simple Multihop EBGP Use Case

I plan to add several challenge labs using multihop EBGP sessions to the BGP labs project, including:

  1. Running BGP between VMs and central BGP route servers
  2. Using multihop EBGP session to send full Internet routing table to a customer without overloading the PE-router
  3. Running EBGP EVPN session between loopbacks advertised with EBGP IPv4 session (🤢)

However, I would love to start with a simple use case to help engineers unfamiliar with BGP realize when they might have to use multihop EBGP sessions. Unfortunately, I can’t find one, and the scenarios where I used multihop EBGP in the past (EBGP load balancing and using a low-end router in the EBGP path, where I was effectively using the reverse application of #2 as a customer) are mostly irrelevant.

Would you have an easy-to-understand use case that is best solved with a multihop EBGP session? Please share it in the comments. Thanks a million!

How to Best Use Panorama Templates and Stacks?

How to Best Use Panorama Templates and Stacks?

I’ve been working with Palo Alto Firewalls and Panorama for a few years now, yet the best ways to use Templates still seem somewhat mysterious. I bet many of you feel the same way. Since every network is unique, there isn’t one “right” way to manage this. In this blog post, I’ll break down what Templates and Template Stacks are in Panorama and share some effective strategies for organizing them. Let’s dive in.

A Quick Note on Panorama

If you’re new to Panorama, it’s a centralized management tool that simplifies managing multiple Palo Alto firewalls from a single place. There are two key concepts in Panorama which are Device Groups and Templates. Device Groups manage the configurations you’d usually find under the Policies and Objects tabs on the firewall, while Templates manage with configurations from the Network and Device tabs.

How to Best Use Panorama Templates and Stacks?

It’s important to note that Device Groups and Templates serve different purposes and manage different parts of the configurations. This blog post will focus exclusively on Templates. If you need a refresher on Device Groups and Templates, I’ve covered that in a previous post. Feel free to check it out here for a quick recap.

Ruminations About Europe’s “Alice Recoque” Exascale Supercomputer

Designing chips and shepherding them through the foundry and package and assembly is a complex and difficult process, and not having these skills at a national level has profound implications for the competitiveness of those nations.

Ruminations About Europe’s “Alice Recoque” Exascale Supercomputer was written by Timothy Prickett Morgan at The Next Platform.

Kubernetes network policies: 4 pain points and how to address them

Kubernetes is used everywhere, from test environments to the most critical production foundations that we use daily, making it undoubtedly a de facto in cloud computing. While this is great news for everyone who works with, administers, and expands Kubernetes, the downside is that it makes Kubernetes a favorable target for malicious actors.

Malicious actors typically exploit flaws in the system to gain access to a portion of the environment. They then chain these flaws together to move laterally within the environment, ultimately seeking root access or access to critical information.

While the best way to fix security flaws in any software is to patch it with appropriate fixes that the project maintainers publish, there are certain security practices that you can adopt to fortify your environment, like using network policies. However, most people find network policies complex and overwhelming, which discourages them from implementing policies in their environment.

In this blog post, we will examine four pain points that people face when they want to implement network policies and provide solutions to help you effectively secure your Kubernetes environment.

What is a network policy and why should I use it?

In Kubernetes, a network policy (KNP) resource is the Continue reading

HS076: Greg’s Finale

This is Greg’s last Heavy Strategy episode before he heads off to retirement. He gives us his final pieces of career and life advice, opinions on private equity, and a Cookie Monster quote. We also briefly introduce John Burke, the new Heavy Strategy co-host. Farewell, Greg. Thank you for all the great debates. Episode Transcript... Read more »