Cerebras Trains Llama Models To Leap Over GPUs

It was only a few months ago when waferscale compute pioneer Cerebras Systems was bragging that a handful of its WSE-3 engines lashed together could run circles around Nvidia GPU instances based on Nvidia’s “Hopper” H100 GPUs when running the open source Llama 3.1 foundation model created by Meta Platforms.

Cerebras Trains Llama Models To Leap Over GPUs was written by Timothy Prickett Morgan at The Next Platform.

AI Should Be Concise

One of the things that I’ve noticed about the rise of AI is that everything feels so wordy now. I’m sure it’s a byproduct of the popularity of ChatGPT and other LLMs that are designed for language. You’ve likely seen it too on websites that have paragraphs of text that feel unnecessary. Maybe you’re looking for an answer to a specific question. You could be trying to find a recipe or even a code block for a problem. What you find is a wall of text that feels pieced together by someone that doesn’t know how to write.

The Soul of Wit

I feel like the biggest issue with those overly word-filled answers comes down to the way that people feel about unnecessary exposition. AI is built to write things on a topic and fill out word count. Much like a student trying to pad out the page length for a required report, AI doesn’t know when to shut up. It specifically adds words that aren’t really required. I realize that there are modes of AI content creation that value being concise but those are the default.

I use AI quite a bit to summarize long articles, many of which Continue reading

Elephants in tunnels: how Hyperdrive connects to databases inside your VPC networks

With September’s announcement of Hyperdrive’s ability to send database traffic from Workers over Cloudflare Tunnels, we wanted to dive into the details of what it took to make this happen.

Hyper-who?

Accessing your data from anywhere in Region Earth can be hard. Traditional databases are powerful, familiar, and feature-rich, but your users can be thousands of miles away from your database. This can cause slower connection startup times, slower queries, and connection exhaustion as everything takes longer to accomplish.

Cloudflare Workers is an incredibly lightweight runtime, which enables our customers to deploy their applications globally by default and renders the cold start problem almost irrelevant. The trade-off for these light, ephemeral execution contexts is the lack of persistence for things like database connections. Database connections are also notoriously expensive to spin up, with many round trips required between client and server before any query or result bytes can be exchanged.

Hyperdrive is designed to make the centralized databases you already have feel like they’re global while keeping connections to those databases hot. We use our global network to get faster routes to your database, keep connection pools primed, and cache your most frequently run queries as close to users Continue reading

IBM’s Red Hat Acquisition Will Pay For Itself By Early Next Year

Big Blue has reported its financial results for the third quarter of 2024, and the thing that stands out most is not the System z16 mainframes and Power10 midrange and big iron are getting a little long in the tooth ahead of new product cycles expected to start in 2025.

IBM’s Red Hat Acquisition Will Pay For Itself By Early Next Year was written by Timothy Prickett Morgan at The Next Platform.

TL006: From Blame to Empowerment: Changing Team Culture in Tech

Leadership has a huge impact on an organization’s culture, including tech teams. On the positive side, leaders can foster healthy, productive environments. On the negative side, they can build hostile, blame-centric viper pits. Today’s episode of Technically Leadership examines how to strive for the positive by shielding teams from internal politics, developing empathetic leadership, distinguishing... Read more »

Build durable applications on Cloudflare Workers: you write the Workflows, we take care of the rest

Workflows, Cloudflare’s durable execution engine that allows you to build reliable, repeatable multi-step applications that scale for you, is now in open beta. Any developer with a free or paid Workers plan can build and deploy a Workflow right now: no waitlist, no sign-up form, no fake line around-the-block.

If you learn by doing, you can create your first Workflow via a single command (or visit the docs for the full guide):

npm create cloudflare@latest workflows-starter -- \
  --template "cloudflare/workflows-starter"

Open the src/index.ts file, poke around, start extending it, and deploy it with a quick wrangler deploy.

If you want to learn more about how Workflows works, how you can use it to build applications, and how we built it, read on.

Workflows? Durable Execution?

Workflows—which we announced back during Developer Week earlier this year—is our take on the concept of “Durable Execution”: the ability to build and execute applications that are durable in the face of errors, network issues, upstream API outages, rate limits, and (most importantly) infrastructure failure.

As over 2.4 million developers continue to build applications on top of Cloudflare Workers, R2, and Workers AI, we’ve noticed more developers building multi-step applications and workflows Continue reading

Billions and billions (of logs): scaling AI Gateway with the Cloudflare Developer Platform

With the rapid advancements occurring in the AI space, developers face significant challenges in keeping up with the ever-changing landscape. New models and providers are continuously emerging, and understandably, developers want to experiment and test these options to find the best fit for their use cases. This creates the need for a streamlined approach to managing multiple models and providers, as well as a centralized platform to efficiently monitor usage, implement controls, and gather data for optimization.

AI Gateway is specifically designed to address these pain points. Since its launch in September 2023, AI Gateway has empowered developers and organizations by successfully proxying over 2 billion requests in just one year, as we highlighted during September’s Birthday Week. With AI Gateway, developers can easily store, analyze, and optimize their AI inference requests and responses in real time.

With our initial architecture, AI Gateway faced a significant challenge: the logs, those critical trails of data interactions between applications and AI models, could only be retained for 30 minutes. This limitation was not just a minor inconvenience; it posed a substantial barrier for developers and businesses needing to analyze long-term patterns, ensure compliance, or simply debug over more extended periods.

In Continue reading

Durable Objects aren’t just durable, they’re fast: a 10x speedup for Cloudflare Queues

Cloudflare Queues let a developer decouple their Workers into event-driven services. Producer Workers write events to a Queue, and consumer Workers are invoked to take actions on the events. For example, you can use a Queue to decouple an e-commerce website from a service which sends purchase confirmation emails to users. During 2024’s Birthday Week, we announced that Cloudflare Queues is now Generally Available, with significant performance improvements that enable larger workloads. To accomplish this, we switched to a new architecture for Queues that enabled the following improvements:

  • Median latency for sending messages has dropped from ~200ms to ~60ms

  • Maximum throughput for each Queue has increased over 10x, from 400 to 5000 messages per second

  • Maximum Consumer concurrency for each Queue has increased from 20 to 250 concurrent invocations

Median latency drops from ~200ms to ~60ms as Queues are migrated to the new architecture

In this blog post, we'll share details about how we built Queues using Durable Objects and the Cloudflare Developer Platform, and how we migrated from an initial Beta architecture to a geographically-distributed, horizontally-scalable architecture for General Availability.

v1 Beta architecture

When initially designing Cloudflare Queues, we decided to build something simple that we could get Continue reading

How Does Netlab Deal with Server Reboots?

Now and then, someone asks how netlab deals with reboots (or power failures or crashes) of the server it’s running on.

TL&DR: It doesn’t. However…

netlab is a CLI command that acts as an umbrella orchestration layer for Vagrant and Containerlab. It does not run as a cron job, init script, or service and thus cannot be invoked when a server is booted.

NAN077: Network Observability: Tools, Automation, and Insights

Network optimization starts with observing, but how are networks observed and what tools are used? Joining the podcast today are the authors behind the book “Modern Network Observability.” Eric Chou welcomes David Flores, Christian Adell, and Josh VanDeraa to help uncover practical strategies and real-world case studies for network observability. Episode Guests: David Flores, Christian... Read more »

D2DO254: Intelligent Data Infrastructure: How NetApp Builds for Performance, Security, and Visibility across Clouds (Sponsored)

Enterprise data is everywhere: on prem, at the edge, and across the public cloud. Storing and managing that data involves more than just SSDs and spinning disks; it’s about building and operating a data infrastructure. On today’s Day Two DevOps podcast, host Ned Bellavance explores the idea of data infrastructure with Jeff Baxter, VP of... Read more »

Management Toolkit: SMART Goals

In the last post we discussed Management vs Leadership, now let’s start getting into some useful tools, traits, and behaviors for both managers and leaders. Starting us off will be SMART goals. SMART is an acronym for Specific Measurable Achievable Realistic and Time-based. Once you hear about SMART goals and what SMART goals represent it […]

Training a million models per day to save customers of all sizes from DDoS attacks

Our always-on DDoS protection runs inside every server across our global network.  It constantly analyzes incoming traffic, looking for signals associated with previously identified DDoS attacks. We dynamically create fingerprints to flag malicious traffic, which is dropped when detected in high enough volume — so it never reaches its destination — keeping customer websites online.

In many cases, flagging bad traffic can be straightforward. For example, if we see too many requests to a destination with the same protocol violation, we can be fairly sure this is an automated script, rather than a surge of requests from a legitimate web browser.

Our DDoS systems are great at detecting attacks, but there’s a minor catch. Much like the human immune system, they are great at spotting attacks similar to things they have seen before. But for new and novel threats, they need a little help knowing what to look for, which is an expensive and time-consuming human endeavor.

Cloudflare protects millions of Internet properties, and we serve over 60 million HTTP requests per second on average, so trying to find unmitigated attacks in such a huge volume of traffic is a daunting task. In order to protect the smallest of companies, Continue reading

4.2 Tbps of bad packets and a whole lot more: Cloudflare’s Q3 DDoS report

Welcome to the 19th edition of the Cloudflare DDoS Threat Report. Released quarterly, these reports provide an in-depth analysis of the DDoS threat landscape as observed across the Cloudflare network. This edition focuses on the third quarter of 2024.

With a 296 Terabit per second (Tbps) network located in over 330 cities worldwide, Cloudflare is used as a reverse proxy by nearly 20% of all websites. Cloudflare holds a unique vantage point to provide valuable insights and trends to the broader Internet community.

Key insights 

  • The number of DDoS attacks spiked in the third quarter of 2024. Cloudflare mitigated nearly 6 million DDoS attacks, representing a 49% increase QoQ and 55% increase YoY.

  • Out of those 6 million, Cloudflare’s autonomous DDoS defense systems detected and mitigated over 200 hyper-volumetric DDoS attacks exceeding rates of 3 terabits per second (Tbps) and 2 billion packets per second (Bpps). The largest attack peaked at 4.2 Tbps and lasted just a minute.

  • The Banking & Financial Services industry was subjected to the most DDoS attacks. China was the country most targeted by DDoS attacks, and Indonesia was the largest source of DDoS attacks.

To learn more about DDoS attacks and other types Continue reading

Introducing Access for Infrastructure: SSH

BastionZero joined Cloudflare in May 2024. We are thrilled to announce Access for Infrastructure as BastionZero’s native integration into our SASE platform, Cloudflare One. Access for Infrastructure will enable organizations to apply Zero Trust controls in front of their servers, databases, network devices, Kubernetes clusters, and more. Today, we’re announcing short-lived SSH access as the first available feature. Over the coming months we will announce support for other popular infrastructure access target types like Remote Desktop Protocol (RDP), Kubernetes, and databases.

Applying Zero Trust principles to infrastructure

Organizations have embraced Zero Trust initiatives that modernize secure access to web applications and networks, but often the strategies they use to manage privileged access to their infrastructure can be siloed, overcomplicated, or ineffective. When we speak to customers about their infrastructure access solution, we see common themes and pain points:

  • Too risky: Long-lived credentials and shared keys get passed around and inflate the risk of compromise, excessive permissions, and lateral movement

  • Too clunky: Manual credential rotations and poor visibility into infrastructure access slow down incident response and compliance efforts

Some organizations have dealt with the problem of privileged access to their infrastructure by purchasing a Privileged Access Management (PAM) solution Continue reading

Per-Prefix and Per-VRF MPLS/VPN and EVPN Labels/VNIs

Long long time ago1, in an ancient town far far away2, an old-school networking Jeddi3 was driving us toward a convent4 where we had an SDN workshop5. While we were stuck in the morning traffic jam, an enthusiastic engineer sitting beside me wanted to know my opinion about per-prefix and per-VRF MPLS/VPN label allocation.

At that time, I had lived in a comfortable Cisco IOS bubble for way too long, so my answer was along the lines of “Say what???” Nicola Modena6 quickly expanded my horizons, and I said, “Gee, I have to write a blog post about that!” As you can see, it took me over a decade.

PP036: News Roundup – NIST Nixes Password Resets, Cargo Crane Espionage Risks, Municipal Govs Targeted, and More

Today’s Packet Protector rounds up recent security news, including revised password guidelines from NIST, a White House push to help fill infosec jobs, and potential espionage risks from Chinese-made cranes being used at US ports. We also cover a hospital data breach that leaked nude patient photos, discuss why municipal governments are rich targets for... Read more »
1 5 6 7 8 9 3,697