Author Archives: Micah Wylde

Announcing the Cloudflare Data Platform: ingest, store, and query your data directly on Cloudflare

For Developer Week in April 2025, we announced the public beta of R2 Data Catalog, a fully managed Apache Iceberg catalog on top of Cloudflare R2 object storage. Today, we are building on that foundation with three launches:

  • Cloudflare Pipelines receives events sent via Workers or HTTP, transforms them with SQL, and ingests them into Iceberg or as files on R2

  • R2 Data Catalog manages the Iceberg metadata and now performs ongoing maintenance, including compaction, to improve query performance

  • R2 SQL is our in-house distributed SQL engine, designed to perform petabyte-scale queries over your data in R2

Together, these products make up the Cloudflare Data Platform, a complete solution for ingesting, storing, and querying analytical data tables.

Like all Cloudflare Developer Platform products, they run on our global compute infrastructure and are built around open standards and interoperability. That means you can bring your own Iceberg query engine — whether that's PyIceberg, DuckDB, or Spark — connect with other platforms like Databricks and Snowflake, and pay no egress fees to access your data.
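Because R2 Data Catalog speaks the standard Iceberg REST catalog protocol, any conforming engine can talk to it. As a minimal sketch of what "open standards" means here, the snippet below builds (but does not send) the standard Iceberg REST "list namespaces" request using only the Python standard library; the catalog URI and API token are placeholders, and in practice you would point a real engine like PyIceberg, DuckDB, or Spark at the same endpoint.

```python
# Minimal stdlib sketch of the Iceberg REST catalog protocol.
# CATALOG_URI and API_TOKEN are placeholders, not real values.
import urllib.request

CATALOG_URI = "https://catalog.cloudflarestorage.com/<account-id>/<bucket>"  # placeholder
API_TOKEN = "<cloudflare-api-token>"  # placeholder


def list_namespaces_request(catalog_uri: str, token: str) -> urllib.request.Request:
    """Build the standard Iceberg REST 'list namespaces' call (GET /v1/namespaces)."""
    return urllib.request.Request(
        f"{catalog_uri}/v1/namespaces",
        headers={"Authorization": f"Bearer {token}"},
    )


req = list_namespaces_request(CATALOG_URI, API_TOKEN)
# Sending it requires a real catalog and token:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read())
```

The same request shape works against any Iceberg REST catalog, which is why engines and platforms can interoperate without data copies or egress charges.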

Analytical data is critical for modern companies. It helps you understand your users' behavior and your company's performance, and alerts you to issues. Continue reading

Just landed: streaming ingestion on Cloudflare with Arroyo and Pipelines

Today, we’re launching the open beta of Pipelines, our streaming ingestion product. Pipelines allows you to ingest high volumes of structured, real-time data and load it into our object storage service, R2. You don’t have to manage any of the underlying infrastructure or worry about scaling shards or metadata services, and you pay for the data processed (not by the hour). Anyone on a Workers paid plan can start using it to ingest and batch data — at tens of thousands of requests per second (RPS) — directly into R2.
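To make the ingestion path concrete, here is a hedged sketch of shipping one batch of structured events to a pipeline's HTTP endpoint. The endpoint URL and the event fields are illustrative placeholders — use the ingestion URL and schema of your own pipeline.

```python
# Hedged sketch: POSTing a JSON batch of events to a Pipelines HTTP endpoint.
# ENDPOINT is a placeholder; substitute your pipeline's real ingestion URL.
import json
import urllib.request

ENDPOINT = "https://<pipeline-id>.pipelines.cloudflare.com"  # placeholder

# Hypothetical structured events; Pipelines batches these into R2 for you.
events = [
    {"user_id": 42, "event": "page_view", "path": "/pricing"},
    {"user_id": 42, "event": "click", "path": "/signup"},
]


def build_ingest_request(endpoint: str, batch: list) -> urllib.request.Request:
    """Build the HTTP POST that ships one JSON batch to the pipeline."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(batch).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_ingest_request(ENDPOINT, events)
# Sending it requires a live pipeline:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Because the service handles batching and scaling, the client side stays this simple even at tens of thousands of requests per second.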

But this is just the tip of the iceberg: you often want to transform the data you’re ingesting, hydrate it on-the-fly from other sources, and write it to an open table format (such as Apache Iceberg), so that you can efficiently query that data once you’ve landed it in object storage.

The good news is that we’ve thought about that too, and we’re excited to announce that we’ve acquired Arroyo, a cloud-native, distributed stream processing engine, to make that happen.

With Arroyo and our just announced R2 Data Catalog, we’re getting increasingly serious about building a data platform that allows you to ingest data across the planet, store… Continue reading