Adopting OpenTelemetry for our logging pipeline

Cloudflare’s logging pipeline is one of the largest data pipelines that Cloudflare has, serving millions of log events per second globally, from every server we run. Recently, we undertook a project to migrate the underlying systems of our logging pipeline from syslog-ng to OpenTelemetry Collector and in this post we want to share how we managed to swap out such a significant piece of our infrastructure, why we did it, what went well, what went wrong, and how we plan to improve the pipeline even more going forward.
Background
A full breakdown of our existing infrastructure can be found in our previous post An overview of Cloudflare's logging pipeline, but to quickly summarize here:
- We run a syslog-ng daemon on every server, reading from the local systemd-journald journal, and a set of named pipes.
- We forward those logs to a set of centralized “log-x receivers”, in one of our core data centers.
- We have a dead letter queue destination in another core data center, which receives messages that could not be sent to the primary receiver, and which get mirrored across to the primary receivers when possible.
The goal of this project was to replace those syslog-ng instances as Continue reading