Squeezing the firehose: getting the most from Kafka compression

We at Cloudflare are long-time Kafka users; our first mentions of it date back to the beginning of 2014, when the most recent version was 0.8.0. We use Kafka as a log to power analytics (both HTTP and DNS), DDoS mitigation, logging and metrics.
Firehose CC BY 2.0 image by RSLab
While the idea of the log as a unifying abstraction has remained the same since then (read this classic blog post from Jay Kreps if you haven't), Kafka has evolved in other areas. One of these improved areas is compression support. Back in the old days we tried enabling it a few times and ultimately gave up on the idea because of unresolved issues in the protocol.
Kafka compression overview
Just last year Kafka 0.11.0 came out with a new, improved protocol and log format.
The naive approach to compression would be to compress messages in the log individually:

Edit: originally we said this is how Kafka worked before 0.11.0, but that appears to be false.
Compression algorithms work best if they have more data, so in the new log format messages (now called records) are packed back to back and compressed in …
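To see why compressing more data at once helps, here is a minimal sketch that compares compressing each message on its own with compressing the same messages packed back to back. It is only an illustration: the JSON log lines are made up, `zlib` stands in for whichever codec a producer is configured with, and plain concatenation is not Kafka's actual record batch format.

```python
import json
import zlib


def make_messages(count: int) -> list[bytes]:
    """Build hypothetical log-like messages that share most of their structure."""
    return [
        json.dumps(
            {"host": "example.com", "status": 200, "path": f"/page/{i}", "bytes": 1000 + i}
        ).encode("utf-8")
        for i in range(count)
    ]


def size_compressed_individually(messages: list[bytes]) -> int:
    """Total size when every message is compressed on its own (the naive approach)."""
    return sum(len(zlib.compress(message)) for message in messages)


def size_compressed_batched(messages: list[bytes]) -> int:
    """Size when the messages are packed back to back and compressed once."""
    return len(zlib.compress(b"".join(messages)))


messages = make_messages(1000)
raw_size = sum(len(message) for message in messages)
individual_size = size_compressed_individually(messages)
batched_size = size_compressed_batched(messages)

print(f"raw:        {raw_size} bytes")
print(f"individual: {individual_size} bytes")
print(f"batched:    {batched_size} bytes")
```

Each message here is short and repeats the same keys, so a compressor that only ever sees one message has very little redundancy to work with and still pays its per-stream overhead every time. Compressing the whole batch lets it reuse what it has already seen across messages.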