Squeezing the firehose: getting the most from Kafka compression

We at Cloudflare are long-time Kafka users; our first mentions of it date back to the beginning of 2014, when the most recent version was 0.8.0. We use Kafka as a log to power analytics (both HTTP and DNS), DDoS mitigation, logging, and metrics.

Firehose CC BY 2.0 image by RSLab

While the idea of the log as a unifying abstraction has remained the same since then (read this classic blog post from Jay Kreps if you haven't), Kafka has evolved in other areas. One of these improved areas is compression support. Back in the old days we tried enabling it a few times and ultimately gave up on the idea because of unresolved issues in the protocol.

Kafka compression overview

Just last year Kafka 0.11.0 came out with a new, improved protocol and log format.

The naive approach to compression would be to compress messages in the log individually.

Edit: originally we said this is how Kafka worked before 0.11.0, but that appears to be false.

Compression algorithms work best if they have more data, so in the new log format messages (now called records) are packed back to back and compressed in Continue reading
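To illustrate why compressors do better with more data, here is a minimal Python sketch comparing the naive per-message approach with compressing a batch of messages packed back to back. It is only an illustration under stated assumptions: it uses zlib from the standard library as a stand-in rather than any codec or record batch format Kafka actually uses, and the sample messages are made up.

```python
import json
import zlib

# Hypothetical sample data: small JSON log lines that share a lot of structure.
messages = [
    json.dumps(
        {"host": f"edge-{i % 4}", "status": 200, "path": "/index.html", "id": i}
    ).encode("utf-8")
    for i in range(1000)
]

# Naive approach: compress every message on its own.
# Each call starts with an empty dictionary and pays its own header overhead.
individual_size = sum(len(zlib.compress(m)) for m in messages)

# Batched approach: pack the messages back to back, then compress once,
# so repeated keys and values across messages can be deduplicated.
batch = b"".join(messages)
batched_size = len(zlib.compress(batch))

raw_size = len(batch)
print(f"raw:                     {raw_size} bytes")
print(f"compressed individually: {individual_size} bytes")
print(f"compressed as one batch: {batched_size} bytes")
```

On repetitive data like this, the batched size should come out far smaller than the sum of the individually compressed messages, which can even exceed the raw size because every tiny message carries its own compression header and checksum.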

Accelerating HPC Investments In Canada

Details about the technologies being used in Canada’s newest and most powerful research supercomputer have been coming out in a piecemeal fashion over the past several months, but now the complete story is out.

At the SC17 show in November, it was revealed that the HPC system will use Mellanox’s Dragonfly+ network topology and an NVM Express burst buffer fabric from Excelero as key parts of a cluster that will offer a peak performance of more than 4.6 petaflops.

Now Lenovo, which last fall won the contract for the Niagara system over 11 other vendors, is unveiling this week that it is

Accelerating HPC Investments In Canada was written by Jeffrey Burt at The Next Platform.

The Week in Internet News: Working Toward a Better Internet

Fixing the Internet: Is the Internet broken? Politico’s EU site looks at the work of the Internet & Jurisdiction Policy Network, which met in Ottawa, Canada, last week to discuss how to fix problems like poor cybersecurity, inaccurate information spread on social media, and other bad behavior. The Internet Society covered the first day of the Ottawa event.

The hills are alive with the sound of broadband: Motherboard has a story about the Los Angeles Community Broadband Project, which plans to deliver wireless broadband to parts of the city using inexpensive equipment and dish-shaped antennas on hilltops and rooftops.

AI joins the force: The Verge has a long story about a secretive AI-assisted policing effort that started in 2012 as a partnership between the New Orleans Police and Palantir Technologies, a data-mining company founded with seed money from the CIA’s venture capital firm. The program apparently used AI technologies for predictive policing, a controversial practice used to trace suspects’ ties to other gang members, analyze social media, and predict the likelihood that targeted people would commit violence or become victims. Science Magazine also has a story examining predictive policing.

Women wary of Blockchain bros: The New York Continue reading

Looking Ahead: My 2018 Projects

For the last six years or so, I’ve been publishing a list of projects/goals for the upcoming year (followed by a year-end review of how I did with those projects/goals). For example, here are my goals for 2017, and here’s my year-end review of my progress in 2017. In this post, I’m going to share with you my list of projects/goals for 2018.

As I’ve done in previous years, I’ll list the projects/goals, along with an optional stretch goal (where it makes sense).

  1. Become extremely fluent with Kubernetes. I’m focusing all my technical skills on Kubernetes this year, with the goal of becoming extremely fluent with the project in all its aspects. There are some aspects—like networking, for example—where some specialization/additional focus will be needed (focusing on particular network architectures/plugins). That means “leaving behind” other technologies, like OpenStack, in order to more fully focus on Kubernetes. (Stretch goal: Pass the Certified Kubernetes Administrator [CKA] exam.)

  2. Learn to code/develop in Go. Given that Kubernetes is written in Go and that Go seems to be the language of choice for many new projects, tools, and utilities, I’m going to learn to code/develop in Go in 2018. Because I learned Continue reading

Enterprise HPC Tightens Storage, I/O Strategy Around Support

File system changes in high performance computing take time. Good file systems are long-lived. It took several years for some parallel file systems to win out over others, and it will be many more decades before the file system as we know it is replaced by something entirely different.

In the meantime, however, there are important points to consider for real-world production HPC deployments that go beyond mere performance comparisons, especially as these workloads grow more complex, become more common, and put new pressures on storage and I/O systems.

The right combination of performance, stability, reliability, and ease of management

Enterprise HPC Tightens Storage, I/O Strategy Around Support was written by Nicole Hemsoth at The Next Platform.

IDG Contributor Network: Pain relief for hospitals managing IoT performance

Nowhere is there more pain for IT staff than in the ever-morphing healthcare market, where the Internet of Things (IoT) has been gaining attention and traction. The concept of IoT involves the use of electronic devices that capture or monitor data and are connected wirelessly to a private or public cloud, enabling them to automatically trigger certain events. In the healthcare context, a growing set of IoT devices has been introduced to patients and medical staff in various forms. Whether they are wireless bedside monitors, infusion pumps, or even voice/data-based clinician communication devices, the result is better and more efficient patient care. To read this article in full, please click here

Hybrid cloud: How organizations are using Microsoft’s on-premises cloud platform

Microsoft’s on-premises Azure cloud platform, Azure Stack, has now been embedded in real-world, core business environments, with early adopters validating business use cases that require secured and hosted environments. Here are some of the current uses of Azure Stack that are deployed in enterprises. To read this article in full, please click here (Insider Story)

How organizations are using Microsoft’s on-premises cloud platform

Microsoft’s on-premises Azure cloud platform, Azure Stack, has now been embedded in real-world, core business environments, with early adopters validating business use cases that require secured and hosted environments. Here are some of the current uses of Azure Stack that are deployed in enterprises.

Azure Stack in healthcare

Healthcare organizations have been prime candidates for Azure Stack, as they fit the model of having large (extremely large!) sets of data and customers, and also face regulatory policies and protections aimed at securing the data being transacted. Azure Stack fits the mold of providing healthcare organizations the cloud scale that they wish to achieve, in a protected, managed, and secured environment. To read this article in full, please click here
