Tesla GPU Accelerator Bang For The Buck, Kepler To Volta

If you are running applications in the HPC or AI realms, you might be in for some sticker shock when you shop for GPU accelerators – thanks in part to the growing demand for Nvidia’s Tesla cards in those markets but also because cryptocurrency miners who can’t afford to etch their own ASICs are creating a huge demand for the company’s top-end GPUs.

Nvidia does not provide list prices or suggested street prices for its Tesla line of GPU accelerator cards, so it is somewhat more problematic to try to get a handle on the bang for the buck over

Tesla GPU Accelerator Bang For The Buck, Kepler To Volta was written by Timothy Prickett Morgan at The Next Platform.

Responding to Readers: Questions on Microloops

Two different readers, in two different forums, asked me some excellent questions about some older posts on microloops. Unfortunately I didn’t take down the names or forums when I noted the questions, but you know who you are! For this discussion, use the network shown below.

In this network, assume all link costs are one, and the destination is the 100::/64 IPv6 address connected to A at the top. To review, a microloop will form in this network when the A->B link fails:

  1. B will learn about the link failure
  2. B will send an updated router LSP or LSA towards D, with the A->B link removed
  3. At about the same time, B will recalculate its best path to 100::/64, so its routing and forwarding tables now point towards D as the best path
  4. D, in the meantime, receives the updated information, runs SPF, and installs the new routing information into its forwarding table, with the new path pointing towards E
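The steps above can be sketched in a few lines of Python. Because the network diagram is not reproduced here, the topology is an assumption: a five-router ring A-B-D-E-C-A, where router C is hypothetical and exists only to give D a second path through E. The functions `next_hops` and `trace` are illustrative names, not part of any routing implementation, and a breadth-first search stands in for SPF because all link costs are one.

```python
from collections import deque

def next_hops(links, dest):
    """Return {router: next hop toward dest}; BFS suffices when every cost is one."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    table, seen, queue = {}, {dest}, deque([dest])
    while queue:
        node = queue.popleft()
        for nbr in sorted(adj.get(node, ())):
            if nbr not in seen:
                seen.add(nbr)
                table[nbr] = node  # nbr forwards to node to reach dest
                queue.append(nbr)
    return table

def trace(fib, start, dest):
    """Follow next hops from start; stop at dest or when a router repeats."""
    path = [start]
    while path[-1] != dest:
        nxt = fib[path[-1]]
        path.append(nxt)
        if path.count(nxt) > 1:
            break  # revisited a router: forwarding loop
    return path

# Assumed ring topology; 100::/64 is attached to A.
links = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "E"), ("D", "E")]
before = next_hops(links, "A")
after = next_hops([l for l in links if l != ("A", "B")], "A")

# Between steps 3 and 4: B has installed its new route, D still has the old one.
transient = dict(before)
transient["B"] = after["B"]

print("before failure:", trace(before, "B", "A"))
print("transient:     ", trace(transient, "B", "A"))
print("converged:     ", trace(after, "B", "A"))
```

In this sketch the transient table sends traffic from B to D and straight back to B, and the loop disappears once D installs its recomputed next hop.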

Between the third and fourth steps, B will be using D as its best path, while D is using B as its best path. Hence the microloop. The first question about microloops was—

Would BFD help prevent the microloop (or Continue reading

Squeezing the firehose: getting the most from Kafka compression


We at Cloudflare are long-time Kafka users; the first mentions of it date back to the beginning of 2014, when the most recent version was 0.8.0. We use Kafka as a log to power analytics (both HTTP and DNS), DDoS mitigation, logging, and metrics.

Firehose CC BY 2.0 image by RSLab

While the idea of the log as a unifying abstraction has remained the same (read this classic blog post from Jay Kreps if you haven't), Kafka has evolved in other areas since then. One of these improved areas is compression support. Back in the old days we tried enabling it a few times and ultimately gave up on the idea because of unresolved issues in the protocol.

Kafka compression overview

Just last year Kafka 0.11.0 came out with a new, improved protocol and log format.

The naive approach to compression would be to compress messages in the log individually:


Edit: originally we said this is how Kafka worked before 0.11.0, but that appears to be false.

Compression algorithms work best if they have more data, so in the new log format messages (now called records) are packed back to back and compressed in Continue reading
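To make the difference concrete, here is a minimal sketch that compresses the same set of records one at a time and then as a single batch. It is not Kafka's record batch format: the sample records are hypothetical, the 4-byte length prefix is an assumed framing chosen for simplicity, and `zlib` from the Python standard library stands in for whichever codec a producer would actually be configured with.

```python
import json
import zlib

# Hypothetical log records; real messages would differ in shape and size.
records = [
    json.dumps(
        {"host": "example.com", "status": 200, "path": f"/page/{i}", "colo": "SJC"}
    ).encode()
    for i in range(1000)
]

def size_compressed_individually(msgs):
    """Total bytes when each record is compressed on its own."""
    return sum(len(zlib.compress(m)) for m in msgs)

def size_compressed_as_batch(msgs):
    """Bytes when records are packed back to back and compressed once."""
    # Assumed framing: 4-byte big-endian length before each record.
    batch = b"".join(len(m).to_bytes(4, "big") + m for m in msgs)
    return len(zlib.compress(batch))

raw = sum(len(m) for m in records)
individual = size_compressed_individually(records)
batched = size_compressed_as_batch(records)

print(f"raw bytes:               {raw}")
print(f"compressed individually: {individual}")
print(f"compressed as one batch: {batched}")
```

Small records share most of their structure with their neighbours, so compressing them together lets the codec exploit repetition that it cannot see when each record is compressed alone.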

Accelerating HPC Investments In Canada

Details about the technologies being used in Canada’s newest and most powerful research supercomputer have been coming out in a piecemeal fashion over the past several months, but now the complete story is emerging.

At the SC17 show in November, it was revealed that the HPC system will use Mellanox’s Dragonfly+ network topology and an NVM Express burst buffer fabric from Excelero as key parts of a cluster that will offer a peak performance of more than 4.6 petaflops.

Now Lenovo, which last fall won the contract for the Niagara system over 11 other vendors, is unveiling this week that it is

Accelerating HPC Investments In Canada was written by Jeffrey Burt at The Next Platform.

The Week in Internet News: Working Toward a Better Internet

Fixing the Internet: Is the Internet broken? Politico’s EU site looks at the work of the Internet & Jurisdiction Policy Network, which met in Ottawa, Canada, last week to discuss how to fix problems like poor cybersecurity, inaccurate information spread on social media, and other bad behavior. The Internet Society covered the first day of the Ottawa event.

The hills are alive with the sound of broadband: Motherboard has a story about the Los Angeles Community Broadband Project, which plans to deliver wireless broadband to parts of the city using inexpensive equipment and dish-shaped antennas on hilltops and rooftops.

AI joins the force: The Verge has a long story about a secretive AI-assisted policing effort that started in 2012 as a partnership between the New Orleans Police and Palantir Technologies, a data-mining company founded with seed money from the CIA’s venture capital firm. The program apparently used AI technologies for predictive policing, a controversial practice used to trace suspects’ ties to other gang members, analyze social media, and predict the likelihood that targeted people would commit violence or become victims. Science Magazine also has a story examining predictive policing.

Women wary of Blockchain bros: The New York Continue reading

Looking Ahead: My 2018 Projects

For the last six years or so, I’ve been publishing a list of projects/goals for the upcoming year (followed by a year-end review of how I did with those projects/goals). For example, here are my goals for 2017, and here’s my year-end review of my progress in 2017. In this post, I’m going to share with you my list of projects/goals for 2018.

As I’ve done in previous years, I’ll list the projects/goals, along with an optional stretch goal (where it makes sense).

  1. Become extremely fluent with Kubernetes. I’m focusing all my technical skills on Kubernetes this year, with the goal of becoming extremely fluent with the project in all its aspects. There are some aspects—like networking, for example—where some specialization/additional focus will be needed (focusing on particular network architectures/plugins). That means “leaving behind” other technologies, like OpenStack, in order to more fully focus on Kubernetes. (Stretch goal: Pass the Certified Kubernetes Administrator [CKA] exam.)

  2. Learn to code/develop in Go. Given that Kubernetes is written in Go and that Go seems to be the language of choice for many new projects, tools, and utilities, I’m going to learn to code/develop in Go in 2018. Because I learned Continue reading

Enterprise HPC Tightens Storage, I/O Strategy Around Support

File system changes in high performance computing take time. Good file systems are long-lived. It took several years for some parallel file systems to win out over others and it will be many more decades before the file system as we know it is replaced by something entirely different.

In the meantime, however, there are important points to consider for real-world production HPC deployments that go beyond mere performance comparisons, especially as these workloads grow more complex, become more common, and put new pressures on storage and I/O systems.

The right combination of performance, stability, reliability, and ease of management

Enterprise HPC Tightens Storage, I/O Strategy Around Support was written by Nicole Hemsoth at The Next Platform.