Author Archives: Nicole Hemsoth

Some Surprises in the 2018 DoE Budget for Supercomputing

The US Department of Energy fiscal year 2018 budget request is in. While it reflects much of what we might expect in pre-approval format in terms of forthcoming supercomputers in particular, there are some elements that strike us as noteworthy.

In the just-released 2018 FY budget request from Advanced Scientific Computing Research (ASCR), page eight of the document states that “The Argonne Leadership Computing Facility will operate Mira (at 10 petaflops) and Theta (at 8.5 petaflops) for existing users, while turning focus to site preparations for deployment of an exascale system of novel architecture.”

Notice anything missing in this description?

Some Surprises in the 2018 DoE Budget for Supercomputing was written by Nicole Hemsoth at The Next Platform.

FPGA Startup Gathers Funding Force for Merged Hyperscale Inference

Around this time last year, we delved into a new FPGA-based architecture that targeted efficient, scalable machine learning inference from startup DeePhi Tech. The company just rounded out its first funding round with an undisclosed sum from major investors, including Banyan Capital and, as we learned this week, FPGA maker Xilinx.

As that initial article details, the Stanford and Tsinghua University-fed research focused on network pruning and compression at low precision with a device that could be structured for low latency and custom memory allocations. These efforts were originally built on Xilinx FPGA hardware and given this first round of
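The pruning idea behind this kind of compression can be illustrated in simplified form: drop the smallest-magnitude weights and keep the rest. The sketch below is a generic magnitude-pruning toy, not DeePhi's actual algorithm, which prunes iteratively and retrains between passes:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights.

    A toy illustration of pruning for network compression;
    real systems prune iteratively and retrain between passes.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # number of weights to drop
    if k == 0:
        return list(weights)
    # Threshold = magnitude of the k-th smallest weight
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

weights = [0.9, -0.02, 0.4, 0.01, -0.7, 0.05]
pruned = magnitude_prune(weights, 0.5)  # drop the smallest half
```

The zeroed weights are what make the compressed network cheaper to store and, on hardware that can skip zeros, cheaper to run.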

FPGA Startup Gathers Funding Force for Merged Hyperscale Inference was written by Nicole Hemsoth at The Next Platform.

Singularity is the Hinge To Swing HPC Cloud Adoption

For almost a decade now, the cloud has been pitched as a cost-effective way to bring supercomputing out of the queue and into public IaaS or HPC on-demand environments. While there are certainly many use cases to prove that tightly-coupled problems can still work in the cloud despite latency hits (among other issues), application portability is one sticking point.

For instance, let’s say you have developed a financial modeling application on an HPC on demand service to prove that the model works so you can make the case for purchasing a large cluster to run it at scale on-prem. This

Singularity is the Hinge To Swing HPC Cloud Adoption was written by Nicole Hemsoth at The Next Platform.

First In-Depth Look at Google’s New Second-Generation TPU

It was only last month that we spoke in depth with Google distinguished hardware engineer Norman Jouppi about the tensor processing unit used internally at the search giant to accelerate deep learning inference, but that device—that first TPU—is already appearing rather out of fashion.

This morning at Google’s I/O event, the company stole Nvidia’s recent Volta GPU thunder by releasing details about its second-generation tensor processing unit (TPU), which will manage both training and inference in a rather staggering 180 teraflops system board, complete with a custom network to lash several together into “TPU pods” that can deliver Top

First In-Depth Look at Google’s New Second-Generation TPU was written by Nicole Hemsoth at The Next Platform.

Cray Supercomputing as a Service Becomes a Reality

For a mature company that kickstarted supercomputing as we know it, Cray has done a rather impressive job of reinventing itself over the years.

From its original vector machines, to HPC clusters with proprietary interconnects and custom software stacks, to graph analytics appliances engineered in-house, and now to machine learning, the company tends not to let trends in computing slip by without a new machine.

However, all of this engineering and tuning comes at a cost—something that, arguably, has kept Cray at bay when it comes to reaching the new markets that sprung up in the “big data” days of

Cray Supercomputing as a Service Becomes a Reality was written by Nicole Hemsoth at The Next Platform.

When Will AI Replace Traditional Supercomputing Simulations?

The science fiction of a generation ago predicted a future in which humans were replaced by the reasoning might of a supercomputer. But in an unexpected twist of events, it appears it is the supercomputer’s main output—scientific simulations—that could be replaced by an even higher order of intelligence.

While we will always need supercomputing hardware, the vast field of scientific computing, or high performance computing, could also be in the crosshairs for disruptive change, altering the future prospects for scientific code developers, but opening new doors in more energy-efficient, finer-grained scientific discovery. With code that can write itself based
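The core move here is the surrogate model: run the expensive simulation at a handful of points, fit a cheap learned model to those samples, then answer new queries from the model instead of the simulator. A minimal sketch, with a stand-in "simulation" and a simple least-squares fit in place of a real neural network:

```python
def expensive_simulation(x):
    """Stand-in for a costly physics code; here just a known curve."""
    return 3.0 * x + 2.0

def fit_linear_surrogate(xs, ys):
    """Least-squares fit y ~ a*x + b from sampled simulation runs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Run the "simulation" at a few training points...
xs = [0.0, 1.0, 2.0, 3.0]
ys = [expensive_simulation(x) for x in xs]
a, b = fit_linear_surrogate(xs, ys)

# ...then answer new queries from the cheap surrogate instead
prediction = a * 10.0 + b
```

The economics follow from the asymmetry: training data costs one simulation run per point, but every prediction afterward is nearly free.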

When Will AI Replace Traditional Supercomputing Simulations? was written by Nicole Hemsoth at The Next Platform.

The Year Ahead for GPU Accelerated Supercomputing

GPU computing has deep roots in supercomputing, but Nvidia is using that springboard to dive head first into the future of deep learning.

This changes the outward-facing focus of the company’s Tesla business from high-end supers to machine learning systems with the expectation that those two formerly distinct areas will find new ways to merge together given the similarity in machine, scalability, and performance requirements. This is not to say that Nvidia is failing the HPC set, but there is a shift in attention from what GPUs can do for Top 500 class machines to what graphics processors can do

The Year Ahead for GPU Accelerated Supercomputing was written by Nicole Hemsoth at The Next Platform.

HPC to Deep Learning from an Asian Perspective

Big data, data science, machine learning, and now deep learning are all the rage and have tons of hype, for better—and in some ways, for worse. Advancements in AI such as language understanding, self-driving cars, automated claims, legal text processing, and even automated medical diagnostics are already here or will be here soon.

In Asia, several countries have made significant advancements and investments into AI, leveraging their historical work in HPC.

China now owns the top three positions in the Top500 with Sunway TaihuLight, Tianhe-2, and Tianhe, and while Tianhe-2 and Tianhe were designed for HPC style workloads, TaihuLight is

HPC to Deep Learning from an Asian Perspective was written by Nicole Hemsoth at The Next Platform.

A Dive into Deep Learning Chip Startup Graphcore’s Software Stack

The last two years have delivered a new wave of deep learning architectures designed specifically for tackling both training and inference sides of neural networks. We have covered many of them extensively, but only a few have seen major investment or acquisition—the most notable of which was Nervana Systems over a year ago.

Among the string of neural network chip startups, Graphcore stood out with its manycore approach to handling both training and inference on the same manycore chip. We described the hardware architecture in detail back in March and while its over $30 million in funding from a

A Dive into Deep Learning Chip Startup Graphcore’s Software Stack was written by Nicole Hemsoth at The Next Platform.

Generalizing a Hardware, Software Platform for Industrial AI

Industrial companies have replaced people with machines, systems analysts with simulations, and now the simulations themselves could be outpaced by machine learning—albeit with a human in the loop, at the beginning at least.

The new holy grail of machine learning and deep learning, as with almost any other emerging technology set, is to mask enough of the complexity to make it broadly applicable without losing the performance and other features that can be retained by taking a low-level approach. If this kind of deep generalization can happen, a new mode of considering how data is used in research and enterprise

Generalizing a Hardware, Software Platform for Industrial AI was written by Nicole Hemsoth at The Next Platform.

An Inside Look at One Major Media Outlet’s Cloud Transition

When it comes to large media in the U.S. with a broad reach into television and digital, the Scripps Networks Interactive brand might not come to mind first, but many of the channels and sources are household names, including HGTV, Food Network, and The Travel Channel, among others.

Delivering television and web-based content and services is a data and computationally intensive task, which just over five years ago was handled by on-premises machines in the company’s two local datacenters. In order to keep up with peaks in demand during popular events or programs, Scripps Interactive had to overprovision with those

An Inside Look at One Major Media Outlet’s Cloud Transition was written by Nicole Hemsoth at The Next Platform.

Cluster Management for Distributed Machine Learning at Scale

Over the last couple of decades, those looking for a cluster management platform faced no shortage of choices. However, large-scale clusters are being asked to operate in different ways, namely by chewing on large-scale deep learning workloads—and this requires a specialized approach to get high utilization, efficiency, and performance.

Nearly all of the cluster management tools from the high performance computing community are being bent in the machine learning direction, but for production deep learning shops, there appears to be a DIY tendency. This is not as complicated as it might sound, given the range of container-based open source tools,

Cluster Management for Distributed Machine Learning at Scale was written by Nicole Hemsoth at The Next Platform.

The Next Battleground for Deep Learning Performance

The frameworks are in place, the hardware infrastructure is robust, but what has been keeping machine learning performance at bay has far less to do with the system-level capabilities and more to do with intense model optimization.

While it might not be the sexy story that generates the unending wave of headlines around deep learning, hyperparameter tuning is a big barrier when it comes to new leaps in deep learning performance. In more traditional machine learning, there are plenty of open source tools for this, but where it is needed most is in deep learning—an area that does appear to
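The simplest serious baseline for this kind of tuning is random search: sample hyperparameter settings, run a training job for each, keep the best. A sketch with a toy objective standing in for a real training run (the learning-rate range and batch-size grid here are illustrative, not recommendations):

```python
import random

def train_and_score(lr, batch_size):
    """Stand-in for a full training run; returns a validation score.
    A real tuner would launch an actual training job here.
    Toy objective peaked near lr=0.1, batch_size=64."""
    return -((lr - 0.1) ** 2) - ((batch_size - 64) / 64.0) ** 2

def random_search(trials, seed=0):
    """Sample hyperparameters at random and keep the best trial."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        lr = 10 ** rng.uniform(-4, 0)            # log-uniform learning rate
        bs = rng.choice([16, 32, 64, 128, 256])  # batch size grid
        score = train_and_score(lr, bs)
        if best is None or score > best[0]:
            best = (score, lr, bs)
    return best

best_score, best_lr, best_bs = random_search(200)
```

The catch for deep learning is that each call to the objective is hours or days of training, which is why smarter, sample-efficient tuners are where the research interest lies.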

The Next Battleground for Deep Learning Performance was written by Nicole Hemsoth at The Next Platform.

A Trillion Edge Graph on a Single Commodity Node

Efficiently and quickly chewing through one trillion edges of a complex graph is no longer in itself a standalone achievement, but doing so on a single node, albeit with some acceleration and ultra-fast storage, is definitely worth noting.

There are many paths to processing trillions of edges efficiently and with high performance, as demonstrated by companies like Facebook, with its distributed trillion-edge scaling effort across 200 nodes in 2015, and Microsoft with a similar feat.

However, these approaches all required larger clusters—something that comes with obvious cost, but over the course of scaling across nodes, latency as
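The single-node, out-of-core style these systems use can be sketched simply: only per-vertex state lives in RAM, while the edge list streams in from fast storage one chunk at a time and is never held in memory all at once. A toy version computing out-degrees (a generic sketch of the pattern, not any particular system's implementation):

```python
def stream_degrees(edge_chunks, num_vertices):
    """Compute out-degrees by streaming edges chunk by chunk.

    Only the per-vertex state (one counter per vertex) lives in RAM;
    edges arrive from storage one chunk at a time.
    """
    degree = [0] * num_vertices
    for chunk in edge_chunks:       # each chunk would be one read from NVMe
        for src, _dst in chunk:
            degree[src] += 1
    return degree

# Toy "storage": a 6-edge graph split into two chunks
chunks = [[(0, 1), (0, 2), (1, 2)], [(2, 0), (2, 3), (3, 0)]]
degrees = stream_degrees(chunks, 4)
```

At a trillion edges the per-vertex array still fits in a big commodity node's memory, so the bottleneck shifts to how fast storage can feed the chunks.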

A Trillion Edge Graph on a Single Commodity Node was written by Nicole Hemsoth at The Next Platform.

Escher Erases Batching Lines for Efficient FPGA Deep Learning

Aside from the massive parallelism available in modern FPGAs, there are two other key reasons why reconfigurable hardware is finding a fit in neural network processing in both training and inference.

First is the energy efficiency of these devices relative to performance, and second is the flexibility of an architecture that can be recast to the framework at hand. In the past we’ve described how FPGAs can fit over GPUs as well as custom ASICs in some cases, and what the future might hold for novel architectures based on reconfigurable hardware for these workloads. But there is still
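Why batching matters in the first place comes down to amortization: fetching a layer's weights onto the device is a fixed cost, and a batch spreads it across many inputs. A toy cost model (the numbers are purely illustrative, not measurements of any real FPGA):

```python
def time_per_input(batch_size, weight_load_ms=8.0, compute_ms=1.0):
    """Toy cost model: loading weights onto the accelerator costs
    weight_load_ms once per batch; compute costs compute_ms per input.
    Illustrative numbers only."""
    return (weight_load_ms + compute_ms * batch_size) / batch_size

unbatched = time_per_input(1)   # weight-load cost dominates
batched = time_per_input(32)    # load cost amortized across the batch
```

The tension this sets up for inference is latency: a big batch is efficient but forces requests to wait, which is exactly the line that batching-free approaches try to erase.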

Escher Erases Batching Lines for Efficient FPGA Deep Learning was written by Nicole Hemsoth at The Next Platform.

Taking the Heavy Lifting Out of TensorFlow at Extreme Scale

There is no real middle ground when it comes to TensorFlow use cases. Most implementations take place either on a single node or at drastic Google scale, with few scalability stories in between.

This is starting to change, however, as more users find an increasing array of open source tools based on MPI and other approaches to hop to multi-GPU scalability for training, but it is still not simple to scale Google’s own framework across larger machines. Code modifications get hairy beyond a single node, and for the MPI uninitiated, there is a steep curve to scalable deep learning.
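Stripped of the MPI plumbing, what these tools do each training step is simple: every worker computes gradients on its own shard of the batch, then an allreduce averages those gradients so all workers apply the same update. A sketch that simulates the averaging in one process, with no MPI dependency:

```python
def allreduce_mean(per_worker_grads):
    """Average gradients elementwise across workers, as an MPI
    allreduce would; simulated here in a single process for clarity."""
    n = len(per_worker_grads)
    width = len(per_worker_grads[0])
    return [sum(g[i] for g in per_worker_grads) / n for i in range(width)]

def sgd_step(weights, grads, lr=0.1):
    """Apply one gradient-descent update."""
    return [w - lr * g for w, g in zip(weights, grads)]

# Each "worker" computed gradients on its own shard of the batch
grads = [[1.0, 2.0], [3.0, 4.0], [2.0, 0.0]]
avg = allreduce_mean(grads)            # every worker receives the same average
weights = sgd_step([0.5, 0.5], avg)    # so every worker stays in sync
```

Because every worker applies the identical averaged gradient, the model replicas never drift apart, which is the whole trick of synchronous data parallelism.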

Although high performance

Taking the Heavy Lifting Out of TensorFlow at Extreme Scale was written by Nicole Hemsoth at The Next Platform.

Parallel Programming Approaches for Accelerated Systems Compared

In high performance computing, machine learning, and a growing set of other application areas, accelerated, heterogeneous systems are becoming the norm.

With that shift come several parallel programming approaches: OpenMP, OpenACC, OpenCL, CUDA, and others. The trick is choosing the right framework for maximum performance and efficiency—but also productivity.

There have been several studies comparing relative performance between the various frameworks over the last several years, but many simply pit two head to head on a single benchmark or application. A team from Linnaeus University in Sweden took these comparisons a step further by developing a custom tool

Parallel Programming Approaches for Accelerated Systems Compared was written by Nicole Hemsoth at The Next Platform.

China Pushes Breadth-First Search Across Ten Million Cores

There is increasing interplay between the worlds of machine learning and high performance computing (HPC). This began with a shared hardware and software story since many supercomputing tricks of the trade play well into deep learning, but as we look to next generation machines, the bond keeps tightening.

Many supercomputing sites are figuring out how to work deep learning into their existing workflows, either as a pre- or post-processing step, while some research areas might do away with traditional supercomputing simulations altogether eventually. While these massive machines were designed with simulations in mind, the strongest supers have architectures that parallel
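The breadth-first search at the heart of the Graph500 benchmark is level-synchronous: the whole frontier is expanded each step, and it is that per-level expansion that gets split across millions of cores. A single-process skeleton of the idea (the parallel distribution itself is not shown):

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS: expand the entire frontier each step.
    On a machine like TaihuLight each level's expansion is divided
    across cores; this is the single-process skeleton of that idea."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for v in frontier:          # in parallel across cores, in principle
            for w in adj[v]:
                if w not in level:  # first visit fixes the vertex's level
                    level[w] = depth
                    next_frontier.append(w)
        frontier = next_frontier
    return level

adj = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
levels = bfs_levels(adj, 0)
```

The barrier between levels is what makes the algorithm synchronous, and at ten million cores the cost of that synchronization is exactly where the engineering effort goes.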

China Pushes Breadth-First Search Across Ten Million Cores was written by Nicole Hemsoth at The Next Platform.

Supercomputing Gets Neural Network Boost in Quantum Chemistry

Just two years ago, supercomputing was thrust into a larger spotlight because of the surge of interest in deep learning. As we talked about here, the hardware similarities, particularly for training on GPU-accelerated machines and key HPC development approaches, including MPI to scale across a massive number of nodes, brought new attention to the world of scientific and technical computing.

What wasn’t clear then was how traditional supercomputing could benefit from all the framework developments in deep learning. After all, they had many of the same hardware environments and problems that could benefit from prediction, but what they lacked

Supercomputing Gets Neural Network Boost in Quantum Chemistry was written by Nicole Hemsoth at The Next Platform.

A Look at Facebook’s Interactive Neural Network Visualization System

There has been much discussion about the “black box” problem of neural networks. Sophisticated models can perform well on predictive workloads, but when it comes to backtracking how the system came to its end result, there is no clear way to understand what went right or wrong—or how the model arrived at a conclusion.

For old-school machine learning models, this was not quite the problem it is now with non-linear, hidden data structures and countless parameters. For researchers deploying neural networks for scientific applications, this lack of reproducibility from the black box presents validation hurdles, but for
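The raw material for any such visualization system is the per-layer activations captured during a forward pass. A minimal sketch with a hand-rolled two-layer network (the weights are made up for illustration; a real tool hooks into a framework rather than reimplementing the forward pass):

```python
def relu(x):
    """Standard rectified-linear activation."""
    return x if x > 0 else 0.0

def forward_with_activations(x, layers):
    """Run a tiny MLP and record every layer's activations --
    the raw material a visualization tool would render."""
    activations = [list(x)]
    for weights, biases in layers:
        x = [relu(sum(wi * xi for wi, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
        activations.append(x)
    return activations

# A made-up 2-input -> 2-hidden -> 1-output network
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer
    ([[1.0, 2.0]], [0.1]),                    # output layer
]
acts = forward_with_activations([2.0, 1.0], layers)
```

Plotting how those intermediate vectors change across inputs is what lets a researcher see which units respond to what, chipping away at the black box.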

A Look at Facebook’s Interactive Neural Network Visualization System was written by Nicole Hemsoth at The Next Platform.
