Author Archives: Nicole Hemsoth
The more things change, the more they stay the same.
While exascale supercomputers mark a next step in performance capability, at the broader architectural level, the innovations that go into such machines will be the result of incremental improvements to the same components that have existed on HPC systems for several years.
In large-scale supercomputing, many performance trends have jacked up capability and capacity—but the bottlenecks have not changed since the dawn of computing as we know it. Memory latency and memory bandwidth remain the gating factors to how fast, efficiently, and reliably big sites can run—and there is still …
An Exascale Timeline for Storage and I/O Systems was written by Nicole Hemsoth at The Next Platform.
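As a rough illustration of why memory bandwidth, not peak compute, so often sets the ceiling described above, the roofline model caps attainable throughput at the lesser of peak compute and bandwidth times arithmetic intensity. The short sketch below applies that formula with made-up machine numbers; it is a generic illustration, not figures from any system discussed here.

    # Minimal roofline sketch: attainable FLOP/s is limited either by peak
    # compute or by how fast operands can be streamed from memory.
    # All machine numbers are hypothetical placeholders, not measurements.

    def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
        """Roofline estimate: min(peak compute, bandwidth * arithmetic intensity)."""
        return min(peak_gflops, bandwidth_gbs * flops_per_byte)

    peak = 3000.0      # hypothetical peak compute, GFLOP/s
    bandwidth = 100.0  # hypothetical memory bandwidth, GB/s

    # A streaming kernel (low arithmetic intensity) is memory-bound, while a
    # dense matrix multiply (high intensity) approaches the compute peak.
    for name, intensity in [("stream-like kernel", 0.1), ("dense matrix multiply", 50.0)]:
        print(f"{name}: ~{attainable_gflops(peak, bandwidth, intensity):.0f} GFLOP/s")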
It would be surprising to find a Hadoop shop that builds a cluster based on the high-end 68+ core Intel Knights Landing processors—not just because of the sheer horsepower (read as “expense”) for workloads that are more data-intensive than compute-heavy, but also because of a mismatch between software and file system elements.
Despite these roadblocks, work has been underway at Intel’s behest to prime Knights Landing clusters for beefier Hadoop/MapReduce and machine learning jobs at one of its parallel computing centers at Indiana University.
According to Judy Qiu, associate professor of intelligent systems engineering in IU’s computing division, it is …
Hadoop Platform Raised to Knights Landing Height was written by Nicole Hemsoth at The Next Platform.
If anything has become clear over the last several years of watching infrastructure and application trends among SaaS businesses, it is that nothing is as simple as it seems. Even relatively straightforward services, like transactional email processing, have some hidden layers of complexity, which tends to translate into cost.
For most businesses providing web-based services, the answer to that complexity has been to offload infrastructure concerns to the public cloud. This provided geographic availability, pricing flexibility, and development agility, but not all web companies went the cloud route out of the gate. Consider SendGrid, which pushes out over 30 billion emails per month. …
When Agility Outweighs Cost for Big Cloud Operations was written by Nicole Hemsoth at The Next Platform.
The software ecosystem in high performance computing is set to become more complex with the leaps in capability coming in next-generation exascale systems. Among several challenges is making sure that applications retain their performance as they scale to higher core counts and accelerator-rich systems.
Allinea, a software development and performance profiling company that has been around in HPC for almost two decades, was recently acquired by ARM to add to that company’s software ecosystem story. We talked with one of Allinea’s early employees, VP of Product Development Mark O’Connor, about what has come before—and what the software performance …
Performance Portability on the Road to Exascale was written by Nicole Hemsoth at The Next Platform.
While it is not likely we will see large supercomputers on the International Space Station (ISS) anytime soon, HPE is getting a head start on providing more advanced on-board computing capabilities via a pair of its aptly named “Apollo” water-cooled servers in orbit.
The two-socket machines, connected with InfiniBand, will put Broadwell computing capabilities on the ISS, mostly running benchmarks, including High Performance Linpack (HPL), the metric that determines the Top 500 supercomputer rankings. These tests, in addition to the more data movement-centric HPCG benchmark and NASA’s own NAS Parallel Benchmarks, will determine what performance changes, if any, are to be …
One Small Step Toward Supercomputers in Space was written by Nicole Hemsoth at The Next Platform.
In the following interview, Dr. Matt Leininger, Deputy for Advanced Technology Projects at Lawrence Livermore National Laboratory (LLNL), one of the National Nuclear Security Administration’s (NNSA) Tri Labs, describes how scientists at the Tri Labs—LLNL, Los Alamos National Laboratory (LANL), and Sandia National Laboratories (SNL)—carry out the work of certifying America’s nuclear stockpile through computational science and focused above-ground experiments.
We spoke with Dr. Leininger about some of the workflow that Tri Labs scientists follow, how the Commodity Technology Systems clusters are used in their research, and how machine learning is helping them.
The overall goal is to demonstrate a …
A Look Inside U.S. Nuclear Security’s Commodity Technology Systems was written by Nicole Hemsoth at The Next Platform.
The golden grail of deep learning has two handles. On one side, developing and scaling systems that can train ever-growing model sizes is one concern; on the other, cutting down inference latencies while preserving the accuracy of trained models is another.
Being able to do both on the same system presents its own host of challenges, but for one group at IBM Research, focusing on the compute-intensive training element will have a performance and efficiency trickle-down effect that speeds the entire deep learning workflow—from training to inference. This work, which is being led at the T.J. Watson …
IBM Highlights PowerAI, OpenPower System Scalability was written by Nicole Hemsoth at The Next Platform.
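The scaling problem described in the excerpt above usually reduces to synchronous data parallelism: each worker computes gradients on its own shard of data, the gradients are averaged (an all-reduce in a real system), and the shared weights update. The toy NumPy sketch below shows only that generic pattern; it is not IBM’s PowerAI implementation, and all sizes and names are invented.

    # Toy synchronous data-parallel training step: each "worker" computes a
    # gradient on its shard of a linear-regression problem, the gradients are
    # averaged (standing in for an all-reduce), and the shared weights update.
    # Purely illustrative; sizes, learning rate, and model are arbitrary.
    import numpy as np

    rng = np.random.default_rng(0)
    n_workers, shard_size, dim = 4, 256, 8
    true_w = rng.normal(size=dim)
    w = np.zeros(dim)

    shards = []
    for _ in range(n_workers):
        X = rng.normal(size=(shard_size, dim))
        y = X @ true_w + 0.01 * rng.normal(size=shard_size)
        shards.append((X, y))

    for step in range(200):
        grads = [2.0 * X.T @ (X @ w - y) / len(y) for X, y in shards]  # per-worker gradients
        w -= 0.01 * np.mean(grads, axis=0)                             # averaged update

    print("weight error after 200 steps:", np.linalg.norm(w - true_w))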
Google has been at the bleeding edge of AI hardware development with the arrival of its TPU and other system-scale modifications to make large-scale neural network processing efficient and fast.
But just as these developments come to fruition, advances in trimmed-down deep learning could move many more machine learning training and inference operations out of the datacenter and into your palm.
Although it might be natural to think that neural networks cannot be processed on devices like smartphones because of limited CPU power, the real challenge lies in the vastness of the model sizes and hardware memory …
Google Research Pushing Neural Networks Out of the Datacenter was written by Nicole Hemsoth at The Next Platform.
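To see why model size rather than raw processor speed is the binding constraint on phones, it helps to count bytes. The arithmetic below is a back-of-the-envelope sketch with an invented parameter count, not a figure from Google’s research; it simply shows how much quantizing weights from 32-bit floats to 8-bit integers shrinks the footprint.

    # Back-of-the-envelope: memory needed just to hold a network's weights.
    # The parameter count is an invented example; the point is the ratio
    # between 32-bit float storage and 8-bit quantized storage.
    def weight_footprint_mb(num_params, bytes_per_param):
        return num_params * bytes_per_param / (1024 ** 2)

    params = 138_000_000  # hypothetical large image-classification network
    print("fp32 weights: %.0f MB" % weight_footprint_mb(params, 4))  # roughly 527 MB
    print("int8 weights: %.0f MB" % weight_footprint_mb(params, 1))  # roughly 132 MB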
Novel architectures are born out of necessity, and for some applications, including molecular dynamics, there have been endless attempts to push parallel performance.
In this area, there are already numerous approaches to acceleration. At the highest end is the custom ASIC-driven Anton machine from D.E. Shaw, which is the fastest system but certainly not the cheapest. On the more accessible accelerator side are Tesla GPUs for accelerating highly parallel parts of the workload—and increasingly, FPGAs are being considered for boosting the performance of major molecular dynamics applications, most notably GROMACS, as well as general-purpose, high-end CPUs (Knights Landing …
A MapReduce Accelerator to Tackle Molecular Dynamics was written by Nicole Hemsoth at The Next Platform.
Custom accelerators for neural network training have garnered plenty of attention in the last couple of years, but without significant software footwork, many are still difficult to program and could leave efficiencies on the table. This can be addressed through various model optimizations, but as some argue, the efficiency and utilization gaps can also be addressed with a tailored compiler.
Eugenio Culurciello, an electrical engineer at Purdue University, argues that getting full computational efficiency out of custom deep learning accelerators is difficult. This prompted his team at Purdue to build an FPGA-based accelerator that could be agnostic to CNN …
Wrenching Efficiency Out of Custom Deep Learning Accelerators was written by Nicole Hemsoth at The Next Platform.
Ziyang Xu from Peking University in Beijing sees several similarities between the human brain and Von Neumann computing devices.
While he believes there is value in neuromorphic, or brain-inspired, chips, with the right operating system, standard processors can mimic some of the efficiencies of the brain and achieve similar performance for certain tasks.
In short, even though our brains do not have the same high-speed, high-frequency capacity as modern chips, the way information is routed and addressed is the key. At the core of this efficiency is a concept similar to a policy engine governing information compression, storage, and retrieval. …
An OS for Neuromorphic Computing on Von Neumann Devices was written by Nicole Hemsoth at The Next Platform.
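To make the policy-engine analogy a little more concrete, the toy sketch below shows the general shape of such a component: a store that applies simple rules to decide whether each item is kept raw, compressed, or refused, and hides that decision from retrieval. It is a generic illustration of the idea only, not Xu’s operating system; every threshold and name is invented.

    # Toy "policy engine": simple rules govern whether an item is stored raw,
    # stored compressed, or refused, and retrieval undoes the choice transparently.
    # Thresholds, names, and policy are invented for illustration.
    import zlib

    class PolicyStore:
        def __init__(self, compress_above=64, refuse_above=1_000_000):
            self.compress_above = compress_above
            self.refuse_above = refuse_above
            self._items = {}

        def put(self, key, data):
            if len(data) > self.refuse_above:      # policy: refuse oversized items
                return False
            if len(data) > self.compress_above:    # policy: compress mid-sized items
                self._items[key] = ("z", zlib.compress(data))
            else:                                  # policy: keep small items raw
                self._items[key] = ("raw", data)
            return True

        def get(self, key):
            kind, blob = self._items[key]
            return zlib.decompress(blob) if kind == "z" else blob

    store = PolicyStore()
    store.put("spike", b"x" * 10)
    store.put("pattern", b"abc" * 100)
    assert store.get("pattern") == b"abc" * 100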
For developers, deep learning systems are becoming more interactive and complex. From the building of more malleable datasets that can be iteratively augmented, to more dynamic models, to more continuous learning being built into neural networks, there is a greater need to manage the process from start to finish with lightweight tools.
“New training samples, human insights, and operation experiences can consistently emerge even after deployment. The ability of updating a model and tracking its changes thus becomes necessary,” says a team from Imperial College London that has developed a library to manage the iterations deep learning developers make across …
Managing Deep Learning Development Complexity was written by Nicole Hemsoth at The Next Platform.
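The need the Imperial College team is addressing comes down to bookkeeping: record each model revision together with the data, settings, and metrics that produced it, so a deployed model can be updated and its changes traced. The sketch below shows one minimal way to log such iterations; it is a generic illustration, not the team’s library, and every field and file name in it is invented.

    # Minimal iteration log: append one JSON record per training run so a
    # deployed model's lineage (checkpoint fingerprint, data version,
    # hyperparameters, metrics) can be traced later. Field names are invented.
    import hashlib
    import json
    import time
    from pathlib import Path

    LOG = Path("model_history.jsonl")

    def record_iteration(weights_file, dataset_version, hyperparams, metrics):
        digest = hashlib.sha256(Path(weights_file).read_bytes()).hexdigest()[:12]
        entry = {
            "timestamp": time.time(),
            "weights_sha": digest,               # fingerprint of the checkpoint file
            "dataset_version": dataset_version,
            "hyperparams": hyperparams,
            "metrics": metrics,
        }
        with LOG.open("a") as f:
            f.write(json.dumps(entry) + "\n")
        return entry

    # Example call (hypothetical paths and values):
    # record_iteration("model_v7.bin", "2017-06-augmented-v2",
    #                  {"lr": 1e-3, "epochs": 20}, {"val_acc": 0.91})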
The difficult part about storage these days is far less about capability than about adapting to change. Accordingly, the concept of programmable storage is getting more traction.
With such an approach, the internal services and abstractions of the storage stack can be considered building blocks for higher-level services, and while this may not be simple to implement, it can eliminate the duplication of complex, unreliable software that is commonly used as a workaround for storage system deficiencies.
A team from the University of California Santa Cruz has developed a programmable storage platform to counter these issues called …
Fresh Thinking on Programmable Storage was written by Nicole Hemsoth at The Next Platform.
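The building-block idea in the excerpt above can be illustrated with a toy composition: a higher-level versioned key-value service written entirely against a lower-level object-store interface, rather than re-implementing its own persistence. This is a generic sketch of the programmable-storage concept, not the UC Santa Cruz platform; both interfaces are invented.

    # Toy composition: a higher-level service (versioned key-value store) built
    # from a lower-level storage primitive (flat object store) instead of
    # duplicating persistence logic itself. Interfaces are invented.

    class ObjectStore:
        """Stand-in for a low-level storage service exposing put/get by name."""
        def __init__(self):
            self._objects = {}

        def put(self, name, data):
            self._objects[name] = data

        def get(self, name):
            return self._objects[name]

    class VersionedKV:
        """Higher-level service composed purely from the ObjectStore primitive."""
        def __init__(self, store):
            self.store = store
            self.latest = {}

        def write(self, key, value):
            version = self.latest.get(key, 0) + 1
            self.store.put(f"{key}@{version}", value)  # delegate persistence downward
            self.latest[key] = version
            return version

        def read(self, key, version=None):
            return self.store.get(f"{key}@{version or self.latest[key]}")

    kv = VersionedKV(ObjectStore())
    kv.write("config", b"v1")
    kv.write("config", b"v2")
    assert kv.read("config", version=1) == b"v1"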
Building on the successes of the Stampede1 supercomputer, the Texas Advanced Computing Center (TACC) has rolled out its next-generation HPC system, Stampede2. Over the course of 2017, Stampede2 will undergo further optimization phases with the support of a $30 million grant from the National Science Foundation (NSF). With the latest Intel Xeon Phi (Knights Landing) and Xeon Skylake processors, and enhanced networking provided by the Omni-Path architecture, the new flagship system is expected to deliver approximately 18 petaflops, nearly doubling Stampede1’s performance.
Stampede2 continues Stampede1’s mission: enabling thousands of scientists and researchers across the United States to deliver breakthrough discoveries in science, engineering, artificial …
Texas Advanced Supercomputing Center Taps Latest HPC Tech was written by Nicole Hemsoth at The Next Platform.
Teams at Saudi Aramco using the Shaheen II supercomputer at King Abdullah University of Science and Technology (KAUST) have managed to scale ANSYS Fluent across 200,000 cores, marking top-end scaling for the commercial engineering code.
The news last year of a code scalability effort that topped out at 36,000 cores on the Blue Waters machine at the National Center for Supercomputing Applications (NCSA) was impressive. That was big news for ANSYS and NCSA, but also a major milestone for Cray. Just as Blue Waters is a Cray system, albeit one at the outer reaches of its lifespan (it was installed …
Engineering Code Scales Across 200,000 Cores on Cray Super was written by Nicole Hemsoth at The Next Platform.
While it might not be an exciting problem at the front and center of AI conversations, the issue of efficient hyperparameter tuning for neural network training is a tough one. There are some options that aim to automate this process, but for most users it remains a cumbersome area—and one that can lead to bad performance when not done properly.
The problem with coming up with automatic tools for tuning is that many machine learning workloads are dependent on the dataset and the conditions of the problem being solved. For instance, some users might prefer less accuracy over a speedup or efficiency …
The Golden Grail: Automatic Distributed Hyperparameter Tuning was written by Nicole Hemsoth at The Next Platform.
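For readers new to the mechanics, the baseline that automatic tuners try to beat is plain random search: sample hyperparameter settings, evaluate each, and keep the best. The sketch below shows that baseline against a stand-in objective; the search ranges and the scoring function are invented, and in practice the objective would be a full train-and-validate cycle.

    # Baseline random search over hyperparameters. The objective function is a
    # stand-in for a full train-and-validate run; ranges and the objective
    # itself are invented for illustration.
    import math
    import random

    random.seed(42)

    def objective(lr, batch_size):
        # Pretend validation loss, lowest near lr=1e-3 and batch_size=64.
        return (math.log10(lr) + 3) ** 2 + (math.log2(batch_size) - 6) ** 2

    best = None
    for _ in range(50):
        lr = 10 ** random.uniform(-5, -1)              # sample learning rate log-uniformly
        batch = random.choice([16, 32, 64, 128, 256])  # sample batch size
        score = objective(lr, batch)
        if best is None or score < best[0]:
            best = (score, lr, batch)

    print("best loss %.3f at lr=%.1e, batch=%d" % best)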
I have been at the forefront of machine learning since the 1980s, when I was a staff scientist in the Theoretical Division at Los Alamos performing basic research on machine learning (and later applying it in many areas, including co-founding a machine learning-based drug discovery company). I was lucky enough to participate in the creation of the field and subsequently to observe first-hand the process by which machine learning grew into a ‘bandwagon’ that eventually imploded due to misconceptions about the technology and what it could accomplish.
Fueled by across-the-board technology advances including algorithmic developments, machine learning has again become a …
Technology Requirements for Deep and Machine Learning was written by Nicole Hemsoth at The Next Platform.
The last couple of years have seen a steady drumbeat for the use of low precision in a growing number of workloads, driven in large part by the rise of machine learning and deep learning applications and the ongoing desire to cut back on the amount of power consumed.
The interest in low precision is rippling through the high-performance computing (HPC) field, spanning the companies that are running application sets and the tech vendors that are creating the systems and components on which the work is done.
The Next Platform has kept a steady eye on the developments the deep-learning and machine-learning …
High Expectations for Low Precision at CERN was written by Nicole Hemsoth at The Next Platform.
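Both the appeal and the risk of dropping precision show up in a few lines of NumPy: half-precision values take a quarter of the space of double precision, but a long naive accumulation of small values stalls well short of the true answer. The snippet below is a generic illustration of that trade-off, unrelated to any particular CERN workload.

    # Low precision in miniature: float16 cuts storage relative to float64,
    # but a naive sequential sum of many small values stalls far short of the
    # exact answer (10.0) once increments fall below half a float16 ulp.
    import numpy as np

    x64 = np.full(100_000, 0.0001, dtype=np.float64)
    x16 = x64.astype(np.float16)

    print("storage:", x64.nbytes // 1024, "KB in float64 vs", x16.nbytes // 1024, "KB in float16")

    acc64 = np.float64(0.0)
    acc16 = np.float16(0.0)
    for a, b in zip(x64, x16):
        acc64 += a
        acc16 = np.float16(acc16 + b)

    print("float64 running sum:", float(acc64))
    print("float16 running sum:", float(acc16))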
Supercomputing centers around the world are preparing their next-generation architectural approaches for the insertion of AI into scientific workflows. For some, this means retooling around an existing architecture to make it capable of doing double duty for both HPC and AI.
Teams in China working on the top-performing supercomputer in the world, the Sunway TaihuLight machine with its custom processor, have shown that their deep learning optimizations for the SW26010 architecture have yielded a 1.91-9.75X speedup over a GPU-accelerated model using the Nvidia Tesla K40m in a test convolutional neural network run with over 100 parameter configurations.
Efforts on …
China Tunes Neural Networks for Custom Supercomputer Chip was written by Nicole Hemsoth at The Next Platform.
When talking about the future of supercomputers and high-performance computing, the focus tends to fall on the ongoing, high-profile competition between the United States, whose place as the industry’s kingpin is slowly eroding, and China, whose government has invested tens of billions of dollars in recent years to rapidly expand the reach of the country’s tech community and the use of home-grown technologies in massive new systems.
Both trends were on display at the recent International Supercomputing Conference in Frankfurt, Germany, where China not only continued to hold the top two spots on the …
OpenPower, Efficiency Tweaks Define Europe’s DAVIDE Supercomputer was written by Nicole Hemsoth at The Next Platform.