Author Archives: Nicole Hemsoth

Cray ARMs Highest End Supercomputer with ThunderX2

Just this time last year, the projection was that by 2020, ARM processors would be chewing on twenty percent of HPC workloads. In that short span of time, the grain of salt many took with that figure has dissolved with the addition of some very attractive options for supercomputing from ARM hardware makers.

Last winter, the big ARM news for HPC was mostly centered on the Mont-Blanc project at the Barcelona Supercomputing Center. However, as the year unfolded, details on new projects with ARM at the core, including the Post-K supercomputer in Japan and the Isambard supercomputer in the

Cray ARMs Highest End Supercomputer with ThunderX2 was written by Nicole Hemsoth at The Next Platform.

IBM Bolsters Quantum Capability, Emphasizes Device Differentiation

Much of the quantum computing hype of the last few years has centered on D-Wave, which has installed a number of functional systems and is hard at work making quantum programming more practical.

Smaller companies like Rigetti Computing are gaining traction as well, but all the while, in the background, IBM has been steadily furthering quantum computing work that kicked off at IBM Research in the mid-1970s, when Charlie Bennett introduced the concept of quantum information.

Since those early days, IBM has hit some important milestones on the road to quantum computing, including demonstrating the first quantum

IBM Bolsters Quantum Capability, Emphasizes Device Differentiation was written by Nicole Hemsoth at The Next Platform.

HPE Developing its Own Low Power “Neural Network” Chips

With so many chip startups targeting the future of deep learning training and inference, one might expect it would be far easier for tech giant Hewlett Packard Enterprise to buy versus build. However, when it comes to select applications at the extreme edge (for space missions in particular), nothing in the ecosystem fits the bill.

In the context of a broader discussion about the company’s Extreme Edge program focused on space-bound systems, HPE’s Dr. Tom Bradicich, VP and GM of Servers, Converged Edge, and IoT systems, described a future chip that would be ideally suited for high performance computing under

HPE Developing its Own Low Power “Neural Network” Chips was written by Nicole Hemsoth at The Next Platform.

Arm Smooths the Path for Porting HPC Apps

One of the arguments Intel officials and others have made against Arm’s push to get its silicon designs into the datacenter has been the burden it would place on enterprises and HPC organizations, which would have to modify their application codes to run on the Arm architecture.

For HPC organizations, that would mean moving the applications from the Intel-based and IBM systems that have dominated the space for years, a time-consuming and possibly costly process.

Arm officials over the years have acknowledged the challenge, but have noted their infrastructure’s embrace of open-source software and

Arm Smooths the Path for Porting HPC Apps was written by Nicole Hemsoth at The Next Platform.

Hybrid Fortran Pulls Legacy Codes into Acceleration Era

GPU-accelerated supercomputing is not a new phenomenon; many high performance computing codes are already primed to run on Nvidia hardware in particular.

However, for some legacy codes with special needs (changing models, high computational demands), particularly in areas like weather, the gap between those codes and the promise of GPU acceleration remains large, even with higher-level tools like OpenACC that aim to bridge the divide without major code rewrites.

Given the limitations of porting some legacy Fortran codes to GPUs, a research team at Tokyo Tech has devised what it calls “Hybrid Fortran,” which is designed to “increase productivity when
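
The core idea behind Hybrid Fortran is single-source code that can target either CPU or GPU parallelization. As a loose analogy only, sketched in Python rather than in Tokyo Tech’s actual Fortran annotations, the single-source, multiple-target pattern looks something like this, with cupy standing in for the GPU path:

```python
import numpy as np

try:
    import cupy as cp  # optional GPU backend; mirrors the numpy API
except ImportError:
    cp = None

def saxpy(a, x, y, use_gpu=False):
    # One kernel body, two execution targets -- our own analogy to the
    # single-source CPU/GPU approach, not Hybrid Fortran's real syntax.
    xp = cp if (use_gpu and cp is not None) else np
    return a * xp.asarray(x) + xp.asarray(y)
```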

Hybrid Fortran Pulls Legacy Codes into Acceleration Era was written by Nicole Hemsoth at The Next Platform.

Deep Voice 3: Ten Million Queries on a Single GPU Server

Although much of the attention around deep learning for voice has focused on speech recognition, developments in artificial speech synthesis (text to speech) based on neural network approaches have been just as swift.

The goal with text-to-speech (TTS), as in other voice-related deep learning areas, is to drive the training and inference times way down to allow for fast delivery of services, low power consumption, and efficient utilization of hardware resources. A recent effort at Chinese search giant, Baidu, which is often at the forefront of deep learning for voice recognition and TTS, has shown remarkable progress on both
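
Baidu’s paper details its own serving stack, but the generic technique for pushing query throughput on a single GPU server is dynamic batching: group whatever requests have arrived and run them through the model together. The sketch below is a hypothetical illustration of that pattern, with made-up names and a dummy model call rather than anything from Deep Voice 3:

```python
import queue
import numpy as np

# Each request is a (text, reply_queue) pair placed on this queue.
requests = queue.Queue()
MAX_BATCH = 32

def synthesize_batch(texts):
    # Stand-in for a real batched model call; returns one dummy
    # 1-second waveform per input text.
    return [np.zeros(16000, dtype=np.float32) for _ in texts]

def serving_loop():
    while True:
        batch = [requests.get()]       # block until one request arrives
        while len(batch) < MAX_BATCH:  # then drain whatever queued up
            try:
                batch.append(requests.get_nowait())
            except queue.Empty:
                break
        texts, reply_queues = zip(*batch)
        for rq, audio in zip(reply_queues, synthesize_batch(list(texts))):
            rq.put(audio)              # hand each result back to its caller
```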

Deep Voice 3: Ten Million Queries on a Single GPU Server was written by Nicole Hemsoth at The Next Platform.

One Step Closer to Easier Quantum Programming

For quantum computing to make the leap from theory and slim early use cases to broader adoption, a programmability jump is required. Some of the first hurdles have been knocked over in the last few weeks with new compiler and API-based development efforts that abstract some of the complex physics required for both qubit and gate-based approaches to quantum devices.

The more public recent effort was the open source publication of OpenFermion, a quantum compiler based on work at Google and the quantum startup Rigetti Computing, focused on applications in quantum chemistry and materials science. OpenFermion is
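
OpenFermion is an openly published Python library, and a minimal usage sketch shows the kind of abstraction it offers: building a fermionic operator and mapping it onto qubit operators without hand-deriving the Pauli algebra. The toy Hamiltonian below is our own example, not one from the project’s docs:

```python
from openfermion.ops import FermionOperator
from openfermion.transforms import jordan_wigner

# Two-site hopping term: a_0^dagger a_1 + a_1^dagger a_0
hopping = FermionOperator('0^ 1') + FermionOperator('1^ 0')

# Jordan-Wigner maps fermionic modes onto qubits.
qubit_hamiltonian = jordan_wigner(hopping)
print(qubit_hamiltonian)  # 0.5 [X0 X1] + 0.5 [Y0 Y1]
```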

One Step Closer to Easier Quantum Programming was written by Nicole Hemsoth at The Next Platform.

Cray Supercomputers One Step Closer to Cloud Users

Supercomputer maker Cray is always looking for ways to extend its reach outside of the traditional academic and government markets where the biggest deals are often made.

From its forays into graph analytics appliances and, more recently, machine and deep learning, the company has the potential to exploit its long history of building some of the world’s fastest machines. This has expanded into some new ventures wherein potential new Cray users can try on the company’s systems, including via an on-demand partnership with datacenter provider Markley and now inside of Microsoft’s Azure datacenters.

For Microsoft Azure cloud users looking to bolster modeling

Cray Supercomputers One Step Closer to Cloud Users was written by Nicole Hemsoth at The Next Platform.

Baidu Sheds Precision Without Paying Deep Learning Accuracy Cost

One of the reasons we have written so much about Chinese search and social web giant, Baidu, in the last few years is because they have openly described both the hardware and software steps to making deep learning efficient and high performance at scale.

In addition to providing several benchmarking efforts and GPU use cases, researchers at the company’s Silicon Valley AI Lab (SVAIL) have been at the forefront of eking power efficiency and performance out of new hardware by lowering precision. This is a trend that has kickstarted similar thinking in hardware usage in other areas, including supercomputing
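
SVAIL’s implementation is its own, but the general recipe it helped popularize, computing in FP16 where safe while keeping FP32 master weights and scaling the loss to protect small gradients, survives in today’s frameworks. A minimal sketch using PyTorch’s automatic mixed precision (a modern equivalent of the technique, not Baidu’s code):

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # handles loss scaling

for _ in range(10):
    x = torch.randn(64, 1024, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # per-op FP16/FP32 dispatch
        loss = model(x).square().mean()
    scaler.scale(loss).backward()     # backprop on the scaled loss
    scaler.step(optimizer)            # unscales; skips step on inf/nan
    scaler.update()
```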

Baidu Sheds Precision Without Paying Deep Learning Accuracy Cost was written by Nicole Hemsoth at The Next Platform.

Plans for First Exascale Supercomputer in U.S. Released

This morning a presentation filtered out of the Department of Energy’s Office of Science showing the roadmap to exascale, with a 2021 machine at Argonne National Lab.

This is the Aurora machine, which had an uncertain future this year when its budgetary and other details were thrown into question. We understood the deal was being restructured, and indeed it has been. The system was originally slated to appear in 2018 with 180 petaflops of potential. Now it is 1,000 petaflops, an exascale-capable machine, and will be delivered in 2021, right on target with the revised plans for exascale released earlier this

Plans for First Exascale Supercomputer in U.S. Released was written by Nicole Hemsoth at The Next Platform.

MapR Bulks Up Database for Modern Apps

MapR Technologies has been busy in recent years building out its capabilities as a data platform company that can support a broad range of open-source technologies, from Hadoop and Spark to Hive, and can reach from the datacenter through the edge and out into the cloud. At the center of its efforts is its Converged Data Platform, which comes with the MapR-FS POSIX file system and includes enterprise-level database and storage capabilities designed to handle emerging big data workloads.

At the Strata Data Conference in New York City Sept. 26, company officials are putting their focus

MapR Bulks Up Database for Modern Apps was written by Nicole Hemsoth at The Next Platform.

Supercomputing Advancing Too Fast for Key Codes to Keep Pace

The high performance computing world is set to become more diverse over the next several years on the hardware front, but for software development, this new array of ever-higher performance options creates big challenges for codes.

While the hardware advances might be moving too quickly for long-standing software to take optimal advantage of, in some areas things are at a relative standstill in terms of how to approach this future. Is it better to keep optimizing old codes that could be ticked along with the X86 tocks, or does a new architectural landscape mean starting from scratch with scientific codes–even

Supercomputing Advancing Too Fast for Key Codes to Keep Pace was written by Nicole Hemsoth at The Next Platform.

Shedding Light on Dark Bandwidth

We have heard much about the concept of dark silicon, but there is a separate, related companion to that idea.

Dark bandwidth is a term being bandied about to describe the major inefficiencies of data movement. The idea itself is not new, but some of the ways the problem is being tackled present new practical directions as the emphasis on system balance over pure performance persists.

As ARM Research architect Jonathan Beard describes it, the way systems work now is a lot like ordering a tiny watch battery online and having it delivered in a
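
The arithmetic behind that analogy is simple enough to sketch. Assuming a sparse access that needs only 8 bytes while the memory system moves a full 64-byte cache line (our illustrative numbers, not Beard’s):

```python
# Back-of-envelope "dark bandwidth" calculation: the fraction of bytes
# moved across the memory system that carry no useful data.
useful_bytes = 8       # what the computation actually needs
cache_line_bytes = 64  # what the memory system actually moves

dark_fraction = 1 - useful_bytes / cache_line_bytes
print(f"{dark_fraction:.1%} of the bytes moved go unused")  # 87.5%
```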

Shedding Light on Dark Bandwidth was written by Nicole Hemsoth at The Next Platform.

Hospital Captures First Commercial Volta GPU Based DGX-1 Systems

At well over $150,000 per appliance, the Volta GPU-based DGX-1 appliances from Nvidia, which take aim at deep learning with framework integration and eight NVLink-connected Volta GPUs, are set to appeal to the most bleeding-edge of machine learning shops.

Nvidia has built its own clusters by stringing several of these together, just as researchers at Tokyo Tech have done with the Pascal generation systems. But one of the first commercial customers for the Volta based boxes is the Center for Clinical Data Science, which is part of the first wave of hospitals set to

Hospital Captures First Commercial Volta GPU Based DGX-1 Systems was written by Nicole Hemsoth at The Next Platform.

What’s So Bad About POSIX I/O?

POSIX I/O is almost universally agreed to be one of the most significant limitations standing in the way of I/O performance as exascale system designs push toward 100,000 client nodes.

The desire to kill off POSIX I/O is a commonly beaten drum among high-performance computing experts, and a variety of new approaches, ranging from I/O forwarding layers and user-space I/O stacks to completely new I/O interfaces, are bandied about as remedies to the impending exascale I/O crisis.

However, it is much less common to hear exactly why POSIX I/O is so detrimental to scalability and performance, and what needs to change to have a suitably
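
One concrete culprit, as an illustration: the POSIX contract says that once write() returns, every subsequent read() anywhere on the system must see the new bytes, a guarantee that forces pervasive locking and synchronization on a parallel file system serving 100,000 clients. In Python’s thin wrapper over the POSIX calls:

```python
import os

# Writer: after os.write() returns, POSIX says the data is visible to
# all subsequent reads, by any process, on any node of a compliant
# parallel file system.
fd = os.open("/tmp/shared.dat", os.O_CREAT | os.O_WRONLY, 0o644)
os.write(fd, b"checkpoint block")
os.close(fd)

# Reader: must observe the new bytes immediately, with no caching or
# relaxed-consistency escape hatch.
fd = os.open("/tmp/shared.dat", os.O_RDONLY)
assert os.read(fd, 16) == b"checkpoint block"
os.close(fd)
```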

What’s So Bad About POSIX I/O? was written by Nicole Hemsoth at The Next Platform.

NASA Supercomputing Strategy Takes the Road Less Traveled

For a large institution playing at the leadership-class supercomputing level, NASA tends to do things a little differently than its national lab and academic peers.

One of the most striking differences in how the space agency views its supercomputing future can be found at the facilities level. Instead of building massive brick-and-mortar datacenters within a new or existing complex, NASA has taken the modular route, beginning with its Electra supercomputer and continuing, in the near future, with a new 30-megawatt-capable modular installation that can house about a million compute cores.

“What we found is that the modular approach

NASA Supercomputing Strategy Takes the Road Less Traveled was written by Nicole Hemsoth at The Next Platform.

Heterogeneous Supercomputing on Japan’s Most Powerful System

We continue our series on the Tsubame supercomputer (first part here) with the next segment of our interview with Professor Satoshi Matsuoka of the Tokyo Institute of Technology (Tokyo Tech).

Matsuoka researches and designs large-scale supercomputers and similar infrastructures. More recently, he has worked on the convergence of Big Data, machine/deep learning, and AI with traditional HPC, as well as investigating post-Moore technologies towards 2025. He has designed supercomputers for years and has collaborated on projects involving basic elements for the current and, more importantly, future exascale systems.

TNP: Will you be running

Heterogeneous Supercomputing on Japan’s Most Powerful System was written by Nicole Hemsoth at The Next Platform.

First In-Depth View of Wave Computing’s DPU Architecture, Systems

Propping up a successful silicon startup is no simple feat, but venture-backed Wave Computing has managed to hold its own in the small but critical AI training chip market–so far.

Seven years after its founding, the company’s early access program for beta machines based on its novel DPU manycore architecture is now open, which is prompting Wave to be more forthcoming about the system and chip architecture behind its deep learning-focused dataflow design.

Dr. Chris Nicol, Wave Computing CTO and lead architect of the Dataflow Processing Unit (DPU), admitted to the crowd at Hot Chips this week that maintaining funding

First In-Depth View of Wave Computing’s DPU Architecture, Systems was written by Nicole Hemsoth at The Next Platform.

Inside View: Tokyo Tech’s Massive Tsubame 3 Supercomputer

Professor Satoshi Matsuoka of the Tokyo Institute of Technology (Tokyo Tech) researches and designs large-scale supercomputers and similar infrastructures. More recently, he has worked on the convergence of Big Data, machine/deep learning, and AI with traditional HPC, as well as investigating post-Moore technologies towards 2025.

He has designed supercomputers for years and has collaborated on projects involving basic elements for the current and more importantly future Exascale systems. I talked with him recently about his work with the Tsubame supercomputers at Tokyo Tech. This is the first in a two-part article. For background on the Tsubame 3 system

Inside View: Tokyo Tech’s Massive Tsubame 3 Supercomputer was written by Nicole Hemsoth at The Next Platform.

An Early Look at Baidu’s Custom AI and Analytics Processor

In the U.S. it is easy to focus on our native hyperscale companies (Google, Amazon, Facebook, etc.) and how they design and deploy infrastructure at scale.

But as our regular readers understand well, the equivalent to Google in China, Baidu, has been at the bleeding edge with chips, systems, and software to feed its own cloud-delivered and research operations.

We’ve written much over the last few years about the company’s emphasis on streamlining deep learning processing, most notably with GPUs, but Baidu has a new processor up its sleeve called the XPU. For now, the device has just been demonstrated

An Early Look at Baidu’s Custom AI and Analytics Processor was written by Nicole Hemsoth at The Next Platform.
