The supercomputing business, the upper stratosphere of the much broader high performance computing segment of the IT industry, is without question one of the most exciting areas in data processing and visualization.
It is also one of the most frustrating sectors in which to try to make a profitable living. The customers are the most demanding, the applications are the most complex, the budget pressures are intense, the technical challenges are daunting, the governments behind major efforts can be capricious, and the competition is fierce.
This is the world where Cray, which literally invented the supercomputing field, and its competitors …
Supercomputing At The Crossroads was written by Timothy Prickett Morgan at The Next Platform.
Two important changes to the datacenter are happening in the same year—one on the hardware side, another on the software side. And together, they create a force big enough to blow away the clouds, at least over the long haul.
As we have covered this year from a datacentric (and even supercomputing) point of view, 2018 is the time for Arm to shine. With a bevy of inroads into commercial markets, from the high end all the way down to the micro-device level, the architecture presents a genuine challenge to the processor establishment. And now, coupled with the biggest trend since …
Inference is the Hammer That Breaks the Datacenter was written by Nicole Hemsoth at The Next Platform.
For several years, work has been underway to develop a standard interconnect that can address the increasing speeds in servers, driven by the growing use of accelerators such as GPUs and field-programmable gate arrays (FPGAs), and relieve the pressure put on memory by the massive amounts of data being generated, which creates a bottleneck between the CPUs and the memory.
Any time the IT industry wants a standard, you can always expect at least two, and this time around is no different. Today there is a cornucopia of emerging interconnects, some of them overlapping in purpose, some working side by side, to break …
Gen-Z Interconnect Ready To Restore Compute Memory Balance was written by Jeffrey Burt at The Next Platform.
It has taken nearly four years for the low-end, workhorse machines in IBM’s Power Systems line to be updated, and the long-awaited Power9 processors and the shiny new “ZZ” systems have been unveiled. We have learned quite a bit about these machines, many of which are not really intended for the kinds of IT organizations that The Next Platform is focused on. But several of the machines are aimed at large enterprises, service providers, and even cloud builders who want something with a little more oomph on a lot of fronts than an X86 server can deliver in …
The Ins And Outs Of IBM’s Power9 ZZ Systems was written by Timothy Prickett Morgan at The Next Platform.
When it comes to machine learning training, people tend to focus on the compute. We always want to know if the training is being done on specialized parallel X86 devices, like Intel’s Xeon Phi, or on massively parallel GPU devices, like Nvidia’s “Pascal” and “Volta” accelerators, or even on custom devices from the likes of Nervana Systems (now part of Intel), Wave Computing, Graphcore, Google, or Fujitsu.
But as is the case with other kinds of high performance computing, the network matters when it comes to machine learning, and it can be the differentiating …
Programmable Networks Train Neural Nets Faster was written by Timothy Prickett Morgan at The Next Platform.
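To make concrete why the network can be the differentiator in distributed training, here is a minimal sketch (not from the article) of the kind of collective operation such a job performs on every optimizer step: averaging gradients across workers with an MPI allreduce. The array size, worker count, and script name are hypothetical.

```python
# Hypothetical sketch: gradient averaging across training workers with MPI.
# Every optimizer step pays this communication cost, which is why the
# interconnect can bound how well training scales.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
world_size = comm.Get_size()

# Stand-in for the gradients this worker computed on its local minibatch.
local_grads = np.random.randn(1_000_000).astype(np.float32)

# Sum the gradients from all workers, then divide to get the global average.
summed = np.empty_like(local_grads)
comm.Allreduce(local_grads, summed, op=MPI.SUM)
averaged = summed / world_size
```

Run under an MPI launcher (for example, mpirun -np 8 python allreduce_sketch.py); this is the style of exchange that faster or smarter network hardware aims to accelerate.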
On today’s podcast episode of “The Interview” with The Next Platform, we talk about exascale power and resiliency by way of a historical overview of architectures with long-time HPC researcher, Dr. Robert Fowler.
Fowler’s career in HPC began at his alma mater, Harvard, in the early seventies with scientific codes and expanded across the decades to include roles at several universities, including the University of Washington, the University of Rochester, Rice University, and most recently, RENCI at the University of North Carolina at Chapel Hill, where he spearheads high performance computing initiatives and projects, including one we will …
Looking Back: The Evolution of HPC Power, Efficiency and Reliability was written by Nicole Hemsoth at The Next Platform.
The field of competitors looking to bring exascale-capable computers to the market is a somewhat crowded one, but the United States and China continue to be the ones that most eyes are on.
It’s a clash of an established global superpower and another one on the rise, and one that envelops a struggle for economic, commercial, and military advantage as well as a healthy dose of national pride. And because of these two countries, the future of exascale computing – which to a large extent to this point has been more about discussion, theory, and promise – will come into sharper …
A Look at What’s in Store for China’s Tianhe-2A Supercomputer was written by Jeffrey Burt at The Next Platform.
Today’s podcast episode of “The Interview” with The Next Platform will focus on an effort to standardize key neural network features to make development and innovation easier and more productive.
While it is still too early to standardize across major frameworks for training, for instance, portability for new architectures via a common file format is a critical first step toward more interoperability between frameworks and between training and inferencing tools.
To explore this, we are joined by Neil Trevett, Vice President of the Developer Ecosystem at Nvidia and President of the Khronos Group, an industry consortium focused on creating open …
Establishing Early Neural Network Standards was written by Nicole Hemsoth at The Next Platform.
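As a concrete illustration of the portability idea described above, the sketch below exports a toy model from one framework into a common file format and runs it in a separate inference runtime. ONNX is used here purely as a stand-in for an open interchange format; the Khronos work discussed in the interview (its NNEF format plays a similar role) is not assumed to have this API, and the model and tensor names are hypothetical.

```python
# Illustrative sketch of framework-to-inference portability via a common
# model file format (ONNX used as the example format here).
import torch
import torch.nn as nn
import onnxruntime as ort

# A toy PyTorch model standing in for a trained network.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()
dummy_input = torch.randn(1, 16)

# Export from the training framework into the portable file format.
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])

# Load and run the exported file in an inference runtime that knows
# nothing about the framework the model was trained in.
session = ort.InferenceSession("model.onnx")
outputs = session.run(None, {"input": dummy_input.numpy()})
print(outputs[0].shape)
```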
The HPC crowd got a little taste of IBM’s “Nimbus” Power9 processors for scale out systems, juiced by Nvidia “Volta” Tesla GPU accelerators, last December with the Power AC922 system that is the basis of the “Summit” and “Sierra” pre-exascale supercomputers being built by Big Blue for the US Department of Energy.
Now, IBM’s enterprise customers that use more standard iron in their clusters – those who predominantly have CPU-only setups rather than adding in GPUs or FPGAs and who need a lot more local storage – are getting more of a Power9 meal with the launch of six new machines …
A First Look At IBM’s Power9 ZZ Systems was written by Timothy Prickett Morgan at The Next Platform.
Neural networks live on data and rely on computational firepower to help them take in that data, train on it, and learn from it. The challenge increasingly is ensuring there is enough computational power to keep up with the massive amounts of data being generated today and with the rising demands from modern neural networks for speed and accuracy in consuming that data and training on datasets that continue to grow in size.
These challenges can be seen playing out in the fast-growing autonomous vehicle market, where pure-play companies like Waymo – born from Google’s self-driving car initiative – …
Even at the Edge, Scale is the Real Challenge was written by Jeffrey Burt at The Next Platform.
Massive data growth and advances in acceleration technologies are pushing modern computing capabilities to unprecedented levels and changing the face of entire industries.
Today’s organizations are quickly realizing that the more data they have the more they can learn, and powerful new techniques like artificial intelligence (AI) and deep learning are helping them convert that data into actionable intelligence that can transform nearly every aspect of their business. NVIDIA GPUs and Hewlett Packard Enterprise (HPE) high performance computing (HPC) platforms are accelerating these capabilities and helping organizations arrive at deeper insights, enable dynamic correlation, and deliver predictive outcomes with superhuman …
Delivering Predictive Outcomes with Superhuman Knowledge was written by Nicole Hemsoth at The Next Platform.
Alex St. John is a familiar name in the GPU and gaming industry given his role at Microsoft in the creation of DirectX technology in the 90s. And while his fame may be rooted in graphics for PC players, his newest venture has sparked the attention of both the supercomputing and enterprise storage crowds—and for good reason.
It likely helps to have some notoriety when it comes to securing funding, especially for a venture with roots in the notoriously venture capital-starved supercomputing ecosystem. While St. John’s startup Nyriad may be a spin-out of technology developed for the Square Kilometre Array (SKA), the …
Exascale Storage Gets a GPU Boost was written by Nicole Hemsoth at The Next Platform.
On today’s podcast episode of “The Interview” with The Next Platform, we focus on some of the recent quantum computing developments out of Oak Ridge National Lab’s Quantum Computing Institute with the center’s director, Dr. Travis Humble.
Regular readers will recall previous work Humble has done on the quantum simulator, as well as other lab and Quantum Computing Institute efforts on creating hybrid quantum and neuromorphic supercomputers and building software frameworks to support quantum interfacing. In our discussion we check in on progress along all of these fronts, including a more detailed conversation about the XACC programming framework for …
At the Cutting Edge of Quantum Computing Research was written by Nicole Hemsoth at The Next Platform.
Google laid down its path forward in the machine learning and cloud computing arenas when it first unveiled plans for its tensor processing unit (TPU), an accelerator designed by the hyperscaler to speed up machine learning workloads that are programmed using its TensorFlow framework.
Almost a year ago, at its Google I/O event, the company rolled out the architectural details of its second-generation TPUs – also called the Cloud TPU – for both neural network training and inference, with the custom ASICs providing up to 180 teraflops of floating point performance and 64 GB of High Bandwidth Memory. …
Google Boots Up Tensor Processors On Its Cloud was written by Jeffrey Burt at The Next Platform.
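For a sense of how a TensorFlow program targets one of these Cloud TPUs, here is a minimal sketch using the current TF 2.x distribution API; the TPU name and the toy model are hypothetical, and the 2018-era programming interface described in the article differed in its details.

```python
# Hypothetical sketch: pointing a TensorFlow 2.x program at a Cloud TPU.
import tensorflow as tf

# "my-tpu" is a made-up TPU node name; in a real project this resolves to
# the Cloud TPU endpoint provisioned for the job.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Models built under the strategy scope are replicated across the TPU cores.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# model.fit(dataset) would then run the training steps on the TPU itself.
```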
The Lustre file system has been the canonical choice for the world’s largest supercomputers, but for the rest of the high performance computing user base, it is moving beyond reach without the support and guidance it has had from its many backers, including most recently Intel, which dropped Lustre from its development ranks in mid-2017.
While Lustre users have seen the support story fall to pieces before, for many HPC shops the need is greater than ever to look toward a fully supported, scalable parallel file system that snaps well into easy-to-manage appliances. Some of these commercial HPC sites …
Lustre Shines at HPC Peaks, But Rest of Market is Fertile Ground was written by Nicole Hemsoth at The Next Platform.
In this episode of The Interview from The Next Platform, we talk with Andrew Jones from independent high performance computing consulting firm, N.A.G. about processor and system acquisition trends in HPC for users at the smaller commercial end of the spectrum up through the large research centers.
In the course of the conversation, we cover how acquisition trends are being affected by machine learning entering the HPC workflow in the coming years, the differences over time between commercial HPC and academic supercomputing, and some of the issues around processor choices for both markets.
Given his experiences talking to end users …
HPC System & Processor Trends for 2018 was written by Nicole Hemsoth at The Next Platform.
The excitement for new video games, the machine learning software revolution, the buildout of very large supercomputers based on hybrid CPU-GPU architectures, and the mining of cryptocurrencies like Bitcoin and Ethereum have combined into a quadruple whammy that is driving Nvidia to new heights for revenues, profits, and market capitalization. And thus it is no surprise that Nvidia is one of the few companies that is bucking the trend in a very tough couple of weeks on Wall Street.
But with demand spiking for both its current “Volta” GPUs, which are aimed at HPC and AI compute, …
Just How Large Can Nvidia’s Datacenter Business Grow? was written by Timothy Prickett Morgan at The Next Platform.
Co-design is all the rage these days in systems design, where the hardware and software components of a system – whether it is aimed at compute, storage, or networking – are designed in tandem, not one after the other, and immediately affect how each aspect of the system is ultimately crafted. It is a smart idea that wrings the maximum amount of performance out of a system for very precise workloads.
The era of general purpose computing, which is on the wane, brought an ever-increasing amount of capacity to bear in the datacenter at an ever-lower cost, enabling an …
Different Server Workhorses For Different Workload Courses was written by Timothy Prickett Morgan at The Next Platform.
Cloud datacenters in many ways are like melting pots of technologies. The massive facilities hold a broad array of servers, storage systems, and networking hardware that come in a variety of sizes. Their components come with different speeds, capacities, bandwidths, power consumption, and pricing, and they are powered by different processor architectures, optimized for disparate applications, and carry the logos of a broad array of hardware vendors, from the largest OEMs to the smaller ODMs. Some hardware systems are homegrown or built atop open designs.
As such, they are good places to compare and contrast how the components of these …
A Statistical View Of Cloud Storage was written by Jeffrey Burt at The Next Platform.
Compute is being embedded in everything, and there is another wave of distributed computing pushing out from the datacenter into all kinds of network, storage, and other devices that collect and process data in their own right as well as passing it back up to the glass house for final processing and permanent storage.
The computing requirements at the edge are different from the core compute in the datacenter, and it is very convenient indeed that they align nicely with some of the more modest processing needs of network devices, storage clusters, and more modest jobs in the …
Intel Sharpens The Edge With Skylake Xeon D was written by Timothy Prickett Morgan at The Next Platform.