In the IT business, just like any other business, you have to try to sell what is on the truck, not what is planned to be coming out of the factories in the coming months and years. AMD has put a very good X86 server processor into the market for the first time in nine years, and it also has a matching GPU that gives its OEM and ODM partners a credible alternative for HPC and AI workloads to the combination of Intel Xeons and Nvidia Teslas that dominates hybrid computing these days.
There are some pretty important caveats to …
The Shape Of AMD HPC And AI Iron To Come was written by Timothy Prickett Morgan at The Next Platform.
Google has been at the bleeding edge of AI hardware development with the arrival of its TPU and other system-scale modifications to make large-scale neural network processing efficient and fast.
But just as these developments come to fruition, advances in trimmed-down deep learning could move many more machine learning training and inference operations out of the datacenter and into your palm.
Although it might be natural to think the reason that neural networks cannot be processed on devices like smartphones is because of limited CPU power, the real challenge lies in the vastness of the model sizes and hardware memory …
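To make the memory argument concrete, here is a minimal sketch (not Google's actual technique) of post-training weight quantization, one common way to shrink a model toward something a phone can hold: storing int8 integers plus a single scale factor instead of float32 weights cuts the footprint roughly fourfold.

```python
import numpy as np

# Hypothetical layer weights: 4M float32 parameters (~16 MB).
rng = np.random.default_rng(0)
weights = rng.standard_normal(4_000_000).astype(np.float32)

# Simple symmetric linear quantization to int8: keep one scale
# factor per tensor and 8-bit integers instead of 32-bit floats.
scale = np.abs(weights).max() / 127.0
quantized = np.round(weights / scale).astype(np.int8)

# Dequantize on the fly at inference time.
dequantized = quantized.astype(np.float32) * scale

print(weights.nbytes // (1024 * 1024), "MB ->",
      quantized.nbytes // (1024 * 1024), "MB")
```

The per-element error is bounded by half the scale factor, which is why this kind of trimming often costs little accuracy relative to the 4x memory savings.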
Google Research Pushing Neural Networks Out of the Datacenter was written by Nicole Hemsoth at The Next Platform.
Novel architectures are born out of necessity, and for some applications, including molecular dynamics, there have been endless attempts to push parallel performance.
In this area, there are already numerous approaches to acceleration. At the highest end is the custom ASIC-driven Anton machine from D.E. Shaw, which is the fastest system, but certainly not the cheapest. On the more accessible side are Tesla GPU accelerators for the highly parallel parts of the workload—and increasingly, FPGAs are being considered for boosting the performance of major molecular dynamics applications, most notably GROMACS, as well as general-purpose, high-end CPUs (Knights Landing …
A MapReduce Accelerator to Tackle Molecular Dynamics was written by Nicole Hemsoth at The Next Platform.
Custom accelerators for neural network training have garnered plenty of attention in the last couple of years, but without significant software footwork, many are still difficult to program and can leave efficiency gains on the table. This can be addressed through various model optimizations, but as some argue, the efficiency and utilization gaps can also be closed with a tailored compiler.
Eugenio Culurciello, an electrical engineer at Purdue University, argues that getting full computational efficiency out of custom deep learning accelerators is difficult. This prompted his team at Purdue to build an FPGA-based accelerator that could be agnostic to CNN …
Wrenching Efficiency Out of Custom Deep Learning Accelerators was written by Nicole Hemsoth at The Next Platform.
The “Skylake” Xeon SP processors from Intel have been in the market for nearly a month now, and we thought it would be a good time to drill down into the architecture of the new processor. We also want to see what the new Xeon SP has to offer for HPC, AI, and enterprise customers as well as compare the new X86 server motor to prior generations of Xeons and alternative processors in the market that are vying for a piece of the datacenter action.
That’s a lot, and we relish it. So let’s get started with a deep dive …
Drilling Down Into The Xeon Skylake Architecture was written by Timothy Prickett Morgan at The Next Platform.
While the hyperscalers of the world are pushing the bandwidth envelope, rolling out 100 Gb/sec gear in their Ethernet switch fabrics and looking ahead to the not-too-distant future when 200 Gb/sec and even 400 Gb/sec will be available, the enterprise customers who make up the majority of switch revenues are still using much slower networks, usually 10 Gb/sec and sometimes even 1 Gb/sec, and for them 100 Gb/sec seems like a pretty big leap.
That is why Broadcom, which still has the lion’s share of switch ASIC sales in the datacenter, has revved its long-running Trident family of chips, which lead …
Making Mainstream Ethernet Switches More Malleable was written by Timothy Prickett Morgan at The Next Platform.
Explosive data growth and a rising demand for real-time analytics are making high performance computing (HPC) technologies increasingly vital to success. Organizations across all industries are seeking the next generation of IT solutions to facilitate scientific research, enhance national security, ensure economic stability, and empower innovation to face the challenges of today and tomorrow.
HPC solutions are key to quickly answering some of the world’s most daunting questions. From Tesla’s self-driving cars to quantum computing, artificial intelligence (AI) is driving demand for unparalleled compute capabilities and outmatching humans at many cognitive tasks. Deep learning, an advanced AI technique, is growing in popularity …
Accelerating Deep Learning Insights With GPU-Based Systems was written by Timothy Prickett Morgan at The Next Platform.
Ziyang Xu from Peking University in Beijing sees several similarities between the human brain and von Neumann computing devices.
While he believes there is value in neuromorphic, or brain-inspired, chips, with the right operating system, standard processors can mimic some of the efficiencies of the brain and achieve similar performance for certain tasks.
In short, even though our brains do not have the same high-speed, high-frequency capacity as modern chips, the way information is routed and addressed is the key. At the core of this efficiency is a concept similar to a policy engine governing information compression, storage, and retrieval. …
An OS for Neuromorphic Computing on Von Neumann Devices was written by Nicole Hemsoth at The Next Platform.
For developers, deep learning systems are becoming more interactive and complex. From the building of more malleable datasets that can be iteratively augmented, to more dynamic models, to more continuous learning being built into neural networks, there is a greater need to manage the process from start to finish with lightweight tools.
“New training samples, human insights, and operation experiences can consistently emerge even after deployment. The ability of updating a model and tracking its changes thus becomes necessary,” says a team from Imperial College London that has developed a library to manage the iterations deep learning developers make across …
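A toy sketch of the idea, assuming nothing about the Imperial College library's actual API: if each committed model version gets a content hash and a free-form note, changes made after deployment remain traceable.

```python
import hashlib
import json
import time

# Hypothetical minimal model-version tracker (illustrative only):
# commit() records a checksum of the serialized weights plus a note,
# so a training run's history stays auditable after deployment.
class ModelLog:
    def __init__(self):
        self.versions = []

    def commit(self, weights: bytes, note: str) -> str:
        digest = hashlib.sha256(weights).hexdigest()[:12]
        self.versions.append({
            "id": digest,
            "note": note,
            "time": time.time(),
        })
        return digest

log = ModelLog()
v1 = log.commit(b"initial-weights", "trained on base dataset")
v2 = log.commit(b"finetuned-weights", "added new samples after deployment")
print(json.dumps([v["note"] for v in log.versions]))
```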
Managing Deep Learning Development Complexity was written by Nicole Hemsoth at The Next Platform.
The difficult part about storage these days is far less about capability than about adapting to change. Accordingly, the concept of programmable storage is getting more traction.
With such an approach, the internal services and abstractions of the storage stack can be treated as building blocks for higher-level services. While this may not be simple to implement, it can eliminate the duplication of complex, unreliable software that is commonly used as a workaround for storage system deficiencies.
A team from the University of California Santa Cruz has developed a programmable storage platform to counter these issues called …
Fresh Thinking on Programmable Storage was written by Nicole Hemsoth at The Next Platform.
It looks like the push to true cloud computing that many of us have been projecting for such a long time is actually coming to pass, despite the misgivings many of us have expressed about giving up control of our own datacenters and the applications that run there.
That chip giant Intel is making money as it continues its 14 nanometer manufacturing process ramp is not really a surprise. During the second quarter of this year, rival AMD had not yet gotten its “Naples” Epyc X86 server processors into the field, and IBM has pushed …
The Skylake Calm Before The Compute Storm was written by Timothy Prickett Morgan at The Next Platform.
Supercomputing, by definition, is an esoteric, exotic, and relatively small slice of the overall IT landscape, but it is, also by definition, a vital driver of innovation within IT and in all of the segments of the market where simulation, modeling, and now machine learning are used to provide goods and services.
As we have pointed out many times, however, the supercomputing business is not one in which it is easy to participate and generate a regular stream of revenues and predictable profits, and it is most certainly one where the vendors and their customers have to, by necessity, take the …
The Supercomputing Slump Hits HPC was written by Timothy Prickett Morgan at The Next Platform.
Building on the successes of the Stampede1 supercomputer, the Texas Advanced Computing Center (TACC) has rolled out its next-generation HPC system, Stampede2. Over the course of 2017, Stampede2 will undergo further optimization phases with the support of a $30 million grant from the National Science Foundation (NSF). With the latest “Knights Landing” Xeon Phi and “Skylake” Xeon processors, and enhanced networking provided by the Omni-Path architecture, the new flagship system is expected to deliver approximately 18 petaflops, nearly doubling Stampede1’s performance.
Stampede2 continues Stampede1’s mission: enabling thousands of scientists and researchers across the United States to deliver breakthrough scientific discoveries in science, engineering, artificial …
Texas Advanced Supercomputing Center Taps Latest HPC Tech was written by Nicole Hemsoth at The Next Platform.
The oil and gas industry has been on the cutting edge of many waves of computing over the several decades that supercomputers have been used to model oil reservoirs, both in planning the development of an oil field and in quantifying the stored reserves of a field, and therefore the future possible revenue stream of the company.
Oil companies can’t see through the earth’s crust to the domes where oil has been trapped, and it is the job of reservoir engineers to eliminate as much risk as possible from the field so the oil company can be prosperous …
Oil And Gas Upstart Has No Reserves About GPUs was written by Timothy Prickett Morgan at The Next Platform.
Teams at Saudi Aramco using the Shaheen II supercomputer at King Abdullah University of Science and Technology (KAUST) have managed to scale ANSYS Fluent across 200,000 cores, marking top-end scaling for the commercial engineering code.
The news last year of a code scalability effort that topped out at 36,000 cores on the Blue Waters machine at the National Center for Supercomputing Applications (NCSA) was impressive. That was big news for ANSYS and NCSA, but also a major milestone for Cray. Just as Blue Waters is a Cray system, albeit one at the outer reaches of its lifespan (it was installed …
Engineering Code Scales Across 200,000 Cores on Cray Super was written by Nicole Hemsoth at The Next Platform.
IBM is a bit of an enigma these days. It has the art – some would say black magic – of financial engineering down pat, and its system engineering is still quite good. Big Blue talks about all of the right things for modern computing platforms, although it speaks a slightly different dialect because the company still thinks that it is the one setting the pace, and therefore coining the terms, rather than chasing markets that others are blazing. And it just can’t seem to grow revenues, even after tens of billions of dollars in acquisitions and internal investments over …
The Trials And Tribulations Of IBM Systems was written by Timothy Prickett Morgan at The Next Platform.
When all of your business is driven by end users coming to use your applications over the Internet, the network is arguably the most critical part of the infrastructure. That is why search engine and ad serving giant Google, which has expanded out to media serving, hosted enterprise applications, and cloud computing, has put a tremendous amount of investment into creating its own network stack.
But running a fast, efficient, hyperscale network for internal datacenters is not sufficient for a good user experience, and that is why Google has created a software defined networking stack to do routing over the …
How Google Wants To Rewire The Internet was written by Timothy Prickett Morgan at The Next Platform.
While it might not be an exciting problem at the front and center of AI conversations, efficient hyperparameter tuning for neural network training is a tough issue. There are some options that aim to automate this process, but for most users this is a cumbersome area—and one that can lead to bad performance when not done properly.
The problem with coming up with automatic tools for tuning is that many machine learning workloads are dependent on the dataset and the conditions of the problem being solved. For instance, some users might accept less accuracy in exchange for a speedup or efficiency …
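As a baseline for what such automatic tools do, here is a hedged sketch of random search over a hypothetical loss surface; the `objective` function stands in for a real training-and-validation run, and in practice the accuracy-versus-speed tradeoff mentioned above would be folded into it.

```python
import random

# Random search: one of the simplest automatic tuning strategies.
def objective(lr, batch_size):
    # Hypothetical loss surface with an optimum near lr=0.01, batch=64;
    # a real run would train a model and return validation loss.
    return (lr - 0.01) ** 2 * 1e4 + (batch_size - 64) ** 2 * 1e-3

random.seed(42)
best = None
for _ in range(100):
    lr = 10 ** random.uniform(-4, -1)        # sample lr on a log scale
    batch_size = random.choice([16, 32, 64, 128, 256])
    loss = objective(lr, batch_size)
    if best is None or loss < best[0]:
        best = (loss, lr, batch_size)

print("best loss %.4f at lr=%.4f, batch=%d" % best)
```

Sampling the learning rate on a log scale matters: sensible values span orders of magnitude, so a uniform draw would waste most trials at the large end of the range.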
The Golden Grail: Automatic Distributed Hyperparameter Tuning was written by Nicole Hemsoth at The Next Platform.
No matter what, system architects are always going to have to contend with one – and possibly more – bottlenecks when they design the machines that store and crunch the data that makes the world go around. These days, there is plenty of compute at their disposal, a reasonable amount of main memory to hang off of it, and both Ethernet and InfiniBand are on the cusp of 200 Gb/sec of performance and not too far away from 400 Gb/sec and even higher bandwidths.
Now, it looks like the peripheral bus based on the PCI-Express protocol is becoming the bottleneck, …
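The arithmetic behind that claim is worth making concrete: a single 200 Gb/sec network port wants more bandwidth than an entire PCI-Express 3.0 x16 slot can supply.

```python
# Back-of-the-envelope link bandwidth comparison. PCIe 3.0 runs at
# 8 GT/s per lane with 128b/130b encoding, so a x16 slot tops out
# near 15.75 GB/s in each direction.
pcie3_x16 = 8e9 * (128 / 130) * 16 / 8 / 1e9   # GB/s
# A 200 Gb/sec network port needs 25 GB/s just to stay fed.
nic_200g = 200 / 8                              # GB/s

print("PCIe 3.0 x16: %.2f GB/s" % pcie3_x16)
print("200G NIC:     %.2f GB/s" % nic_200g)
```

By this math a single next-generation network adapter oversubscribes the fastest slot in the server, which is exactly the bottleneck shift the headline points to.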
The System Bottleneck Shifts To PCI-Express was written by Timothy Prickett Morgan at The Next Platform.
Having been at the forefront of machine learning since the 1980s, when I was a staff scientist in the Theoretical Division at Los Alamos performing basic research on machine learning (and later applying it in many areas, including co-founding a machine learning based drug discovery company), I was lucky enough to participate in the creation of the field, and subsequently to observe first-hand the process by which machine learning grew to become a ‘bandwagon’ that eventually imploded due to misconceptions about the technology and what it could accomplish.
Fueled by across-the-board technology advances including algorithmic developments, machine learning has again become a …
Technology Requirements for Deep and Machine Learning was written by Nicole Hemsoth at The Next Platform.