As we previously reported, Google unveiled its second-generation Tensor Processing Unit (TPU2) at Google I/O last week. Google calls this new generation “Google Cloud TPUs”, but provided very little information about the TPU2 chip and the systems that use it beyond a few colorful photos. Pictures do say more than words, so in this article we will dig into the photos and provide our thoughts based on the pictures and on the few bits of detail Google did provide.
To start with, it is unlikely that Google will sell TPU-based chips, boards, or servers – TPU2 …
Under The Hood Of Google’s TPU2 Machine Learning Clusters was written by Timothy Prickett Morgan at The Next Platform.
Around this time last year, we delved into a new FPGA-based architecture from startup DeePhi Tech that targeted efficient, scalable machine learning inference. The company just rounded out its first funding effort, for an undisclosed sum, with major investors including Banyan Capital and, as we learned this week, FPGA maker Xilinx.
As that initial article details, the Stanford and Tsinghua University-fed research focused on network pruning and compression at low precision with a device that could be structured for low latency and custom memory allocations. These efforts were originally built on Xilinx FPGA hardware and given this first round of …
FPGA Startup Gathers Funding Force for Merged Hyperscale Inference was written by Nicole Hemsoth at The Next Platform.
One of the reasons why Nvidia has been able to quadruple revenues for its Tesla accelerators in recent quarters is that it doesn’t just sell raw accelerators and PCI-Express cards, but has become a system vendor in its own right through its DGX-1 server line. The company has also engineered new adapter cards specifically aimed at hyperscalers who want to crank up the performance on their machine learning inference workloads with a cheaper and cooler Volta GPU.
Nvidia does not break out revenues for the DGX-1 line separately from other Tesla and GRID accelerator product sales, but we …
Big Bang For The Buck Jump With Volta DGX-1 was written by Timothy Prickett Morgan at The Next Platform.
For almost a decade now, the cloud has been pitched as a cost-effective way to bring supercomputing out of the queue and into public IaaS or HPC on-demand environments. While there are certainly many use cases to prove that tightly-coupled problems can still work in the cloud despite latency hits (among other issues), application portability is one sticking point.
For instance, let’s say you have developed a financial modeling application on an HPC on demand service to prove that the model works so you can make the case for purchasing a large cluster to run it at scale on-prem. This …
Singularity is the Hinge To Swing HPC Cloud Adoption was written by Nicole Hemsoth at The Next Platform.
It is funny to think that Advanced Micro Devices has been around almost as long as the IBM System/360 mainframe and that it has been around since the United States landed people on the moon. The company has gone through many gut-wrenching transformations, adapting to changing markets. Like IBM and Apple, just to name two, AMD has had its share of disappointments and near-death experiences, but unlike Sun Microsystems, Silicon Graphics, Sequent Computer, Data General, Tandem Computer, and Digital Equipment, it has managed to stay independent and live to fight another day.
AMD wants a second chance in the datacenter, …
AMD Disrupts The Two-Socket Server Status Quo was written by Timothy Prickett Morgan at The Next Platform.
It was only just last month that we spoke with Google distinguished hardware engineer, Norman Jouppi, in depth about the tensor processing unit used internally at the search giant to accelerate deep learning inference, but that device—that first TPU—is already appearing rather out of fashion.
This morning at Google’s I/O event, the company stole Nvidia’s recent Volta GPU thunder by releasing details about its second-generation tensor processing unit (TPU), which will manage both training and inference in a rather staggering 180 teraflops system board, complete with a custom network to lash several together into “TPU pods” that can deliver Top …
First In-Depth Look at Google’s New Second-Generation TPU was written by Nicole Hemsoth at The Next Platform.
We are still chewing through all of the announcements and talk at the GPU Technology Conference that Nvidia hosted in its San Jose stomping grounds last week, and as such we are thinking about the much bigger role that graphics processors are playing in datacenter compute – a realm that has seen five decades of dominance by central processors of one form or another.
That is how CPUs got their name, after all. And perhaps this is a good time to remind everyone that systems used to be a collection of different kinds of compute, and that is why the …
The Embiggening Bite That GPUs Take Out Of Datacenter Compute was written by Timothy Prickett Morgan at The Next Platform.
For a mature company that kickstarted supercomputing as we know it, Cray has done a rather impressive job of reinventing itself over the years.
From its original vector machines, to HPC clusters with proprietary interconnects and custom software stacks, to graph analytics appliances engineered in-house, and now to machine learning, the company tends not to let trends in computing slip by without a new machine.
However, all of this engineering and tuning comes at a cost, something that, arguably, has kept Cray at bay when it comes to reaching the new markets that sprang up in the “big data” days of …
Cray Supercomputing as a Service Becomes a Reality was written by Nicole Hemsoth at The Next Platform.
The science fiction of a generation ago predicted a future in which humans were replaced by the reasoning might of a supercomputer. But in an unexpected twist of events, it appears that it is the supercomputer’s main output, scientific simulations, that could be replaced by an even higher order of intelligence.
While we will always need supercomputing hardware, the vast field of scientific computing, or high performance computing, could also be in the crosshairs for disruptive change, altering the future prospects for scientific code developers, but opening new doors in more energy-efficient, finer-grained scientific discovery. With code that can write itself based …
When Will AI Replace Traditional Supercomputing Simulations? was written by Nicole Hemsoth at The Next Platform.
GPU computing has deep roots in supercomputing, but Nvidia is using that springboard to dive head first into the future of deep learning.
This changes the outward-facing focus of the company’s Tesla business from high-end supers to machine learning systems with the expectation that those two formerly distinct areas will find new ways to merge together given the similarity in machine, scalability, and performance requirements. This is not to say that Nvidia is failing the HPC set, but there is a shift in attention from what GPUs can do for Top 500 class machines to what graphics processors can do …
The Year Ahead for GPU Accelerated Supercomputing was written by Nicole Hemsoth at The Next Platform.
Big data, data science, machine learning, and now deep learning are all the rage and have tons of hype, for better—and in some ways, for worse. Advancements in AI such as language understanding, self-driving cars, automated claims, legal text processing, and even automated medical diagnostics are already here or will be here soon.
In Asia, several countries have made significant advancements and investments into AI, leveraging their historical work in HPC.
China now owns the top three positions in the Top500 with Sunway TaihuLight, Tianhe-2, and Tianhe, and while Tianhe-2 and Tianhe were designed for HPC style workloads, TaihuLight is …
HPC to Deep Learning from an Asian Perspective was written by Nicole Hemsoth at The Next Platform.
Graphics chip maker Nvidia has taken more than a year and carefully and methodically transformed its GPUs into the compute engines for modern HPC, machine learning, and database workloads. To do so has meant staying on the cutting edge of many technologies, and with the much-anticipated but not very long-awaited “Volta” GV100 GPUs, the company is once again skating on the bleeding edge of several different technologies.
This aggressive strategy allows Nvidia to push the performance envelope on GPUs and therefore maintain its lead over CPUs for the parallel workloads it is targeting while at the same time setting up …
Nvidia’s Tesla Volta GPU Is The Beast Of The Datacenter was written by Timothy Prickett Morgan at The Next Platform.
Moving data is the biggest problem in computing, and probably has been since there was data processing if we really want to be honest about it. Because of the cost of bandwidth, latency, energy, and iron to do multiple stages of processing on information in a modern application that might include a database as well as machine learning algorithms against stuff stored there as well as from other sources, you want to try to do all your computation from the memory of one set of devices.
That, in a nutshell, is what the GPU Open Analytics Initiative is laying the …
GOAI: Keeping Databases, Analytics, And Machine Learning All On The GPU was written by Timothy Prickett Morgan at The Next Platform.
When Dell acquired EMC in its massive $60 billion-plus deal last year, it boasted that Dell was inheriting a boatload of new technologies that would help propel forward its capabilities and ambitions with larger enterprises.
That included offerings ranging from VMware’s NSX software-defined networking (SDN) platform to Virtustream and its cloud technologies for running mission critical applications from the likes of Oracle, SAP and Microsoft off-premises. Most notably, Dell was acquiring EMC’s broad and highly popular storage portfolio, in particular the high-end VMAX, XtremIO, and newer ScaleIO lineups as well as its Isilon storage arrays for high performance workloads.
Dell …
Dell EMC Upgrades Flash in High-End Storage While Eyeing NVMe was written by Jeffrey Burt at The Next Platform.
There may be a shortage in the supply of DRAM main memory and NAND flash memory that is having an adverse effect on the server and storage markets, but there is no shortage of vendors who are trying to push the envelope on clustered storage using a mix of these memories and others such as the impending 3D XPoint.
Micron Technology, which makes and sells all three of these types of memories, is so impatient with the rate of technological advancement in clustered flash arrays based on the NVM-Express protocol that it decided to engineer and launch its own product …
Impatient For Fabrics, Micron Forges Its Own NVM-Express Arrays was written by Timothy Prickett Morgan at The Next Platform.
The last two years have delivered a new wave of deep learning architectures designed specifically for tackling both training and inference sides of neural networks. We have covered many of them extensively, but only a few have seen major investment or acquisition—the most notable of which was Nervana Systems over a year ago.
Among the string of neural network chip startups, Graphcore stood out with its manycore approach to handling both training and inference on the same chip. We described the hardware architecture in detail back in March and while its over $30 million in funding from a …
A Dive into Deep Learning Chip Startup Graphcore’s Software Stack was written by Nicole Hemsoth at The Next Platform.
While it is always best to have the right tool for the job, it is better still if a tool can be used by multiple jobs and therefore have its utilization be higher than it might otherwise be. This is one of the reasons why general purpose, X86-based computing took over the datacenter. Economies of scale trumped the efficiency that can come from limited scope or just leaving legacy applications alone in place on alternate platforms.
The idea of offloading computational tasks from CPUs to GPU accelerators took off in academia a little more than a decade ago, and …
Crunching Machine Learning And Databases Together On GPUs was written by Timothy Prickett Morgan at The Next Platform.
Industrial companies have replaced people with machines, systems analysts with simulations, and now the simulations themselves could be outpaced by machine learning—albeit with a human in the loop, at the beginning at least.
The new holy grail of machine learning and deep learning, as with almost any other emerging technology set, is to mask enough of the complexity to make it broadly applicable without losing the performance and other features that can be retained by taking a low-level approach. If this kind of deep generalization can happen, a new mode of considering how data is used in research and enterprise …
Generalizing a Hardware, Software Platform for Industrial AI was written by Nicole Hemsoth at The Next Platform.
Enterprise spending on servers was a bit soft in the first quarter, as evidenced by the financial results posted by Intel and by its sometime rival IBM, and the hyperscale and HPC markets, at least when it comes to networking, were a bit soft as well, according to high-end network chip and equipment maker Mellanox Technologies.
In the first quarter ended March 31, Mellanox had a 4.1 percent revenue decline, to $188.7 million, and because of higher research and development costs, presumably associated with the rollout of 200 Gb/sec Quantum InfiniBand technology (which the company has talked about) and …
HPC System Delays Stall InfiniBand was written by Timothy Prickett Morgan at The Next Platform.
Energy efficiency and operating costs for systems are as important as raw performance in today’s datacenters. Everyone from the largest hyperscalers and high performance computing centers to large enterprises that are sometimes like them is trying to squeeze as much performance as they can from their infrastructure while reining in power consumption and the costs associated with keeping it all from overheating.
Throw in the slowing down of Moore’s Law and new emerging workloads like data analytics and machine learning, and the challenge to these organizations becomes apparent.
In response, organizations on the cutting edge have embraced accelerators like GPUs and …
Rambus, Microsoft Put DRAM Into Deep Freeze To Boost Performance was written by Timothy Prickett Morgan at The Next Platform.