Author Archives: Nicole Hemsoth

Deep Learning Architectures Hinge on Hybrid Memory Cube

We have heard about a great number of new architectures and approaches to scalable, efficient deep learning processing that sit outside the standard CPU, GPU, and FPGA box, and while each is different, many leverage a common element at the all-important memory layer.

The Hybrid Memory Cube (HMC), which we expect to see much more of over the coming year and beyond, is at the heart of several custom architectures aimed at the deep learning market. Nervana Systems, which was recently acquired by Intel (a close partner of HMC maker Micron), Wave Computing, and other research efforts all see a

Deep Learning Architectures Hinge on Hybrid Memory Cube was written by Nicole Hemsoth at The Next Platform.

Hardware Slaves to the Master Algorithm

Over the long course of IT history, the burden has been on the software side to keep pace with rapid hardware advances—to exploit new capabilities and boldly go where no benchmarks have gone before. However, as we swiftly ride into a new age where machine learning and deep learning take the place of more static applications, and where software advances far outpace the chipmakers’ tick-tock cadence, hardware device makers are scrambling.

That problem is profound enough on its own, and it is an entirely different architectural dance than general purpose devices have ever had to step to. Shrinking

Hardware Slaves to the Master Algorithm was written by Nicole Hemsoth at The Next Platform.

The Next Wave of Deep Learning Architectures

Intel has planted some solid stakes in the ground for the future of deep learning over the last month with its acquisition of deep learning chip startup, Nervana Systems, and most recently, mobile and embedded machine learning company, Movidius.

These new pieces will snap into Intel’s still-forming puzzle for capturing the supposed billion-plus dollar market ahead for deep learning, which is complemented by its own Knights Mill effort and software optimization work on machine learning codes and tooling. At the same time, just down the coast, Nvidia is firming up the market for its own GPU training and inference

The Next Wave of Deep Learning Architectures was written by Nicole Hemsoth at The Next Platform.

CPU, GPU Put to Deep Learning Framework Test

In the last couple of years, we have examined how deep learning shops are thinking about hardware. From GPU acceleration, to CPU-only approaches, and of course, FPGAs, custom ASICs, and other devices, there are a range of options—but these are still early days. The algorithmic platforms for deep learning are still evolving and it is incumbent on hardware to keep up. Accordingly, we have been seeing more benchmarking efforts of various approaches from the research community.
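Benchmarking efforts like these generally come down to wall-clock timing of an identical workload across devices and frameworks. A minimal sketch of such a timing harness follows; the workload here is a hypothetical stand-in, not the actual benchmark from the study:

```python
import time

def time_workload(fn, warmup=2, runs=5):
    """Mean wall-clock time of a callable, discarding warmup runs
    (a common practice so caches and JITs settle before measurement)."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# Stand-in workload; a real study would invoke each framework's
# training or inference step here instead.
data = list(range(100_000))
seconds = time_workload(lambda: sum(x * x for x in data))
print(f"mean runtime: {seconds * 1e3:.2f} ms")
```

The warmup-then-average structure is the part that carries over to any framework comparison; only the callable changes.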

This week yielded a new benchmark effort comparing various deep learning frameworks on a short list of CPU and

CPU, GPU Put to Deep Learning Framework Test was written by Nicole Hemsoth at The Next Platform.

KiloCore Pushes On-Chip Scale Limits with Killer Core

We have profiled a number of processor updates and novel architectures this week in the wake of the Hot Chips conference, many of which have focused on clever FPGA implementations, specialized ASICs, or additions to well-known architectures, including Power and ARM.

Among the presentations that provided yet another way to loop around the Moore’s Law wall is a 1,000-core processor, “KiloCore,” from UC Davis researchers, which they noted during Hot Chips (and the press repeated) was the first to wrap 1,000 processors on a single die. Actually, Japanese startup Exascaler, Inc. beat them to this with the PEZY-SC

KiloCore Pushes On-Chip Scale Limits with Killer Core was written by Nicole Hemsoth at The Next Platform.

Inside the Manycore Research Chip That Could Power Future Clouds

For those interested in novel architectures for large-scale datacenters and complex computing domains, this year has offered plenty of fodder for exploration.

From a rise in custom ASICs to power next-generation deep learning, to variations on FPGAs, DSPs, and ARM processor cores, and advancements in low-power processors for webscale datacenters, it is clear that the Moore’s Law death knell is clanging loud enough to spur faster, more voluminous action.

At the Hot Chips conference this week, we analyzed the rollout of a number of new architectures (more on the way as the week unfolds), but one that definitely grabbed

Inside the Manycore Research Chip That Could Power Future Clouds was written by Nicole Hemsoth at The Next Platform.

Baidu Takes FPGA Approach to Accelerating SQL at Scale

While much of our Baidu coverage this year has centered on the Chinese search giant’s deep learning initiatives, many other critical, albeit less bleeding-edge, applications present true big data challenges.

As Baidu’s Jian Ouyang detailed this week at the Hot Chips conference, Baidu sits on over an exabyte of data, processes around 100 petabytes per day, updates 10 billion webpages daily, and handles over a petabyte of log updates every 24 hours. These numbers are on par with Google’s, and as one might imagine, it takes a Google-like approach to problem solving at
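For a sense of scale, the daily figures above translate into striking sustained rates. This is our back-of-envelope arithmetic, not numbers from the talk:

```python
PB = 10**15                 # decimal petabyte, in bytes
SECONDS_PER_DAY = 24 * 60 * 60

# Average sustained rate implied by "around 100 petabytes per day".
rate_tb_per_s = 100 * PB / SECONDS_PER_DAY / 10**12
print(f"~{rate_tb_per_s:.2f} TB/s sustained processing")

# "10 billion webpages daily" expressed as a per-second update rate.
pages_per_s = 10_000_000_000 / SECONDS_PER_DAY
print(f"~{pages_per_s:,.0f} webpage updates per second")
```

Averaged over a day, the processing figure alone works out to more than a terabyte per second.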

Baidu Takes FPGA Approach to Accelerating SQL at Scale was written by Nicole Hemsoth at The Next Platform.

FPGA Based Deep Learning Accelerators Take on ASICs

Over the last couple of years, the idea has taken hold that the most efficient, highest-performance way to accelerate deep learning training and inference is a custom ASIC—something designed to fit the specific needs of modern frameworks.

While this idea has racked up major mileage, especially recently with the acquisition of Nervana Systems by Intel (and competitive efforts from Wave Computing and a handful of other deep learning chip startups), yet another startup is challenging the idea that a custom ASIC is the smart, cost-effective path.

The argument is a simple one: deep learning frameworks are not unified; they are

FPGA Based Deep Learning Accelerators Take on ASICs was written by Nicole Hemsoth at The Next Platform.

Specialized Supercomputing Cloud Turns Eye to Machine Learning

Back in 2010, when the term “cloud computing” was still laden with peril and mystery for many users in enterprise and high performance computing, HPC cloud startup, Nimbix, stepped out to tackle that perceived risk for some of the most challenging, latency-sensitive applications.

At the time, there were only a handful of small companies catering to the needs of high performance computing applications and those that existed were developing clever middleware to hook into AWS infrastructure. There were a few companies offering true “HPC as a service” (distinct datacenters designed to fit such workloads that could be accessed via a

Specialized Supercomputing Cloud Turns Eye to Machine Learning was written by Nicole Hemsoth at The Next Platform.

Seven Years Later, SGI Finds a New Ending

Seven years ago, it was the end for SGI. The legendary company had gone bankrupt, its remains were up for liquidation, and its relatively few remaining loyal customers were left in limbo.

This week, SGI reached a new ending, significantly different from its last one, as HPE announced an intended deal to purchase the company for approximately $275 million.

SGI was reincarnated in 2009 when Rackable bought its assets, including its brand, off the scrap heap for only $42.5 million (originally reported as $25 million at the time, but later updated). Rackable—that is to say, the new SGI—protected employees, key

Seven Years Later, SGI Finds a New Ending was written by Nicole Hemsoth at The Next Platform.

The Cloud Startup that Just Keeps Kicking

Many startups have come and gone since the early days of cloud, but among those that started small and grew organically with the expansion of use cases, Cycle Computing still stands tall.

Tall being relative, of course. As with that initial slew of cloud startups, a lot of investment money has sloshed around as well. As Cycle Computing CEO, Jason Stowe, reminds The Next Platform, the small team started with an $8,000 credit card bill, set its sights on the burgeoning needs of scientific computing users hungry for spare compute capacity, and didn’t take funding until

The Cloud Startup that Just Keeps Kicking was written by Nicole Hemsoth at The Next Platform.

Intel SSF Optimizations Boost Machine Learning

Data scientists and deep and machine learning researchers rely on frameworks and libraries such as Torch, Caffe, TensorFlow, and Theano. Studies by Colfax Research and Kyoto University have found that existing open source packages such as Torch and Theano deliver significantly faster performance through the use of Intel Scalable System Framework (Intel SSF) technologies: the Intel compilers and performance libraries such as the Intel Math Kernel Library (Intel MKL), Intel MPI (Message Passing Interface), and Intel Threading Building Blocks (Intel TBB), plus the Intel Distribution for Python (Intel Python).

Andrey Vladimirov (Head of HPC Research, Colfax Research) noted

Intel SSF Optimizations Boost Machine Learning was written by Nicole Hemsoth at The Next Platform.

Intel’s VP of Datacenter Group on “AI—and More—on IA”

Rajeeb Hazra, VP of Intel’s Datacenter Group, is a car buff. Why is that important to HPC? Because autonomous cars are the future, and it will take a phenomenal amount of compute to support them.

Hazra recently shared estimates that accurately supporting 20,000 autonomous cars would require an exaflop of sustained compute. This level of supercomputing is needed, considering the network of millions of sensors inside and outside the cars and the interpretation of their output, plus the deep learning needed to constantly stay aware of the world around them and the drivers inside them, and repeatedly pass new models to
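That estimate implies a striking per-vehicle budget. A quick back-of-envelope check (our arithmetic, not Hazra’s):

```python
EXAFLOP = 10**18            # floating point operations per second
cars = 20_000

# Implied sustained compute budget per vehicle.
flops_per_car = EXAFLOP / cars
print(f"{flops_per_car / 10**12:.0f} TFLOPS of sustained compute per car")
```

Fifty teraflops of sustained compute per car is well beyond any single embedded device of the era, which is why the estimate points at datacenter-scale infrastructure rather than in-vehicle silicon alone.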

Intel’s VP of Datacenter Group on “AI—and More—on IA” was written by Nicole Hemsoth at The Next Platform.

AWS CTO on How Startups Define Large-Scale Competitiveness

Not long ago, we took a look back at the last decade of Amazon Web Services and its growth, particularly in terms of its reach into high performance computing and large-scale enterprise workloads. While the startup story is the easier one to tell for AWS, given the capex/opex advantage it hands small companies competing with far larger ones, the enterprise use case growth of AWS is still a stunning story over time.

This morning during his AWS Summit New York keynote, AWS Chief Technology Officer, Werner Vogels, shared growth highlights of the company over the last ten years, noting that the message is

AWS CTO on How Startups Define Large-Scale Competitiveness was written by Nicole Hemsoth at The Next Platform.

Nervana CEO on Intel Acquisition, Future Technology Outlook

Following yesterday’s acquisition of deep learning chip startup Nervana Systems by Intel, we talked with the company’s CEO, Naveen Rao, about the plans for both the forthcoming hardware and the internally developed Neon software stack now that the technology is under a much broader umbrella.

Media outlets yesterday reported the acquisition at $350 million, but Rao tells The Next Platform that figure is not correct; while he was not allowed to state the actual amount, he said it was quite a bit higher.

Nervana had been seeking a way to

Nervana CEO on Intel Acquisition, Future Technology Outlook was written by Nicole Hemsoth at The Next Platform.

Delta Datacenter Crash: Do the Math on Disaster Recovery ROI

How on earth could a company the size and scope of Delta—a company whose very business relies on its ability to process, store, and manage fast-changing data—fall prey to a systems-wide outage that brought its business to a grinding halt?

We can look to the official answer, which boils down to a cascading power outage and its far-reaching impacts. But the point here is not about this particular outage; it’s not about Delta either since other major airlines have suffered equally horrendous interruptions to their operations. The real question here is how companies whose mission-critical data can be frozen following

Delta Datacenter Crash: Do the Math on Disaster Recovery ROI was written by Nicole Hemsoth at The Next Platform.

A Corner to Landing Leap: Xeon Phi Generations Put to Test

A first wave of benchmarks and real-world application runs on Intel’s Knights Landing has hit the shores, and while not all the codes will be familiar or widely used, the takeaway is clear: significant performance gains over the previous Xeon Phi generation.

As we described earlier this summer, the performance projections Intel released about Knights Landing were spot on and now that researchers are getting devices in their hands, these results will be put to further test. And as detailed previously, there are stark differences between Knights Corner and its bigger, badder successor, Knights Landing.

Take

A Corner to Landing Leap: Xeon Phi Generations Put to Test was written by Nicole Hemsoth at The Next Platform.

Deep Learning Chip Upstart Takes GPUs to Task

Bringing a new chip to market is no simple or cheap task, but as a new wave of specialized processors for targeted workloads brings fresh startup tales to bear, we are reminded again how risky such a business can be.

Of course, with high risk comes potential for great reward; that is, if a company is producing a chip that far outpaces general purpose processors for workloads numerous enough to justify the cost of design and production. The stand-by figure there is usually stated at around $50 million, but that is assuming a chip requires validation,

Deep Learning Chip Upstart Takes GPUs to Task was written by Nicole Hemsoth at The Next Platform.

The Middle Ground for the Nvidia Tesla K80 GPU

Although the launch of Pascal stole headlines this year on the GPU computing front, Nvidia’s Tesla K80 GPU, introduced at the end of 2014, has been finding a home across a broader base of applications and forthcoming systems.

A quick look across the supercomputers on the Top 500 list shows that most sites are still using the Tesla K40 accelerator (launched in 2013) in their systems, with several still on the K20 (launched in 2012). The Comet supercomputer at the San Diego Supercomputer Center (which sports two K80s in each of 36 of its 1,944 nodes), an unnamed energy
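Reading the Comet parenthetical as two K80 boards in each of 36 nodes, and recalling that each K80 board carries two GK210 GPU dies, the numbers work out as follows. This is our arithmetic on that assumed reading, not figures from the article:

```python
k80_nodes, total_nodes = 36, 1944
boards_per_node = 2   # assumed reading: two K80 boards per GPU node
dies_per_board = 2    # each Tesla K80 pairs two GK210 GPUs on one board

fraction = k80_nodes / total_nodes
total_boards = k80_nodes * boards_per_node
print(f"{fraction:.1%} of Comet's nodes carry GPUs")
print(f"{total_boards} K80 boards, {total_boards * dies_per_board} GPU dies total")
```

In other words, under two percent of the machine’s nodes are GPU-equipped, which illustrates how the K80 was often deployed as a partition within a larger CPU system rather than wall to wall.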

The Middle Ground for the Nvidia Tesla K80 GPU was written by Nicole Hemsoth at The Next Platform.

A Fresh Look at Gaming Devices for Supercomputing Applications

Over the years there have been numerous efforts to use unconventional, low-power, graphics-heavy processors for traditional supercomputing applications—with varying degrees of success. While this takes some extra footwork on the code side and delivers less performance overall than standard servers, the power draw is far lower and the cost isn’t even in the same ballpark.
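Performance per watt, the metric at the heart of such studies, is simply sustained throughput divided by power draw. A sketch of the comparison, using illustrative placeholder figures rather than measurements from the UMass Dartmouth work:

```python
def gflops_per_watt(sustained_gflops: float, board_watts: float) -> float:
    """Figure of merit used to compare accelerators on energy efficiency."""
    return sustained_gflops / board_watts

# Hypothetical illustrative figures, NOT measurements from the study.
cards = {
    "desktop gaming card": (3500.0, 150.0),  # (sustained GFLOPS, board watts)
    "server-class GPU":    (4300.0, 300.0),
}
for name, (gflops, watts) in cards.items():
    print(f"{name}: {gflops_per_watt(gflops, watts):.1f} GFLOPS/W")
```

The point of the metric is that a card with lower absolute performance can still win once power is in the denominator, which is exactly the case the gaming-card studies set out to test.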

Glenn Volkema and his colleagues at the University of Massachusetts Dartmouth are among some of the most recent researchers putting modern gaming graphics cards to the performance per watt and application benchmark test. In looking at various desktop gaming cards (Nvidia GeForce, AMD Fury X, among

A Fresh Look at Gaming Devices for Supercomputing Applications was written by Nicole Hemsoth at The Next Platform.