Nicole Hemsoth

Author Archives: Nicole Hemsoth

The (Second) Coming of Composable Systems

The concept of composable or disaggregated infrastructure is nothing new, but with approaching advances in technology on both the software and network sides (photonics in particular) an old idea might be infused with new life.

Several vendors have already taken a disaggregated architecture approach at the storage, network, and system level. Cisco Systems’ now defunct UCS M Series is one example, Hewlett Packard Enterprise’s The Machine and its Project Synergy are two contemporary others, and DriveScale, which we covered back in May, is possibly another. But thus far, none of

The (Second) Coming of Composable Systems was written by Nicole Hemsoth at The Next Platform.

The Inevitability Of Private Public Clouds

Call it a phase that companies will have to go through to get to the promised land of the public cloud. Call it a temporary inevitability, as Microsoft does. Call it stupid if you want to be ungenerous to IT shops used to controlling their own infrastructure. Call it what you will, but it sure does look like all of the big public clouds are going to have to figure out how to offer private cloud versions of their public cloud infrastructure, and Amazon Web Services will be no exception if it hopes to capture dominant market share as it

The Inevitability Of Private Public Clouds was written by Nicole Hemsoth at The Next Platform.

Changing the Exascale Efficiency Narrative at Memory Start Point

With this summer’s announcement of China’s dramatic shattering of top supercomputing performance numbers using ten million relatively simple cores, there is a slight shift in how some are considering the future of the world’s fastest, largest systems.

While one approach, which we will be seeing with the pre-exascale machines at the national labs in the United States, is to build complex systems based on sophisticated cores (with a focus on balance in terms of memory), the Chinese approach with the top Sunway TaihuLight machine, which is based on lighter weight, simple, and cheap components and using those in volume, has

Changing the Exascale Efficiency Narrative at Memory Start Point was written by Nicole Hemsoth at The Next Platform.

MIT Research Pushes Latency Limits with Distributed Flash

We are hitting the limits of what can be crammed into DRAM in a number of application areas. As data volumes continue to mount, this limitation will be more keenly felt.

Accordingly, there has been a great deal of work recently to look to flash to create more efficient and capable systems that can accelerate deeply data-intensive problems, but few things have gotten enough traction to filter their way into big news items. With that said, there are some potential breakthroughs on this front coming out of MIT where some rather impressive performance improvements have been snagged by taking a

MIT Research Pushes Latency Limits with Distributed Flash was written by Nicole Hemsoth at The Next Platform.

Are ARM Virtualization Woes Overstated?

As we have seen with gathering force, ARM is making a solid bid for datacenters of the future. However, a key feature of many server farms that will be looking to exploit the energy efficiency benefits of 64-bit ARM is the ability to maintain performance in a virtualized environment.

Neither X86 nor ARM was built with virtualization in mind, which meant an uphill battle for Intel to build hardware support for hypervisors into its chips. VMware led the charge here beginning in the late 1990s, and over time, Intel made it its business to ensure an ability to support several different

Are ARM Virtualization Woes Overstated? was written by Nicole Hemsoth at The Next Platform.

IBM Quantum Computing Push Gathering Steam

For the first time, access to cutting-edge quantum computing is open and free to the public over the web. On 3 May 2016, IBM launched its IBM Quantum Experience website, where enthusiasts and professionals alike can program a prototype quantum processor chip within a simulation environment. Users, accepted over email by IBM, are given a straightforward ‘composer’ interface, much like a musical note chart, to run a program and test the output. In a little over a month, more than 25,000 users have signed up.

The quantum chip itself combines five superconducting quantum bits (qubits) operating at a cool minus 273.135531 degrees

IBM Quantum Computing Push Gathering Steam was written by Nicole Hemsoth at The Next Platform.

MPI and Scalable Distributed Machine Learning

MPI (Message Passing Interface) is the de facto standard distributed communications framework for scientific and commercial parallel distributed computing. The Intel MPI implementation is a core technology in the Intel Scalable System Framework that provides programmers a “drop-in” MPICH replacement library that can deliver the performance benefits of the Intel Omni-Path Architecture (Intel OPA) communications fabric plus high core count Intel Xeon and Intel Xeon Phi processors.

“Drop-in” literally means that programmers can set an environment variable to dynamically load the highly tuned and optimized Intel MPI library – no recompilation required! Of course, Intel’s MPI library supports other

MPI and Scalable Distributed Machine Learning was written by Nicole Hemsoth at The Next Platform.

Optimization Tests Confirm Knights Landing Performance Projections

Close to a year ago, when more information was becoming available about the Knights Landing processor, Intel released projections for its relative performance against two-socket Haswell machines. As one might imagine, the performance improvements were impressive, but now that there are systems on the ground that can be optimized and benchmarked, we are finally getting a more boots-on-the-ground view into the performance bump.

As it turns out, optimization and benchmarking on the “Cori” supercomputer at NERSC are showing that those figures were right on target. In a conversation with one of the co-authors of a new report highlighting the optimization

Optimization Tests Confirm Knights Landing Performance Projections was written by Nicole Hemsoth at The Next Platform.

Startup Takes a Risk on RISC-V Custom Silicon

As we are carefully watching here, there is a perfect storm brewing in the semiconductor space, both for manufacturers and system designers.

On the one hand, the impending demise of Moore’s Law presents a set of challenges—and opportunities—for emerging chip companies to arise and offer alternatives, often with customization cooked into the business model. And for end users, there is a rising tide of options that might lift a lot of boats if ecosystems are rapidly adopted. This is the case in the ARM space, as we’ve seen clearly this year, as well as for other architectures, including efforts from

Startup Takes a Risk on RISC-V Custom Silicon was written by Nicole Hemsoth at The Next Platform.

Supercomputing’s Scramble to Keep Thinking in Parallel

As supercomputing centers look to future exascale systems, among the other pressing concerns (power consumption in particular) is adopting the right programming approach to scale applications across millions of cores.

And while this might sound like a big enough challenge on its own, it gets more complicated because a new programming model (or system) might not be the scalability and performance answer either. It could just be that tweaking existing tools and methods can move programming from evolution to revolution, provided, of course, that the supercomputing programmer community can agree.

Like all things in

Supercomputing’s Scramble to Keep Thinking in Parallel was written by Nicole Hemsoth at The Next Platform.

OpenPower Developers Primed for Big Wins at IBM Hackathon

IBM has created a virtual hackathon for all you lovely developers to test drive your data-intensive applications on the OpenPOWER server, GPU and accelerator platform. And there’s $27,000 worth of prizes on the table. Want to give it a go? Check out the competition rules and register for the OpenPOWER Developer Challenge.

The closing deadline is September 1 and already 277 individuals have signed up. So don’t dilly dally: tear down those hardware performance barriers and submit your entry. Choose which track is the one for you and connect with the experts ‘round the clock on Slack to get

OpenPower Developers Primed for Big Wins at IBM Hackathon was written by Nicole Hemsoth at The Next Platform.

In Situ Analysis to Push Supercomputing Efficiency

As supercomputers expand in terms of processing, storage, and network capabilities, the size and scope of simulations are also expanding outward. While this is great news for scientific progress, it naturally creates some new bottlenecks, particularly on the analysis and visualization fronts.

Historically, most large-scale simulations would dump time step and other data at defined intervals onto disk for post-processing and visualization, but as the petabyte scale of that process adds more weight, that is becoming less practical. Further, for those who know what they want to find in that data, using an in situ approach to finding the answer

In Situ Analysis to Push Supercomputing Efficiency was written by Nicole Hemsoth at The Next Platform.

Inside Look at Key Applications on China’s New Top Supercomputer

As the world is now aware, China is now home to the world’s most powerful supercomputer, toppling the previous reigning system, Tianhe-2, which is also located in the country.

In the wake of the news, we took an in-depth look at the architecture of the new Sunway TaihuLight machine, which will be useful background as we examine a few of the practical applications that have been ported to and are now running on the 10 million-core, 125 petaflop-capable supercomputer.

The sheer size and scale of the system is what initially grabbed headlines when we broke news about the system last

Inside Look at Key Applications on China’s New Top Supercomputer was written by Nicole Hemsoth at The Next Platform.

System Software, Orchestration Gets an OpenHPC Boost

System software setup and maintenance has become a major efficiency drag on HPC labs and OEMs alike, but community and industry efforts are now underway to reduce the huge amounts of duplicated development, validation and maintenance work across the HPC ecosystem. Disparate efforts and approaches, while necessary on some levels, slow adoption of hardware innovation and progress toward exascale performance. They also complicate adoption of complex workloads like big data and machine learning.

With the creation of the OpenHPC Community, a Linux Foundation collaborative project, the push is on to minimize duplicated efforts in the HPC software stack wherever

System Software, Orchestration Gets an OpenHPC Boost was written by Nicole Hemsoth at The Next Platform.

Emerging “Universal” FPGA, GPU Platform for Deep Learning

In the last couple of years, we have written and heard about the usefulness of GPUs for deep learning training as well as, to a lesser extent, custom ASICs and FPGAs. All of these options have shown performance or efficiency advantages over commodity CPU-only approaches, but programming for all of these is often a challenge.

Programmability hurdles aside, deep learning training on accelerators is standard, but is often limited to a single choice—GPUs or, to a far lesser extent, FPGAs. Now, a research team from the University of California Santa Barbara has proposed a new middleware platform that can combine

Emerging “Universal” FPGA, GPU Platform for Deep Learning was written by Nicole Hemsoth at The Next Platform.

Novel Architectures on the Far Horizon for Weather Prediction

Weather modeling and forecasting centers are among some of the top users of supercomputing systems and are at the top of the list when it comes to areas that could benefit from exascale-class compute power.

However, for modeling centers, even those with the most powerful machines, there is a great deal of leg work on the code front in particular to scale to that potential. Still, many, including most recently the UK Met Office, have planted a stake in the ground for exascale—and they are looking beyond traditional architectures to meet the power and scalability demands they’ll be facing

Novel Architectures on the Far Horizon for Weather Prediction was written by Nicole Hemsoth at The Next Platform.

Mitigating MPI Message Matching Issues

Since the 1990s, MPI (Message Passing Interface) has been the dominant communications protocol for high-performance scientific and commercial distributed computing. MPI was designed in an era when processors with two or four cores were considered high-end parallel devices, and the recent move to processors containing tens to a few hundred cores (as exemplified by the current Intel Xeon and Intel Xeon Phi processor families) has exacerbated scaling issues inside MPI itself. Increased network traffic, amplified by high performance communications fabrics such as InfiniBand and Intel Omni-Path Architecture (Intel OPA), manifests as an MPI performance and scaling issue.

In recognition of their outstanding research and

Mitigating MPI Message Matching Issues was written by Nicole Hemsoth at The Next Platform.

HPC is Great for AI, But What Does Supercomputing Stand to Gain?

As we have written about extensively here at The Next Platform, there is no shortage of use cases in deep learning and machine learning where HPC hardware and software approaches have bled over to power next generation applications in image, speech, video, and other classification and learning tasks.

Since we focus on high performance computing systems here in their many forms, that trend has been exciting to follow, particularly watching GPU computing and matrix math-based workloads find a home outside of the traditional scientific supercomputing center.

This widened attention has been good for HPC as well since it has

HPC is Great for AI, But What Does Supercomputing Stand to Gain? was written by Nicole Hemsoth at The Next Platform.

Measuring Top Supercomputer Performance in the Real World

When we cover the bi-annual listing of the world’s most powerful supercomputers, the metric at the heart of those results is the high performance Linpack benchmark, the gold standard for over two decades. However, many have argued the benchmark is getting long in the tooth with its myopic focus on sheer floating point performance over other important factors that determine a supercomputer’s value for real-world applications.

This shift in value stands to reason, since larger machines mean more data coursing through the system, thus an increased reliance on memory and the I/O subsystem, among other factors. While raw floating

Measuring Top Supercomputer Performance in the Real World was written by Nicole Hemsoth at The Next Platform.

Knights Landing Proves Solid Ground for Intel’s Stake in Deep Learning

Intel has finally opened the first public discussions of its investment in the future of machine learning and deep learning. While some might argue it is a bit late in the game, with its rivals dominating the training market for such workloads, the company had to wait for the official rollout of Knights Landing and extensions to the scalable system framework to make it official, and meaty enough to capture real share from the few players doing deep learning at scale.

Yesterday, we detailed the announcement of the first volume shipments of Knights Landing, which already is finding a home

Knights Landing Proves Solid Ground for Intel’s Stake in Deep Learning was written by Nicole Hemsoth at The Next Platform.