Archive

Category Archives for "The Next Platform"

Capital One Machine Learning Lead on Lessons at Scale

Machine learning has moved from prototype to production across a wide range of business units at financial services giant Capital One due in part to a centralized approach to evaluating and rolling out new projects.

This is no easy task given the scale and scope of the enterprise, but according to Zachary Hanif, director of Capital One’s machine learning “center for excellence,” the trick is to define use cases early that touch as broad a base within the larger organization as possible and build outwards. This is encapsulated in the philosophy Hanif spearheads: locating machine learning talent in

Capital One Machine Learning Lead on Lessons at Scale was written by Nicole Hemsoth at The Next Platform.

Mounting Complexity Pushes New GPU Profiling Tools

The more things change, the more they remain the same — as do the two most critical issues for successful software execution. First, you remove the bugs, then you profile. And while debugging and profiling are not new, they are needed now more than ever, albeit in a modernized form.

The first performance analysis tools appeared on IBM platforms in the early 1970s. These performance profiles were based on timer interrupts that recorded “status words” at predetermined intervals in an attempt to detect “hot spots” inside running code.

Profiling is even more critical today,

Mounting Complexity Pushes New GPU Profiling Tools was written by James Cuff at The Next Platform.

In Modern Datacenters, The Latency Tail Wags The Network Dog

The expression, the tail wags the dog, is used when a seemingly unimportant factor or infrequent event actually dominates the situation. It turns out that in modern datacenters, this is precisely the case – with relatively rare events determining overall performance.

As the world continues to undergo a digital transformation, one of the most pressing challenges faced by cloud and web service providers is building hyperscale datacenters to handle the growing pace of interactive and real-time requests, generated by the enormous growth of users and mobile apps. With the increasing scale and demand for services, IT organizations have turned

In Modern Datacenters, The Latency Tail Wags The Network Dog was written by Timothy Prickett Morgan at The Next Platform.

The Future of Programming GPU Supercomputers

There are few people as visible in high performance computing programming circles as Michael Wolfe, and fewer still with his level of experience. With 20 years working on PGI compilers and another 20 years before that working on languages and HPC compilers in industry, when he talks about the past, present and future of programming supercomputers, it is worthwhile to listen.

In his early days at PGI (formerly known as The Portland Group) Wolfe focused on building out the company’s suite of Fortran, C, and C++ compilers for HPC, a role that changed after Nvidia Tesla GPUs came onto the scene and

The Future of Programming GPU Supercomputers was written by Nicole Hemsoth at The Next Platform.

Google And Its Hyperscale Peers Add Power To The Server Fleet

Six years ago, when Google decided to get involved with the OpenPower consortium being put together by IBM as its third attempt to bolster the use of Power processors in the datacenter, the online services giant had three applications that had over 1 billion users: Gmail, YouTube, and the eponymous search engine that has become the verb for search.

Now, after years of working with Rackspace Hosting on a Power9 server design, Google is putting systems based on IBM’s Power9 processor into production, and not just because it wants pricing leverage with Intel and other chip suppliers. Google now has

Google And Its Hyperscale Peers Add Power To The Server Fleet was written by Timothy Prickett Morgan at The Next Platform.

OpenPower At The Inflection Point

When IBM launched the OpenPower initiative publicly five years ago, to many it seemed like a classic case of too little, too late. But hope springs eternal, particularly with a datacenter sector that is eagerly and actively seeking an alternative to the Xeon processor to curtail the hegemony that Intel has in the glass house.

Perhaps the third time will be the charm. Back in 1991, Apple, IBM, and Motorola teamed up to create the AIM Alliance, which sought to create a single unified computing architecture that was suitable for embedded and desktop applications, replacing the Motorola 68000 processors

OpenPower At The Inflection Point was written by Timothy Prickett Morgan at The Next Platform.

There is No Such Thing as Easy AI — But We’re Getting Closer

The dark and mysterious art of artificial intelligence and machine learning is neither straightforward nor easy. AI systems have been termed “black boxes” for this reason for decades now. We desperately continue to present ever larger, more unwieldy datasets to increasingly sophisticated “mystery algorithms” in our attempts to rapidly infer and garner new knowledge.

How can we try to make all of this just a little easier?

Hyperscalers with multi-million dollar analytics teams have access to vast, effectively unlimited compute and storage of all shapes and sizes. Huge teams of analysts, systems managers, resilience and reliability experts are standing up

There is No Such Thing as Easy AI — But We’re Getting Closer was written by James Cuff at The Next Platform.

Turbulence – And Opportunity – Ahead In The Oracle Sparc Base

You can’t swing a good-sized cat without hitting an enterprise running Oracle software in some shape or form. If it’s not Oracle’s ubiquitous database, then it’s one of its middleware platforms or its enterprise applications in the Fusion suite or its predecessors in the Oracle, Siebel, PeopleSoft, and JD Edwards suites.

Currently Oracle boasts 430,000 customers running its software – that’s quite an installed base. And it’s all teed up to become quite a battleground. Why?

Six months or so ago, news broke that Oracle was laying off a large number of hardware folks. Something like 2,500 Sparc and Solaris

Turbulence – And Opportunity – Ahead In The Oracle Sparc Base was written by Timothy Prickett Morgan at The Next Platform.

HPE Aims Apollo at Enterprise AI

Tech vendors continue to push artificial intelligence (AI) and its various components – including deep learning and machine learning – into the enterprise. The technologies are being rapidly adopted by hyperscalers and in the HPC space, and enterprises stand to reap significant benefits by also embracing them.

As we’ve noted many times here at The Next Platform, at the most basic level, machine learning and deep learning can enable enterprises to quickly sort through and analyze the massive amounts of data that they’re collecting to find patterns that can lead to better

HPE Aims Apollo at Enterprise AI was written by Jeffrey Burt at The Next Platform.

What’s Ahead for Supercomputing’s Balanced Benchmark

We all know about the Top 500 supercomputing benchmark, which measures raw floating point performance. But over the past several years there has been talk that this no longer represents real-world application performance.

This has opened the door for a new benchmark to come to the fore, in this case the High Performance Conjugate Gradients benchmark, or HPCG.

Here to talk about this on today’s episode of “The Interview” with The Next Platform is one of the creators of HPCG, Dr. Michael Heroux of Sandia National Laboratories. Interestingly, Heroux co-developed HPCG with one of the founders of the Top

What’s Ahead for Supercomputing’s Balanced Benchmark was written by Nicole Hemsoth at The Next Platform.

Dell EMC Puts Open Networking on the Edge

Computing resources – including storage and networking – are continuing their march toward the network edge, drawn like a magnet to the rapidly proliferating connected devices in the world and the huge amounts of data that they’re generating that need to be collected, processed and analyzed.

As we’ve talked about here at The Next Platform over the past few months, the distributed nature of computing, fueled by such drivers as the cloud, the Internet of Things (IoT) and greater mobility, and the demand for capabilities like artificial intelligence (AI), machine learning and analytics to manage the data call for moving

Dell EMC Puts Open Networking on the Edge was written by Jeffrey Burt at The Next Platform.

A Reference Architecture for NVMe over Fabrics

Cavium has raised its profile over the past several years as one of the pioneers in developing Arm-based systems-on-a-chip (SoCs) for servers, rolling out multiple generations of its ThunderX chips in hopes of helping Arm’s low-power architecture make gains in a datacenter environment that for years has been dominated by Intel and its x86-based Xeons.

However, like similar chip makers, Cavium didn’t start with the Arm server chips, but instead built to that point atop a broad array of products for other areas of the datacenter, including adapters, controllers, switches and MIPS-based processors for networking and storage devices.

A Reference Architecture for NVMe over Fabrics was written by Jeffrey Burt at The Next Platform.

Singularity Containers for HPC & Deep Learning

Containerization, the concept of isolating application processes while sharing the same operating system (OS) kernel, has been around since the beginning of this century. It dates back as far as FreeBSD Jails. Jails heavily leveraged the chroot environment but expanded its capabilities to include a virtualized path to other system attributes such as storage, interconnects and users. Solaris Zones and AIX Workload Partitions also fall into a similar category.

Since then, the advent of and advances in technologies such as cgroups, systemd and user namespaces have greatly improved the security and isolation of containers when compared to their

Singularity Containers for HPC & Deep Learning was written by Nicole Hemsoth at The Next Platform.

Argonne Hints at Future Architecture of Aurora Exascale System

There are two supercomputers named “Aurora” that are affiliated with Argonne National Laboratory – the one that was supposed to be built this year and the one that for a short time last year was known as “A21,” that will be built in 2021, and that will be the first exascale system built in the United States.

Details have just emerged on the second, and now only important, Aurora system, thanks to Argonne opening up proposals for the early science program that lets researchers put code on the supercomputer for three months before it starts its production work. The proposal

Argonne Hints at Future Architecture of Aurora Exascale System was written by Timothy Prickett Morgan at The Next Platform.

FPGA Maker Xilinx Says the Future of Computing is ACAP

The field programmable gate array space is heating up with new use cases driven by emerging network, IoT, and application acceleration trends. Keeping ahead of the curve means expanding on devices that have quite steady improvement cycles, which means the few companies at the top need to get creative to stay competitive.

Xilinx and Altera – which was bought by Intel in 2015 for $16.7 billion – have been the top vendors of FPGAs, which can be programmed and reprogrammed, enabling organizations to adapt the processors to the varying workloads running on the systems. The high price

FPGA Maker Xilinx Says the Future of Computing is ACAP was written by Jeffrey Burt at The Next Platform.

How Spectre And Meltdown Mitigation Hits Xeon Performance

It has been more than two months since Google revealed its research on the Spectre and Meltdown speculative execution security vulnerabilities in modern processors, and caused the whole IT industry to slam on the brakes and brace for the impact. The initial microbenchmark results on the mitigations for these security holes, put out by Red Hat, showed the impact could be quite dramatic. But according to recent tests done by Intel, the impact is not as bad as one might think in many cases. In other cases, the impact is quite severe.

The Next Platform has gotten its hands on

How Spectre And Meltdown Mitigation Hits Xeon Performance was written by Timothy Prickett Morgan at The Next Platform.

IBM Unwinds Tangled Data for Enterprise AI

These days, organizations are creating and storing massive amounts of data, and in theory this data can be used to drive business decisions through application development, particularly with new techniques such as machine learning. Data is arguably the most important asset, and it is also probably the most difficult thing to manage. Well, excepting people.

Data is a tangled mess. It can be structured or unstructured, and it is increasingly scattered in different locations – in on-premises infrastructure, in a public cloud, on a mobile device. It is a challenge to move, thanks to the costs in everything from bandwidth to

IBM Unwinds Tangled Data for Enterprise AI was written by Jeffrey Burt at The Next Platform.

Getting AI Leverage With GPU-Optimized Systems

The artificial intelligence revolution is quickly changing every industry, and modern data centers must be equipped to capitalize on these extraordinary new capabilities. Hewlett Packard Enterprise (HPE) and Nvidia are partnering to bring best-of-breed AI solutions to every customer, offering AI-integrated systems, services, and support capabilities to help all organizations seamlessly optimize their AI foundation, deliver differentiated outcomes, and gain competitive advantage.

High performance computing has become key to solving many of the world’s grand challenges in the realms of science, industry, and engineering. However, traditional CPUs are increasingly failing to deliver the performance gains they used to, and the

Getting AI Leverage With GPU-Optimized Systems was written by Timothy Prickett Morgan at The Next Platform.

Practical Computational Balance: Contending with Unplanned Data

In part one of our series on reaching computational balance, we described how computational complexity is increasing logarithmically. Unfortunately, data and storage follow an identical trend.

The challenge of balancing compute and data at scale remains constant. Because providers and consumers don’t have access to “the crystal ball of demand prediction”, the appropriate computational response to vast, unpredictable amounts of highly variable complex data becomes unintentionally unplanned.

We must address computational balance in a world barraged by vast and unplanned data.

Before starting any discussion of data balance, it is important to first remind ourselves of scale. Small

Practical Computational Balance: Contending with Unplanned Data was written by James Cuff at The Next Platform.

Using Python to Snake Closer to Simplified Deep Learning

On today’s episode of “The Interview” with The Next Platform, we discuss the role of higher level interfaces to common machine learning and deep learning frameworks, including Caffe.

Despite the existence of multiple deep learning frameworks, there is a lack of comprehensible and easy-to-use high-level tools for the design, training, and testing of deep neural networks (DNNs), according to this episode’s guest, Soren Klemm, one of the creators of the Python-based Barista, an open-source graphical high-level interface for the Caffe framework.

While Caffe is one of the most popular frameworks for training DNNs, editing prototxt files in

Using Python to Snake Closer to Simplified Deep Learning was written by Nicole Hemsoth at The Next Platform.