Category Archives for "The Next Platform"

Facebook Pushes The Search Envelope With GPUs

An increasing amount of the world’s data is encapsulated in images and video, and by its very nature this data is difficult and extremely compute intensive to index and search, compared to the relative ease with which we can do so with the textual information that has heretofore dominated both our corporate and consumer lives.

Initially, we had to index images by hand, and it is with these datasets that the hyperscalers pushed the envelope with their image recognition algorithms, evolving neural network software on CPUs and radically improving it with a jump to

Facebook Pushes The Search Envelope With GPUs was written by Timothy Prickett Morgan at The Next Platform.

China Making Swift, Competitive Quantum Computing Gains

Chinese officials have made no secret out of their desire to become the world’s dominant player in the technology industry. As we’ve written about before at The Next Platform, China has accelerated its investments in IT R&D over the past several years, spending tens of billions of dollars to rapidly expand the capabilities of its own technology companies to better compete with their American counterparts, while at the same time forcing U.S. tech vendors to clear various hurdles in their efforts to access the fast-growing China market.

This is being driven by a combination of China’s desire to increase

China Making Swift, Competitive Quantum Computing Gains was written by Nicole Hemsoth at The Next Platform.

Rapid GPU Evolution at Chinese Web Giant Tencent

Like other major hyperscale web companies, China’s Tencent, which operates a massive network of ad, social, business, and media platforms, is increasingly reliant on two trends to keep pace.

The first is not surprising—efficient, scalable cloud computing to serve internal and user demand. The second is more recent and includes a wide breadth of deep learning applications, including the company’s own internally developed Mariana platform, which powers many user-facing services.

When the company introduced its deep learning platform back in 2014 (at a time when companies like Baidu, Google, and others were expanding their GPU counts for speech and

Rapid GPU Evolution at Chinese Web Giant Tencent was written by Nicole Hemsoth at The Next Platform.

Squeezing The Joules Out Of DRAM, Possibly Without Stacking

Increasing parallelism is the only way to get more work out of a system. Architecting for that parallelism requires a lot of rethinking of each and every component in a system to make everything hum along as efficiently as possible.

There are lots of ways to skin the parallelism cat and squeeze more performance out of a system while consuming less energy, and for DRAM memory, just stacking things up helps. But according to some research done at Stanford University, the University of Texas, and GPU maker Nvidia, there is another way to boost performance and lower energy consumption. The

Squeezing The Joules Out Of DRAM, Possibly Without Stacking was written by Timothy Prickett Morgan at The Next Platform.

Fujitsu Looks to 3D ICs, Silicon Photonics to Drive Future Systems

The rise of public and private clouds, the growth of the Internet of Things, the proliferation of mobile devices, and the massive amounts of data generated by such fast-growing trends, which need to be collected, stored, moved, and analyzed, promise to drive significant changes in both software and hardware development in the coming years.

Depending on who you’re talking to, there could be anywhere from 10 billion to 25 billion connected devices worldwide; self-driving cars are expected to grow rapidly in use over the next decade; and corporate data is no longer housed primarily in stationary

Fujitsu Looks to 3D ICs, Silicon Photonics to Drive Future Systems was written by Nicole Hemsoth at The Next Platform.

KAUST Hackathon Shows OpenACC Global Appeal

OpenACC’s global attraction can be seen in the recent February 2017 OpenACC mini-hackathon and GPU conference at KAUST (King Abdullah University of Science & Technology) in Saudi Arabia. OpenACC was created so programmers can insert pragmas to provide information to the compiler about parallelization opportunities and data movement operations to and from accelerators. Programmers use pragmas to work in concert with the compiler to create, tune and optimize parallel codes to achieve high performance.
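
As a concrete illustration, here is a minimal sketch (ours, not taken from the hackathon material) of what an OpenACC-accelerated vector addition looks like in C: the pragma asserts that the loop is safe to parallelize, and the copyin and copyout clauses describe the data movement to and from the accelerator.

    #include <stdio.h>

    #define N (1 << 20)

    /* Vector addition with OpenACC. The "parallel loop" pragma tells the
       compiler this loop is safe to parallelize on an accelerator, and the
       copyin/copyout clauses describe data movement to and from device
       memory. Build with an OpenACC compiler, e.g. "nvc -acc vadd.c". */
    int main(void) {
        static float a[N], b[N], c[N];

        for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        #pragma acc parallel loop copyin(a[0:N], b[0:N]) copyout(c[0:N])
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[0] = %f\n", c[0]);  /* expect 3.000000 */
        return 0;
    }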

Demand was so high to attend this mini-hackathon that the organizers had to scramble to find space for ten teams, even though the hackathon was originally

KAUST Hackathon Shows OpenACC Global Appeal was written by Nicole Hemsoth at The Next Platform.

Roadblocks, Fast Lanes for China’s Enterprise IT Spending

The 13th Five Year Plan and other programs to bolster cloud adoption among Chinese businesses, such as the Internet Plus effort, have lit a fire under China’s tech and industrial sectors to modernize IT operations.

However, the growth of China’s cloud and overall enterprise IT market is far slower than in other nations. While there is a robust hardware business in the country, the traditional view of enterprise-class software is still sinking in, leaving a gap between hardware and software spending. Further, the areas that truly drive tech spending, including CPUs and enterprise software and services, are the key areas

Roadblocks, Fast Lanes for China’s Enterprise IT Spending was written by Nicole Hemsoth at The Next Platform.

Memory And Logic In A Post Moore’s Law World

The future of Moore’s Law has become a topic of hot debate in recent years, as the challenge of continually shrinking transistors and other components has grown.

Intel, AMD, IBM, and others continue to drive the development of smaller electronic components as a way of ensuring advancements in compute performance while driving down the cost of that compute. Processors from Intel and others are moving now from 14 nanometer processes down to 10 nanometers, with plans to continue on to 7 nanometers and smaller.

For more than a decade, Intel had relied on a tick-tock manufacturing schedule to keep up with

Memory And Logic In A Post Moore’s Law World was written by Jeffrey Burt at The Next Platform.

Upstart Switch Chip Maker Tears Up The Ethernet Roadmap

Ethernet switching has its own kinds of Moore’s Law barriers. The transition from 10 Gb/sec to 100 Gb/sec devices over the past decade has been anything but smooth, and a lot of compromises had to be made to even get to the interim – and unusual – 40 Gb/sec stepping stone towards the 100 Gb/sec devices that are ramping today in the datacenter.

While 10 Gb/sec Ethernet switching is fine for a certain class of enterprise applications that are not bandwidth hungry, for the hyperscalers and cloud builders, 100 Gb/sec is nowhere near enough bandwidth, and 200 Gb/sec, which is

Upstart Switch Chip Maker Tears Up The Ethernet Roadmap was written by Timothy Prickett Morgan at The Next Platform.

Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Learning?

Continued exponential growth of digital image, video, and speech data from sources such as social media and the Internet of Things is driving the need for analytics to make that data understandable and actionable.

Data analytics often rely on machine learning (ML) algorithms. Among ML algorithms, deep convolutional neural networks (DNNs) offer state-of-the-art accuracies for important image classification tasks and are becoming widely adopted.
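
To ground the terminology, the computational heart of a convolutional layer is a small loop nest like the illustrative C sketch below, reduced here to a single channel with a hypothetical box filter; production networks batch this arithmetic across many channels and images, and that is the workload GPUs and FPGAs compete to accelerate.

    #include <stdio.h>

    #define H 5  /* input height */
    #define W 5  /* input width  */
    #define K 3  /* filter size  */

    /* The core operation of a convolutional layer, stripped to one channel:
       slide a KxK filter over an HxW input and accumulate dot products. */
    int main(void) {
        float in[H][W], filt[K][K], out[H - K + 1][W - K + 1];

        for (int i = 0; i < H; i++)
            for (int j = 0; j < W; j++)
                in[i][j] = (float)(i + j);          /* toy input */
        for (int i = 0; i < K; i++)
            for (int j = 0; j < K; j++)
                filt[i][j] = 1.0f / (K * K);        /* box filter */

        for (int i = 0; i <= H - K; i++)
            for (int j = 0; j <= W - K; j++) {
                float acc = 0.0f;
                for (int ki = 0; ki < K; ki++)
                    for (int kj = 0; kj < K; kj++)
                        acc += in[i + ki][j + kj] * filt[ki][kj];
                out[i][j] = acc;
            }

        printf("out[0][0] = %f\n", out[0][0]);      /* expect 2.000000 */
        return 0;
    }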

At the recent International Symposium on Field Programmable Gate Arrays (ISFPGA), Dr. Eriko Nurvitadhi from the Intel Accelerator Architecture Lab (AAL) presented research titled “Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?” Their research

Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Learning? was written by Nicole Hemsoth at The Next Platform.

New ARM Architecture Offers A DynamIQ Response To Compute

ARM has rearchitected its multi-core chips so they can better compete in a world where computing needs are becoming more specialized.

The new DynamIQ architecture will provide flexible compute, with up to eight different cores in a single cluster on a system on a chip. Each core can run at a different clock speed so a company making an ARM SoC can tailor the silicon to handle multiple workloads at varying power efficiencies. The DynamIQ architecture also adds faster access to accelerators for artificial intelligence or networking jobs, and a resiliency that allows it to be used in robotics, autonomous

New ARM Architecture Offers A DynamIQ Response To Compute was written by Timothy Prickett Morgan at The Next Platform.

Like Flash, 3D XPoint Enters The Datacenter As Cache

In the datacenter, flash memory took off first as a caching layer between the processors, with their cache memories and main memory, and the ridiculously slow disk drives that hang off the PCI-Express bus on the systems. It wasn’t until the price of flash came way down and the capacities of flash cards and drives went up that companies could think about going completely to flash for some, much less all, of their workloads.

So it will be with Intel’s Optane 3D XPoint non-volatile memory, which Intel is starting to roll out in its initial datacenter-class SSDs and will eventually deliver

Like Flash, 3D XPoint Enters The Datacenter As Cache was written by Timothy Prickett Morgan at The Next Platform.

Keeping The Blue Waters Supercomputer Busy For Three Years

After years of planning, and delays following a massive architectural change, the Blue Waters supercomputer at the National Center for Supercomputing Applications at the University of Illinois finally went into production in 2013, giving scientists, engineers, and researchers across the country a powerful tool to run and solve the most complex and challenging applications in a broad range of scientific areas, from astrophysics and neuroscience to biophysics and molecular research.

Users of the petascale system have been able to simulate the evolution of space, determine the chemical structure of diseases, model weather, and trace how virus infections propagate via air

Keeping The Blue Waters Supercomputer Busy For Three Years was written by Jeffrey Burt at The Next Platform.

Google Team Refines GPU Powered Neural Machine Translation

Despite the fact that Google has developed its own custom machine learning chips, the company is well-known as a user of GPUs internally, particularly for its deep learning efforts, in addition to offering GPUs in its cloud.

At last year’s Nvidia GPU Technology Conference, Jeff Dean, Senior Google Fellow, offered a vivid description of how the search giant has deployed GPUs for a large number of workloads, many centered around speech recognition and language-oriented research projects as well as various computer vision efforts. What was clear from Dean’s talk, and from watching other deep learning shops with large GPU cluster

Google Team Refines GPU Powered Neural Machine Translation was written by Nicole Hemsoth at The Next Platform.

Increasing HPC Utilization with Meta-Queues

Solving problems by the addition of abstractions is a tried and true approach in technology. The management of high-performance computing workflows is no exception.

The Pegasus workflow engine and HTCondor’s DAGman are used to manage workflow dependencies. GridWay and DRIVE route jobs to different resources based on suitability or available capacity. Both of these approaches are important, but they share a key potential drawback: jobs are still treated as distinct units of computation to be scheduled individually by the scheduler.
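
As an entirely hypothetical illustration of the meta-queue idea (not drawn from Pegasus, DAGman, GridWay, or DRIVE), the C sketch below shows a single “pilot” job that, once scheduled, drains an internal list of tasks itself, so the batch scheduler sees one job instead of many.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical meta-queue pilot: the batch scheduler allocates
       resources to this one job, and the per-task scheduling overhead
       is paid only once for the whole list. */
    int main(void) {
        const char *tasks[] = {            /* stand-ins for real commands */
            "./step1 input.dat",
            "./step2 intermediate.dat",
            "./step3 intermediate.dat",
        };
        int ntasks = sizeof(tasks) / sizeof(tasks[0]);

        for (int i = 0; i < ntasks; i++) {
            printf("meta-queue: running task %d: %s\n", i, tasks[i]);
            /* system(tasks[i]);  would launch the real task here */
        }
        return 0;
    }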

As we have written previously, the aims of HPC resource administrators and HPC resource users are sometimes at odds.

Increasing HPC Utilization with Meta-Queues was written by Nicole Hemsoth at The Next Platform.

Open Hardware Pushes GPU Computing Envelope

The hyperscalers of the world are increasingly dependent on machine learning algorithms for providing a significant part of the user experience and operations of their massive applications, so it is not much of a surprise that they are also pushing the envelope on machine learning frameworks and systems that are used to deploy those frameworks. Facebook and Microsoft were showing off their latest hybrid CPU-GPU designs at the Open Compute Summit, and they provide some insight into how to best leverage Nvidia’s latest “Pascal” Tesla accelerators.

Not coincidentally, the specialized systems that have been created for supporting machine learning workloads

Open Hardware Pushes GPU Computing Envelope was written by Timothy Prickett Morgan at The Next Platform.

Chinese Researchers One Step Closer to Parallel Turing Machine

Parallel computing has become a bedrock in the HPC field, where applications are becoming increasingly complex and such compute-intensive technologies as data analytics, deep learning, and artificial intelligence (AI) are rapidly emerging. Nvidia and AMD have driven the adoption of GPU accelerators in supercomputers and other high-end systems; Intel is addressing the space with its many-core Xeon Phi processors and coprocessors; and, as we’ve talked about at The Next Platform, other acceleration technologies like field-programmable gate arrays (FPGAs) are pushing their way into the picture. Parallel computing is a booming field.

However, the future was not always so assured.

Chinese Researchers One Step Closer to Parallel Turing Machine was written by Nicole Hemsoth at The Next Platform.

Serving Up Serverless Science

The “serverless” trend has become the new hot topic in cloud computing. Instead of running Infrastructure-as-a-Service (IaaS) instances to provide a service, individual functions are executed on demand.
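
In code, the model looks roughly like the hypothetical C sketch below; the types and names are illustrative rather than any provider’s actual API, but they show the shape of the contract: the platform owns the invocation loop and calls a user-supplied handler on demand.

    #include <stdio.h>

    /* Hypothetical shape of a "serverless" function: the user writes only
       the handler; the platform decides when and where it runs. */
    typedef struct { const char *body; } event_t;
    typedef struct { int status; const char *body; } response_t;

    response_t handler(event_t ev) {
        printf("handling event: %s\n", ev.body);
        response_t r = { 200, "ok" };
        return r;
    }

    /* A stand-in for the platform's own invocation loop. */
    int main(void) {
        event_t ev = { "{\"job\": \"analyze-dataset\"}" };
        response_t r = handler(ev);
        printf("status=%d body=%s\n", r.status, r.body);
        return 0;
    }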

This has been a boon to the web development world, as it allows the creation of UI-driven workloads without the administrative overhead of provisioning, configuring, monitoring, and maintaining servers. Of course, the industry has not yet reached the point where computation can be done in thin air, so there are still servers involved somewhere. The point is that the customer is not concerned with mundane tasks such as operating system patching and

Serving Up Serverless Science was written by Nicole Hemsoth at The Next Platform.

Peering Through Opaque HPC Benchmarks

If Xzibit worked in the HPC field, he might be heard to say “I heard you like computers, so we modeled a computer with your computer so you can simulate your simulations.”

But simulating the performance of HPC applications is more than just recursion for comedic effect; it provides a key mechanism for studying and predicting application behavior under different scenarios. While actually running the code on the system will yield a measure of the wallclock time, it does little to explain what factors impacted that wallclock time. And of course it requires the system

Peering Through Opaque HPC Benchmarks was written by Nicole Hemsoth at The Next Platform.

Cineca’s HPC Systems Tackle Italy’s Biggest Computing Challenges

With over 700 employees, Cineca is Italy’s largest and most advanced high performance computing (HPC) center, channeling its systems expertise to benefit organizations across the nation. Composed of six Italian research institutions, 70 Italian universities, and the Italian Ministry of Education, Cineca is a privately held, non-profit consortium.

The team at Cineca dedicates itself to tackling the greatest computational challenges faced by public and private companies and research institutions. With so many organizations depending on Italy’s HPC centers, Cineca relies on Intel® technologies to reliably and efficiently further the country’s innovations in scientific computing, web and networking-based services, big data

Cineca’s HPC Systems Tackle Italy’s Biggest Computing Challenges was written by Timothy Prickett Morgan at The Next Platform.