OpenACC’s global appeal can be seen in the February 2017 OpenACC mini-hackathon and GPU conference at KAUST (King Abdullah University of Science & Technology) in Saudi Arabia. OpenACC was created so programmers can insert pragmas to provide information to the compiler about parallelization opportunities and data movement operations to and from accelerators. Programmers use pragmas to work in concert with the compiler to create, tune and optimize parallel codes to achieve high performance.
Demand was so high to attend this mini-hackathon that the organizers had to scramble to find space for ten teams, even though the hackathon was originally …
KAUST Hackathon Shows OpenACC Global Appeal was written by Nicole Hemsoth at The Next Platform.
The 13th Five Year Plan and other programs to bolster cloud adoption among Chinese businesses like the Internet Plus effort have lit a fire under China’s tech and industrial sectors to modernize IT operations.
However, the growth of China’s cloud and overall enterprise IT market is far slower than in other nations. While there is a robust hardware business in the country, the traditional view of enterprise-class software is still sinking in, leaving a gap between hardware and software spending. Further, the areas that truly drive tech spending, including CPUs and enterprise software and services, are the key areas …
Roadblocks, Fast Lanes for China’s Enterprise IT Spending was written by Nicole Hemsoth at The Next Platform.
The future of Moore’s Law has become a topic of hot debate in recent years, as the challenge of continually shrinking transistors and other components has grown.
Intel, AMD, IBM, and others continue to drive the development of smaller electronic components as a way of ensuring advancements in compute performance while driving down the cost of that compute. Processors from Intel and others are now moving from 14 nanometer processes down to 10 nanometers, with plans to continue on to 7 nanometers and smaller.
For more than a decade, Intel had relied on a tick-tock manufacturing schedule to keep up with …
Memory And Logic In A Post Moore’s Law World was written by Jeffrey Burt at The Next Platform.
Ethernet switching has its own kinds of Moore’s Law barriers. The transition from 10 Gb/sec to 100 Gb/sec devices over the past decade has been anything but smooth, and a lot of compromises had to be made to even get to the interim – and unusual – 40 Gb/sec stepping stone towards the 100 Gb/sec devices that are ramping today in the datacenter.
While 10 Gb/sec Ethernet switching is fine for a certain class of enterprise applications that are not bandwidth hungry, for the hyperscalers and cloud builders, 100 Gb/sec is nowhere near enough bandwidth, and 200 Gb/sec, which is …
Upstart Switch Chip Maker Tears Up The Ethernet Roadmap was written by Timothy Prickett Morgan at The Next Platform.
Continued exponential growth of digital data of images, videos, and speech from sources such as social media and the internet-of-things is driving the need for analytics to make that data understandable and actionable.
Data analytics often rely on machine learning (ML) algorithms. Among ML algorithms, deep convolutional neural networks (DNNs) offer state-of-the-art accuracies for important image classification tasks and are becoming widely adopted.
At the recent International Symposium on Field Programmable Gate Arrays (ISFPGA), Dr. Eriko Nurvitadhi of the Intel Accelerator Architecture Lab (AAL) presented research asking “Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?” Their research …
Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Learning? was written by Nicole Hemsoth at The Next Platform.
ARM has rearchitected its multi-core chips so they can better compete in a world where computing needs are becoming more specialized.
The new DynamIQ architecture will provide flexible compute, with up to eight different cores in a single cluster on a system on a chip. Each core can run at a different clock speed so a company making an ARM SoC can tailor the silicon to handle multiple workloads at varying power efficiencies. The DynamIQ architecture also adds faster access to accelerators for artificial intelligence or networking jobs, and a resiliency that allows it to be used in robotics, autonomous …
New ARM Architecture Offers A DynamIQ Response To Compute was written by Timothy Prickett Morgan at The Next Platform.
In the datacenter, flash memory first took off as a caching layer, often hanging off the PCI-Express bus, sitting between processors, with their cache memories and main memory, and the ridiculously slow disk drives. It wasn’t until the price of flash came way down and the capacities of flash cards and drives went up that companies could think about going completely to flash for some, much less all, of their workloads.
So it will be with Intel’s Optane 3D XPoint non-volatile memory, which Intel is starting to roll out in its initial datacenter-class SSDs and will eventually deliver …
Like Flash, 3D XPoint Enters The Datacenter As Cache was written by Timothy Prickett Morgan at The Next Platform.
After years of planning and delays after a massive architectural change, the Blue Waters supercomputer at the National Center for Supercomputing Applications at the University of Illinois finally went into production in 2013, giving scientists, engineers and researchers across the country a powerful tool to run and solve the most complex and challenging applications in a broad range of scientific areas, from astrophysics and neuroscience to biophysics and molecular research.
Users of the petascale system have been able to simulate the evolution of space, determine the chemical structure of diseases, model weather, and trace how virus infections propagate via air …
Keeping The Blue Waters Supercomputer Busy For Three Years was written by Jeffrey Burt at The Next Platform.
Despite the fact that Google has developed its own custom machine learning chips, the company is well-known as a user of GPUs internally, particularly for its deep learning efforts, in addition to offering GPUs in its cloud.
At last year’s Nvidia GPU Technology Conference, Jeff Dean, Senior Google Fellow, offered a vivid description of how the search giant has deployed GPUs for a large number of workloads, many centered around speech recognition and language-oriented research projects as well as various computer vision efforts. What was clear from Dean’s talk—and from watching other deep learning shops with large GPU cluster …
Google Team Refines GPU Powered Neural Machine Translation was written by Nicole Hemsoth at The Next Platform.
Solving problems by the addition of abstractions is a tried and true approach in technology. The management of high-performance computing workflows is no exception.
The Pegasus workflow engine and HTCondor’s DAGman are used to manage workflow dependencies. GridWay and DRIVE route jobs to different resources based on suitability or available capacity. Both of these approaches are important, but they share a key potential drawback: jobs are still treated as distinct units of computation to be scheduled individually by the scheduler.
As we have written previously, the aims of HPC resource administrators and HPC resource users are sometimes at odds. …
Increasing HPC Utilization with Meta-Queues was written by Nicole Hemsoth at The Next Platform.
The hyperscalers of the world are increasingly dependent on machine learning algorithms for providing a significant part of the user experience and operations of their massive applications, so it is not much of a surprise that they are also pushing the envelope on machine learning frameworks and systems that are used to deploy those frameworks. Facebook and Microsoft were showing off their latest hybrid CPU-GPU designs at the Open Compute Summit, and they provide some insight into how to best leverage Nvidia’s latest “Pascal” Tesla accelerators.
Not coincidentally, the specialized systems that have been created for supporting machine learning workloads …
Open Hardware Pushes GPU Computing Envelope was written by Timothy Prickett Morgan at The Next Platform.
Parallel computing has become a bedrock in the HPC field, where applications are becoming increasingly complex and such compute-intensive technologies as data analytics, deep learning and artificial intelligence (AI) are rapidly emerging. Nvidia and AMD have driven the adoption of GPU accelerators in supercomputers and other high-end systems; Intel is addressing the space with its many-core Xeon Phi processors and coprocessors; and, as we’ve talked about at The Next Platform, other acceleration technologies like field-programmable gate arrays (FPGAs) are pushing their way into the picture. Parallel computing is a booming field.
However, the future was not always so assured. …
Chinese Researchers One Step Closer to Parallel Turing Machine was written by Nicole Hemsoth at The Next Platform.
The “serverless” trend has become the new hot topic in cloud computing. Instead of running Infrastructure-as-a-Service (IaaS) instances to provide a service, individual functions are executed on demand.
This has been a boon to the web development world, as it allows the creation of UI-driven workloads without the administrative overhead of provisioning, configuring, monitoring, and maintaining servers. Of course, the industry has not yet reached the point where computation can be done in thin air, so there are still servers involved somewhere. The point is that the customer is not concerned with mundane tasks such as operating system patching and …
Serving Up Serverless Science was written by Nicole Hemsoth at The Next Platform.
If Xzibit worked in the HPC field, he might be heard to say “I heard you like computers, so we modeled a computer with your computer so you can simulate your simulations.”
But simulating the performance of HPC applications is more than just recursion for comedic effect; it provides a key mechanism for studying and predicting application behavior under different scenarios. While actually running the code on the system will yield a measure of the wallclock time, it does little to explain what factors impacted that wallclock time. And of course it requires the system …
Peering Through Opaque HPC Benchmarks was written by Nicole Hemsoth at The Next Platform.
With over 700 employees, Cineca is Italy’s largest and most advanced high performance computing (HPC) center, channeling its systems expertise to benefit organizations across the nation. A privately held, non-profit consortium, Cineca comprises six Italian research institutions, 70 Italian universities, and the Italian Ministry of Education.
The team at Cineca dedicates itself to tackling the greatest computational challenges faced by public and private companies and research institutions. With so many organizations depending on Italy’s HPC centers, Cineca relies on Intel® technologies to reliably and efficiently further the country’s innovations in scientific computing, web and networking-based services, big data …
Cineca’s HPC Systems Tackle Italy’s Biggest Computing Challenges was written by Timothy Prickett Morgan at The Next Platform.
The Smith-Waterman algorithm has become a linchpin in the rapidly expanding world of bioinformatics, the go-to computational model for DNA sequencing and local sequence alignments. With the growth in recent years in genome research, there has been a sharp increase in the amount of data around genes and proteins that needs to be collected and analyzed, and the 36-year-old Smith-Waterman algorithm is a primary way of sequencing the data.
The key to the algorithm is that rather than examining an entire DNA or protein sequence, Smith-Waterman uses a technique called dynamic programming in which the algorithm looks at segments of …
Tuning Up Knights Landing For Gene Sequencing was written by Jeffrey Burt at The Next Platform.
The HPC community is trying to solve the critical compute challenges of next generation high performance computing and ARM considers itself well-positioned to act as a catalyst in this regard. Applications like machine learning and scientific computing are driving demands for orders of magnitude improvements in capacity, capability and efficiency to achieve exascale computing for next generation deployments.
ARM has been taking a co-design approach with the ecosystem from silicon to system design to application development to provide innovative solutions that address this challenge. The recent Allinea acquisition is one example of ARM’s commitment to HPC, but ARM has worked …
ARM Antes Up For An HPC Software Stack was written by Timothy Prickett Morgan at The Next Platform.
Nvidia has staked its growth in the datacenter on machine learning. Over the past few years, the company has rolled out GPU features aimed at neural networks and related processing, notably in the “Pascal” generation, which includes capabilities explicitly designed for the space, such as 16-bit half-precision math.
The company is preparing its upcoming “Volta” GPU architecture, which promises to offer significant gains in capabilities. More details on the Volta chip are expected at Nvidia’s annual conference in May. CEO Jen-Hsun Huang late last year spoke to The Next Platform about what he called the upcoming “hyper-Moore’s Law” …
3D Stacking Could Boost GPU Machine Learning was written by Jeffrey Burt at The Next Platform.
Having a proliferation of server makes and models over a span of years in the datacenter is not a huge deal for most enterprises. They cope with the diversity because they support a diversity of applications and can more or less keep things isolated; moreover, IT may be integral to their product or service, but it is usually not the actual product or service that they sell.
Not so with hyperscalers and cloud builders. For them, the IT is the product, and keeping things as monolithic and consistent as possible lowers the cost of goods purchased through higher volumes and …
A Peek Inside Facebook’s Server Fleet Upgrade was written by Timothy Prickett Morgan at The Next Platform.
It is a good time to be the maker of a machine that excels in large-scale optimization problems for cybersecurity and defense. And it is even better to be the only maker of such a machine at a time when the need for a post-Moore’s Law system is in high demand.
We have already described the U.S. Department of Energy’s drive to place a novel architecture at the heart of one of the future exascale supercomputers, and we have also explored the range of options that might fall under that novel processing umbrella. From neuromorphic chips, deep learning PIM-based architectures, …
Strong FBI Ties for Next Generation Quantum Computer was written by Nicole Hemsoth at The Next Platform.