The architectural implications of autonomous driving: constraints and acceleration Lin et al., ASPLOS’18
Today’s paper is another example of complementing CPUs with GPUs, FPGAs, and ASICs in order to build a system with the desired performance. In this instance, the challenge is to build an autonomous self-driving car!
Architecting autonomous driving systems is particularly challenging for a number of reasons…
There are several defined levels of automation, with level 2 being ‘partial automation’ in which the automated system controls steering and acceleration/deceleration under limited driving conditions. At level 3 the automated system handles all driving tasks under limited conditions (with a human driver taking over outside of that). By level 5 Continue reading
Darwin: a genomics co-processor provides up to 15,000x acceleration on long read assembly Turakhia et al., ASPLOS’18
With the slow demise of Moore’s law, hardware accelerators are needed to meet the rapidly growing computational requirements of X.
For this paper, X = genomics, and genomic data is certainly growing fast: doubling every 7 months and on track to surpass YouTube and Twitter by 2025. Rack-size machines can sequence 50 genomes a day, portable sequencers require several days per genome. Third-generation sequencing technologies are now available which produce much longer reads of contiguous DNA – on the order of tens of kilobases compared to only a few hundred bases with the previous generations of technology.
For personalized medicine, long reads are superior in identifying structural variants i.e. large insertions, deletions and re-arrangements in the genome spanning kilobases or more which are sometimes associated with diseases; for haplotype phasing, to distinguish mutations on maternal vs paternal chromosomes; and for resolving highly repetitive regions in the genome.
The long read technology comes with a drawback though – high error rates in sequencing of between 15%-40%. The errors are corrected using computational methods ‘that can be orders of magnitude slower than Continue reading
Google workloads for consumer devices: mitigating data movement bottlenecks Boroumand et al., ASPLOS’18
What if your mobile device could be twice as fast on common tasks, greatly improving the user experience, while at the same time significantly extending your battery life? This is the feat that the authors of today’s paper pull-off, using a technique known as processing-in-memory (PIM). PIM moves some processing into the memory itself, avoiding the need to transfer data from memory to the CPU for those operations. It turns out that such data movement is a major contributor to the total system energy usage, so eliminating it can lead to big gains.
Our evaluation shows that offloading simple functions from these consumer workloads to PIM logic, consisting of either simple cores or specialized accelerators, reduces system energy consumption by 55.4% and execution time by 54.2%, on average across all of our workloads.
While the performance requirements of consumer devices increase year on year, and devices pack in power-hungry CPUs, GPUs, special-purpose accelerators, sensors and high-resolution screens to keep pace, lithium ion battery capacity has only doubled in the last 20 years. Moreover, the thermal power dissipation in consumer Continue reading