Paragon: Collaborative Speculative Loop Execution on GPU and CPU

Mehrzad Samadi, Amir Hormati, Janghaeng Lee, Scott Mahlke

Advanced Computer Architecture Laboratory, University of Michigan – Ann Arbor, MI

Fifth Workshop on General-Purpose Computation on Graphics Processing Units (GPGPU), 2012

The rise of graphics engines as one of the main parallel platforms for general-purpose computing has ignited a wide search for better programming support for GPUs. Because GPUs have a non-traditional execution model, developing applications for them is usually very challenging, and as a result these devices are left under-utilized in many commodity systems. Several languages, such as CUDA, have emerged to address this challenge, but past research has shown that developing applications in these languages is a daunting task, either because of the tedious performance optimization cycle or because of inherent algorithmic characteristics of an application that can make it unsuitable for GPUs. In addition, previous approaches to automatically generating optimized parallel CUDA code using complex compilation techniques have failed to utilize the GPUs present in everyday computing devices such as laptops and mobile systems. In this work, we take a different approach. Although it is hard to generate optimized code for GPUs, their raw performance is high compared to CPUs, so it is beneficial to use them speculatively rather than leave them idle. To achieve this goal, we propose Paragon: a collaborative static/dynamic compiler platform that speculatively runs possibly-data-parallel pieces of sequential applications on GPUs. Paragon uses the GPU opportunistically for loops that its loop classification phase categorizes as possibly-data-parallel. While running a loop speculatively, Paragon monitors dependencies using a lightweight kernel management unit and transfers execution to the CPU if a conflict is detected. Paragon resumes execution on the GPU after the dependency has been executed sequentially on the CPU. Our experiments show that Paragon achieves up to 12x speedup compared to unsafe CPU execution with 4 threads.
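The speculate, detect, fall back, resume cycle described in the abstract can be illustrated with a small CPU-only simulation. This is a sketch under stated assumptions, not Paragon's implementation: the loop shape `a[writes[i]] = f(a[reads[i]])`, the function names, the `chunk` parameter (standing in for one GPU kernel launch), and the conflict check (a scan of the chunk's read and write indices) are all hypothetical choices made for illustration. No GPU is involved; "parallel" execution is imitated by letting every iteration in a chunk read from the same snapshot.

```python
def sequential(a, reads, writes, f):
    """Reference result: run the loop in program order."""
    a = list(a)
    for r, w in zip(reads, writes):
        a[w] = f(a[r])
    return a


def first_conflict(reads, writes, start, stop):
    """Return the first iteration in [start, stop) that reads or writes a
    location written by an earlier iteration of the same chunk, else None."""
    written = set()
    for i in range(start, stop):
        if reads[i] in written or writes[i] in written:
            return i
        written.add(writes[i])
    return None


def speculative(a, reads, writes, f, chunk=4):
    """Simulated speculative execution (hypothetical sketch).

    Returns the final array and the list of iterations that had to be
    executed sequentially after a conflict was detected.
    """
    a = list(a)
    n = len(reads)
    fallbacks = []
    i = 0
    while i < n:
        stop = min(i + chunk, n)
        # Simulated GPU phase: all iterations of the chunk read one snapshot,
        # as if they ran at the same time, and buffer their results.
        snapshot = list(a)
        results = [f(snapshot[reads[k]]) for k in range(i, stop)]
        # Monitoring: look for a cross-iteration dependency in the chunk.
        conflict = first_conflict(reads, writes, i, stop)
        commit_end = stop if conflict is None else conflict
        # Commit only the iterations that ran before the conflict.
        for k in range(i, commit_end):
            a[writes[k]] = results[k - i]
        i = commit_end
        if conflict is not None:
            # Simulated CPU phase: execute the dependent iteration in order,
            # then resume speculation with the next iteration.
            a[writes[conflict]] = f(a[reads[conflict]])
            fallbacks.append(conflict)
            i = conflict + 1
    return a, fallbacks


if __name__ == "__main__":
    data = [10, 20, 30, 40, 50, 60, 70, 80]
    reads = [0, 1, 2, 3, 4, 5]
    writes = [1, 2, 6, 7, 5, 0]
    result, fallbacks = speculative(data, reads, writes, lambda x: x + 1)
    print(result, fallbacks)
```

In this toy loop, iteration 1 reads the location that iteration 0 writes, so the first chunk commits only iteration 0, runs iteration 1 sequentially, and then speculates again from iteration 2. The more iterations that turn out to be independent, the larger the share of work that stays in the speculative phase.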

Tags: Code generation, Compilers, Computer science, CUDA, nVidia, nVidia GeForce GTX 560, Optimization, Programming techniques

March 8, 2012 by hgpu
