Dynamic Orchestration of Massively Data Parallel Execution

Mehrzad Samadi
University of Michigan
University of Michigan, 2014


   title={Dynamic Orchestration of Massively Data Parallel Execution},

   author={Samadi, Mehrzad},


   school={University of Michigan}


Download Download (PDF)   View View   Source Source   



Graphics processing units (GPUs) are specialized hardware accelerators capable of rendering graphics much faster than conventional general-purpose processors. They are widely used in personal computers, tablets, mobile phones, and game consoles. Modern GPUs are not only efficient at manipulating computer graphics, but also are more effective than CPUs for algorithms where processing of large data blocks can be done in parallel. This is mainly due to their highly parallel architecture. While GPUs provide low-cost and efficient platforms for accelerating massively parallel applications, tedious performance tuning is required to maximize application execution efficiency. Achieving high performance requires the programmers to manually manage the amount of on-chip memory used per thread, the total number of threads per multiprocessor, the pattern of off-chip memory accesses, etc. In addition to a complex programming model, there is a lack of performance portability across various systems with different runtime properties. Programmers usually make assumptions about runtime properties when they write code and optimize that code based on those assumptions. However, if any of these properties changes during execution, the optimized code performs poorly. To alleviate these limitations, several implementations of the application are needed to maximize performance for different runtime properties. However, it is not practical for the programmer to write several different versions of the same code which are optimized for each individual runtime condition. In this thesis, we propose a static and dynamic compiler framework to take the burden of fine tuning different implementations of the same code off the programmer. This framework enables the programmer to write the program once and allow a static compiler to generate different versions of a data parallel application with several tuning parameters. The runtime system selects the best version and fine tunes its parameters based on runtime properties such as device configuration, input size, dependency, and data values.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: