Architecture-Aware Mapping and Optimization on Heterogeneous Computing Systems


Mayank Daga

Virginia Polytechnic Institute, 2011

@phdthesis{daga2011architecture,
  title={Architecture-Aware Mapping and Optimization on Heterogeneous Computing Systems},
  author={Daga, M.},
  year={2011},
  school={Virginia Polytechnic Institute and State University}
}


The emergence of scientific applications embedded with multiple modes of parallelism has made heterogeneous computing systems indispensable in high-performance computing. The popularity of such systems is evident from the fact that three of the five fastest supercomputers in the world employ heterogeneous computing, i.e., they use dissimilar computational units. A closer look at the performance of these supercomputers reveals that they achieve only around 50% of their theoretical peak performance. This suggests that applications tuned for erstwhile homogeneous computing may not be efficient on today’s heterogeneous systems, and hence that novel optimization strategies are required. However, optimizing an application for heterogeneous computing systems is extremely challenging, primarily because of the architectural differences among the computational units in such systems. This thesis is intended to act as a cookbook for optimizing applications on heterogeneous computing systems that employ graphics processing units (GPUs) as accelerators. We discuss optimization strategies for multicore CPUs as well as for the two popular GPU platforms, i.e., GPUs from AMD and NVIDIA. Optimization strategies for NVIDIA GPUs have been well studied, but when applied to AMD GPUs they fail to measurably improve performance because of differences in the underlying architecture. To the best of our knowledge, this research is the first to propose optimization strategies for AMD GPUs. Even on NVIDIA GPUs, there exists a lesser-known but extremely severe performance pitfall called partition camping, which can degrade application performance by up to seven-fold. To facilitate the detection of this phenomenon, we have developed a performance prediction model that analyzes and characterizes the effect of partition camping in GPU applications. We have used a large-scale molecular modeling application to validate and verify all the optimization strategies. Our results illustrate that, if appropriately optimized, AMD and NVIDIA GPUs can provide 371-fold and 328-fold improvement, respectively, over a hand-tuned, SSE-optimized serial implementation.

Tags: ATI, ATI Radeon HD 5870, Computer science, CUDA, Heterogeneous systems, Molecular modeling, nVidia, nVidia GeForce GTX 280, OpenCL, Optimization, Tesla C2050, Thesis

September 30, 2011 by hgpu
