Patterns of Inefficient Performance Behavior in GPU Applications
Forschungszentrum Jülich, Jülich Supercomputing Centre, 52428 Jülich, Germany
Proc. of the 19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pages 262-266, Ayia Napa, Cyprus. IEEE Computer Society, February 2011
@inproceedings{eschweiler_ea:2011:gpupatt,
author={Eschweiler, Dominic and Becker, Daniel and Wolf, Felix},
month={feb},
title={Patterns of inefficient performance behavior in GPU applications},
booktitle={Proc. of the 19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
year={2011},
pages={262-266},
publisher={IEEE Computer Society},
address={Ayia Napa, Cyprus},
isbn={978-0-7695-4328-4}
}
Writing efficient software for heterogeneous architectures equipped with modern accelerator devices presents a serious challenge to programmer productivity, creating a need for powerful performance-analysis tools to adequately support the software development process. To guide the design of such tools, we describe typical patterns of inefficient runtime behavior that may adversely affect the performance of applications that use general-purpose processors along with GPU devices through a CUDA compute engine. To evaluate the general impact of these patterns on application performance, we further present a microbenchmark suite that allows the performance penalty of each pattern to be quantified. Results obtained on NVIDIA Fermi and Tesla architectures indeed demonstrate significant delays. Furthermore, this suite can be used as a default test scenario when adding CUDA support to performance-analysis tools used in high-performance computing.
March 16, 2011 by hgpu