Program Optimization Study on a 128-Core GPU
Center for Reliable and High-Performance Computing, University of Illinois at Urbana-Champaign
The First Workshop on General Purpose Processing on Graphics Processing Units, October 2007
@conference{ryoo2007program,
  title        = {Program optimization study on a 128-core GPU},
  author       = {Ryoo, S. and Rodrigues, C. I. and Stone, S. S. and Baghsorkhi, S. S. and Ueng, S. Z. and Hwu, W. W.},
  booktitle    = {The First Workshop on General Purpose Processing on Graphics Processing Units},
  year         = {2007},
  organization = {Citeseer}
}
The newest generations of graphics processing unit (GPU) architecture, such as the NVIDIA GeForce 8-series, feature new interfaces that improve programmability and generality over previous GPU generations. Using NVIDIA's Compute Unified Device Architecture (CUDA), the GPU is presented to developers as a flexible parallel architecture. This flexibility opens up a wide variety of parallelization optimizations for applications, but it can be difficult to choose and control those optimizations so that they yield a reliable performance benefit. This work presents a study that examines a broad space of optimization combinations applied to several applications ported to the GeForce 8800 GTX. Through an exhaustive search of the optimization space, we find configurations that are up to 74% faster than those previously thought optimal. We explain the effects that optimizations can have on this architecture and how they differ from those on more traditional processors. For some optimizations, small changes in resource usage per thread can have significant performance ramifications because of the platform's thread assignment granularity and the lack of control over the runtime's scheduling and allocation behavior. We conclude with suggestions for better controlling resource usage and performance on this platform.
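To illustrate the resource-granularity effect the abstract describes, the following is a minimal, hypothetical CUDA sketch, not code from the paper's benchmarks: a standard tiled matrix-multiplication kernel in which the illustrative names matmul_tiled and TILE_WIDTH are assumptions. The tile width fixes both the thread-block size and the per-block shared-memory footprint, and (together with the per-thread register count chosen by the compiler) it determines how many blocks the hardware can keep resident on each streaming multiprocessor.

// Hypothetical illustration (assumes n is a multiple of TILE_WIDTH).
#include <cuda_runtime.h>

#define TILE_WIDTH 16   // 16x16 = 256 threads per block; 2 * 16*16*4 B = 2 KB shared memory per block

__global__ void matmul_tiled(const float *A, const float *B, float *C, int n)
{
    // Per-block shared-memory staging buffers; their size scales with TILE_WIDTH^2.
    __shared__ float As[TILE_WIDTH][TILE_WIDTH];
    __shared__ float Bs[TILE_WIDTH][TILE_WIDTH];

    int row = blockIdx.y * TILE_WIDTH + threadIdx.y;
    int col = blockIdx.x * TILE_WIDTH + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < n / TILE_WIDTH; ++t) {
        // Stage one tile of A and one tile of B in shared memory.
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE_WIDTH + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE_WIDTH + threadIdx.y) * n + col];
        __syncthreads();

        // Accumulate the partial dot product for this tile.
        for (int k = 0; k < TILE_WIDTH; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * n + col] = acc;
}

On the GeForce 8800 GTX each streaming multiprocessor provides 8192 registers, 16 KB of shared memory, and at most 768 resident threads, and blocks are assigned whole: with 256-thread blocks, up to three blocks can be co-resident only if each thread stays at or below roughly 10 registers, so a change that pushes register usage to 11 per thread drops residency to two blocks at once rather than degrading gradually. Coarse knobs such as the nvcc --maxrregcount flag, or shrinking TILE_WIDTH, can pull a kernel back under such a threshold, though often at the cost of spills or reduced data reuse; the exact trade-off depends on the kernel and is the kind of interaction the paper's optimization-space search explores.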