ATI Stream Profiler: a tool to optimize an OpenCL kernel on ATI Radeon GPUs
Advanced Micro Devices, Inc.
SIGGRAPH ’10: ACM SIGGRAPH 2010 Posters, 2010
@inproceedings{purnomo2010ati,
title={ATI Stream Profiler: a tool to optimize an OpenCL kernel on ATI Radeon GPUs},
author={Purnomo, B. and Rubin, N. and Houston, M.},
booktitle={ACM SIGGRAPH 2010 Posters},
pages={1--1},
year={2010},
organization={ACM}
}
Modern GPUs have been shown to be highly efficient machines for data-parallel applications such as graphics, image and video processing, and physical simulation. For example, a single ATI Radeon HD 5870 GPU has a theoretical peak of 2.72 teraflops (10^12 floating-point operations per second) with a video memory bandwidth of 153.6 GB/s. While it is not difficult to port CPU algorithms to run on GPUs, it is extremely challenging to optimize the algorithms to achieve teraflops performance on GPUs. Only a select few expert engineers with application domain expertise, a deep understanding of modern GPU architecture, and an intimate knowledge of shader compiler optimization can program GPUs close to their optimal capabilities. Many developers are content with severalfold improvements rather than an acceleration of one or several orders of magnitude over their optimized CPU implementations.
August 18, 2011 by hgpu