Exploiting graphical processing units for data-parallel scientific applications
Computer Science, Institute of Information and Mathematical Sciences, Massey University, Albany, Auckland, New Zealand
Concurrency and Computation: Practice and Experience, Vol. 21, No. 18. (2009), pp. 2400-2437
DOI:10.1002/cpe.1462
@article{leist2009exploiting,
title={Exploiting graphical processing units for data-parallel scientific applications},
author={Leist, A. and Playne, DP and Hawick, KA},
journal={Concurrency and Computation: Practice and Experience},
volume={21},
number={18},
pages={2400--2437},
issn={1532-0634},
year={2009},
publisher={John Wiley \& Sons}
}
Graphical processing units (GPUs) have recently attracted attention for scientific applications such as particle simulations. This interest is driven partly by the low commodity pricing of GPUs and partly by recent toolkit and library developments that make them more accessible to scientific programmers. We discuss the application of GPU programming to two significantly different paradigms – regular mesh field equations with unusual boundary conditions and graph analysis algorithms. The differing optimization techniques required for these two paradigms cover many of the challenges faced when developing GPU applications. We discuss the relevance of these application paradigms to simulation engines and games. GPUs were aimed primarily at the accelerated graphics market, but since this market is often closely coupled to advanced game products, it is interesting to speculate about the future of fully integrated accelerator hardware for combined visualization and simulation. As well as reporting the speed-up achieved on selected simulation paradigms, we discuss suitable data-parallel algorithms and present code examples for exploiting GPU features such as large numbers of threads and localized texture memory. We find a surprising variation in the performance that can be achieved on GPUs for our applications and discuss how these findings relate to previously known effects in parallel computing, such as memory-speed-related super-linear speed-up.
November 27, 2010 by hgpu