A High-Performance Brownian Bridge for GPUs: Lessons for Bandwidth Bound Applications
NAG Ltd, Manchester
NAG Technical Report, TR2/12, 2012
@techreport{toit2012high,
title={A High-Performance Brownian Bridge for GPUs: Lessons for Bandwidth Bound Applications},
author={du Toit, Jacques},
institution={NAG Ltd, Manchester},
number={TR2/12},
year={2012}
}
We present a very flexible Brownian bridge generator together with a GPU implementation which achieves close to peak performance on an NVIDIA C2050. The performance is compared with an OpenMP implementation run on several high-performance x86-64 systems. The GPU shows a performance gain of at least 10x. Full comparative results are given in Section 8: in particular, we observe that the Brownian bridge algorithm does not scale well on multicore CPUs since it is memory-bandwidth bound. The evolution of the GPU algorithm is discussed. Achieving peak performance required challenging the "conventional wisdom" regarding GPU programming, in particular the importance of occupancy, the speed of shared memory and the impact of branching.
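For readers unfamiliar with the construction, the following is a minimal CPU sketch of a textbook Brownian bridge, not the report's GPU generator. It assumes a simple bisection ordering of the time points (the report's generator is described as more flexible), and the function name `brownian_bridge` and its parameters are hypothetical. Each point is written once after reading two previously computed neighbours, so the work per point is a handful of arithmetic operations against several memory accesses, which is consistent with the abstract's remark that the algorithm is bandwidth bound.

```python
import math
from collections import deque


def brownian_bridge(z, times):
    """Return W(t) at each t in `times`, with W(0) = 0.

    z     : standard normal samples, one per time point (hypothetical input).
    times : strictly increasing, positive time points.
    Assumption: points are filled in bisection order (terminal point first,
    then midpoints of ever smaller intervals).
    """
    n = len(times)
    if len(z) != n:
        raise ValueError("need exactly one normal sample per time point")
    t = [0.0] + list(times)   # t[0] = 0 is the fixed starting time
    w = [0.0] * (n + 1)       # w[0] = W(0) = 0

    # The terminal value is sampled unconditionally: W(T) ~ N(0, T).
    w[n] = math.sqrt(t[n]) * z[0]
    k = 1                     # index of the next unused normal sample

    # Each queued interval (l, r) has both endpoint values already known.
    pending = deque([(0, n)])
    while pending:
        l, r = pending.popleft()
        if r - l < 2:
            continue          # no interior point left in this interval
        m = (l + r) // 2
        span = t[r] - t[l]
        # Conditional mean: linear interpolation between the two endpoints.
        mean = ((t[r] - t[m]) * w[l] + (t[m] - t[l]) * w[r]) / span
        # Conditional standard deviation of W(t[m]) given both endpoints.
        std = math.sqrt((t[m] - t[l]) * (t[r] - t[m]) / span)
        w[m] = mean + std * z[k]
        k += 1
        pending.append((l, m))
        pending.append((m, r))
    return w[1:]
```

Passing a vector `z` whose only non-zero entry is the first one yields a straight line from 0 to the terminal value, which is a quick sanity check on the interpolation weights.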
April 4, 2012 by hgpu