Pseudo Random Number Generators on Graphics Processing Units, with Applications in Finance
University of Edinburgh
University of Edinburgh, 2013
@article{anker2013pseudo,
title={Pseudo Random Number Generators on Graphics Processing Units, with Applications in Finance},
author={Anker, Matthew},
year={2013}
}
There has been growing interest in exploiting GPUs to accelerate simulations. However, the RNGs driving these simulations tend to be existing CPU generators that have been ported to the GPU. The result is a generator that does not efficiently use the resources of that architecture or respect its constraints. Consequently, the performance of the simulation, and sometimes even the quality of its results, fall short of the GPU's capabilities. This project has therefore examined the GPU as a platform for parallel computing, and we have attempted to construct an RNG framework for generating the uniform and Gaussian distributions that takes the characteristics and constraints of the target platform into account.

The key feature of our GPU RNG framework is the novel use of fast shared memory, shared between the threads within a warp, to store, exchange and update the generator state. Exploiting the fact that threads execute in batches (i.e. warps) means we avoid the need for explicit communication and effectively get synchronisation for free, eliminating a significant bottleneck. This basic framework can be used to create generators with very long periods without increasing pressure on register usage or resorting to slower global memory. The one important precondition is that if one thread in a warp calls generate, then all threads must do the same: there must be no warp divergence around the generate function, otherwise the RNG state will be corrupted. This constraint led to substantial per…

The RNG framework presented in this project can be naturally and easily extended over different parameter sets to produce generators of varying period and quality. The generators we propose within our framework are fast, require little state to be stored in registers, have long periods and are statistically strong, passing the initial set of statistical tests in the TestU01 SmallCrush battery (which many popular, conventional RNGs have failed). The generators are small enough to be embedded directly in a device-side application, permitting on-the-fly random number generation. This has dramatic performance benefits, as shown by the real-life Monte Carlo option pricing examples. We showed that our GPU-dedicated RNGs substantially outperform two conventional RNGs ported to the GPU, while also having superior periods and quality. We found that writing output to global memory bounds performance significantly compared with on-the-fly generation.
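As an illustration only (this is not the thesis's actual generator), a minimal CUDA sketch of the warp-level shared-memory idea might look like the following. The xorshift update, mixing constant, seeding scheme and identifiers such as warp_rng_next, mc_mean_kernel and WARPS_PER_BLOCK are placeholders of our own: each warp keeps one 32-bit state word per lane in shared memory, a lane advances its own word and mixes in a neighbouring lane's word, and every lane of the warp calls the generator together so there is no divergence around it.

#include <cuda_runtime.h>

#define WARP_SIZE        32
#define WARPS_PER_BLOCK  4   // launch with 128 threads per block (4 warps of 32)

// Advance one lane's state; warp_state points at this warp's 32 words in
// shared memory. Every lane of the warp must call this together.
__device__ unsigned int warp_rng_next(unsigned int *warp_state, int lane)
{
    // On 2013-era hardware the lockstep execution of a warp made explicit
    // barriers unnecessary ("synchronisation for free"); on post-Volta GPUs
    // with independent thread scheduling, __syncwarp() keeps the sketch
    // correct by making the previous round's writes visible to every lane.
    __syncwarp();

    unsigned int x = warp_state[lane];

    // Placeholder xorshift step (illustrative parameters only).
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;

    // Mix in the neighbouring lane's word, exchanged through shared memory.
    x += warp_state[(lane + 1) & (WARP_SIZE - 1)] * 2654435761u;

    // Ensure every lane has read its neighbour before any lane overwrites.
    __syncwarp();
    warp_state[lane] = x;

    return x;
}

// Toy on-the-fly consumer: each thread accumulates uniform variates in
// registers and writes only one final value, avoiding the per-sample
// global-memory traffic that the abstract identifies as a bottleneck.
__global__ void mc_mean_kernel(float *out, unsigned int seed, int n_samples)
{
    __shared__ unsigned int rng_state[WARPS_PER_BLOCK][WARP_SIZE];

    int lane = threadIdx.x & (WARP_SIZE - 1);
    int warp = threadIdx.x / WARP_SIZE;
    int gid  = blockIdx.x * blockDim.x + threadIdx.x;

    // Naive per-lane seeding (a real framework would seed more carefully).
    rng_state[warp][lane] = seed ^ (2654435761u * (gid + 1u));

    float acc = 0.0f;
    for (int i = 0; i < n_samples; ++i) {
        // Uniform loop bound: all lanes of the warp call generate together.
        unsigned int r = warp_rng_next(rng_state[warp], lane);
        acc += r * (1.0f / 4294967296.0f);   // map to [0, 1)
    }
    out[gid] = acc / n_samples;
}

The kernel consumes its random numbers on the fly and writes a single result per thread, which is the usage pattern the abstract contrasts with streaming every variate out to global memory.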
November 25, 2013 by hgpu