Concurrent GPU Programming
School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, Canada N2L 3G1
University of Waterloo, 2009
@article{northam2009concurrent,
title={Concurrent GPU Programming},
author={Northam, L.},
year={2009}
}
Monte Carlo algorithms use repeated random sampling to find solutions to problems. One common example uses points randomly selected from the unit box to approximate the value of pi. Another example is a simulation called a virtual spectrophotometer which measures the reflectance of a modeled material [1]. The repetitive nature of Monte Carlo algorithms usually causes these programs to be time and energy intensive. These repetitions are identical and mutually independent, leading to easy parallelization. Repetition level granularity may require an exorbitant number of threads. Multi-core architectures are capable of parallel thread execution, but the massively multi-threaded GPU architecture is better suited to Monte Carlo and virtual spectrophotometer work-loads because they offer orders of magnitude more thread-level parallelism. The purpose of this project is to explore general purpose GPU programming and implement a virtual spectrophotometer that uses the massively multi-threaded nature of the GPU. Particular attention is given to the concurrency issues that limit the performance of the spectrophotometer on the GPU. This report is divided into four sections. The first section provides background information on general purpose GPU programming. The second section details the CUDA hardware abstraction layer. The third section discusses the limitations of CUDA applications and provides potential solutions to these limitations. The final section analyzes several CUDA implementations of a virtual spectrophotometer and compares their execution time with a single-thread CPU implementation.
February 26, 2011 by hgpu