NQueens on CUDA: Optimization Issues
Ninth International Symposium on Parallel and Distributed Computing (ISPDC), 2010
@conference{feinbube2010nqueens,
title={NQueens on CUDA: Optimization Issues},
author={Feinbube, F. and Rabe, B. and von L{\"o}wis, M. and Polze, A.},
booktitle={2010 Ninth International Symposium on Parallel and Distributed Computing},
pages={63--70},
year={2010},
organization={IEEE}
}
Today's commercial off-the-shelf computer systems are multicore computing systems that combine a CPU, a graphics processor (GPU), and custom devices. In comparison with CPU cores, graphics cards can execute hundreds to thousands of compute units in parallel. To benefit from these GPU computing resources, applications have to be parallelized and adapted to the target architecture. In this paper we present our experience in implementing the NQueens puzzle solution on GPUs using Nvidia's CUDA (Compute Unified Device Architecture) technology. Using memory usage and memory access as examples, we demonstrate that optimizations of CUDA programs may have contrary effects on different CUDA architectures. Our evaluation results point out that using new programming languages or compilers alone is not sufficient to achieve the best results with emerging graphics card computing.
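To illustrate the kind of memory-placement decision the abstract alludes to, here is a minimal, hypothetical sketch of a CUDA NQueens solution counter; it is not the authors' implementation. It assumes a small board (N = 10), a standard bitmask backtracking scheme, one subproblem (fixed queen in row 0) per thread, and per-thread backtracking stacks placed in shared memory; the kernel name and launch configuration are illustrative. Moving the stacks from shared memory to local (off-chip) memory is exactly the sort of change whose performance impact can differ between CUDA architectures.

// nqueens_count.cu -- hypothetical sketch, not the code evaluated in the paper.
#include <cstdio>
#include <cuda_runtime.h>

#define N 10                 // board size; bitmasks fit comfortably in 32 bits
#define THREADS 32           // one block; board size must not exceed THREADS

// Each thread counts all solutions whose row-0 queen sits in column == thread id.
// Per-thread backtracking stacks are kept in shared memory (an assumed placement).
__global__ void nqueens_kernel(unsigned long long *counts)
{
    const int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col >= N) return;

    __shared__ unsigned int ls[N][THREADS];   // left-diagonal attack mask per row
    __shared__ unsigned int rs[N][THREADS];   // right-diagonal attack mask per row
    __shared__ unsigned int cs[N][THREADS];   // column attack mask per row
    __shared__ unsigned int av[N][THREADS];   // remaining candidate columns per row
    const int t = threadIdx.x;

    const unsigned int full = (1u << N) - 1u;
    const unsigned int bit0 = 1u << col;
    unsigned long long count = 0;

    // The row-0 queen is fixed; backtracking starts at row 1.
    ls[1][t] = bit0 << 1;
    rs[1][t] = bit0 >> 1;
    cs[1][t] = bit0;
    av[1][t] = full & ~(ls[1][t] | rs[1][t] | cs[1][t]);

    int row = 1;
    while (row >= 1) {
        unsigned int a = av[row][t];
        if (a) {
            unsigned int bit = a & (0u - a);      // lowest free column
            av[row][t] = a ^ bit;                 // consume this candidate
            if (row == N - 1) {
                ++count;                          // full board reached
            } else {
                ls[row + 1][t] = (ls[row][t] | bit) << 1;
                rs[row + 1][t] = (rs[row][t] | bit) >> 1;
                cs[row + 1][t] = cs[row][t] | bit;
                av[row + 1][t] = full & ~(ls[row + 1][t] | rs[row + 1][t] | cs[row + 1][t]);
                ++row;
            }
        } else {
            --row;                                // dead end: backtrack
        }
    }
    counts[col] = count;
}

int main()
{
    unsigned long long *d_counts, h_counts[N];
    cudaMalloc(&d_counts, N * sizeof(unsigned long long));
    nqueens_kernel<<<1, THREADS>>>(d_counts);
    cudaMemcpy(h_counts, d_counts, N * sizeof(unsigned long long), cudaMemcpyDeviceToHost);

    unsigned long long total = 0;
    for (int i = 0; i < N; ++i) total += h_counts[i];
    printf("%d-queens solutions: %llu\n", N, total);  // 724 for N = 10
    cudaFree(d_counts);
    return 0;
}

In a sketch like this, a real evaluation would compare the shared-memory placement above against keeping ls/rs/cs/av in per-thread local arrays, since the relative cost of the two placements depends on the GPU generation.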
April 13, 2011 by hgpu