Parallel Prefix Scan with Compute Unified Device Architecture (CUDA)
JNTUA College of Engineering, Pulivendula, A.P., India
9th IRF International Conference, 2014
Parallel prefix scan, also known as parallel prefix sum, is a building block for many parallel algorithms, including polynomial evaluation, sorting, and building data structures. This paper introduces prefix scan and describes a step-by-step procedure for implementing it efficiently with Compute Unified Device Architecture (CUDA). It starts with a basic naive algorithm and proceeds through more advanced techniques to obtain the best performance.
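The abstract does not reproduce the paper's kernels. The sketch below assumes two common formulations: the "naive" algorithm is the Hillis-Steele scan, and the "more advanced technique" is the work-efficient Blelloch up-sweep/down-sweep scan. It runs on the CPU and simulates each synchronized GPU step with an outer loop, where the inner loop stands in for one CUDA thread per element. The function names `naive_inclusive_scan` and `blelloch_exclusive_scan` are hypothetical and do not come from the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Naive (Hillis-Steele) inclusive scan, simulated sequentially.
// Each outer iteration is one synchronized step on the GPU. Every "thread" i
// reads from `in` and writes to `out` (double buffering), so no thread sees a
// value that another thread wrote in the same step.
// Work: O(n log n) additions; steps: ceil(log2 n).
std::vector<int> naive_inclusive_scan(std::vector<int> in) {
    std::vector<int> out(in.size());
    for (std::size_t offset = 1; offset < in.size(); offset *= 2) {
        for (std::size_t i = 0; i < in.size(); ++i) {  // one "thread" per element
            out[i] = (i >= offset) ? in[i] + in[i - offset] : in[i];
        }
        std::swap(in, out);  // this step's output is the next step's input
    }
    return in;
}

// Work-efficient (Blelloch) exclusive scan, simulated sequentially.
// Assumes the size is a power of two (pad with zeros otherwise).
// Within one step the updated indices are disjoint, so the inner loop can be
// run in any order, just as independent GPU threads would.
// Work: O(n) additions; steps: 2 * log2(n).
std::vector<int> blelloch_exclusive_scan(std::vector<int> a) {
    const std::size_t n = a.size();
    if (n == 0) return a;
    assert((n & (n - 1)) == 0 && "size must be a power of two");

    // Up-sweep (reduce): build partial sums in a balanced binary tree.
    for (std::size_t d = 1; d < n; d *= 2) {
        for (std::size_t i = 2 * d - 1; i < n; i += 2 * d) {
            a[i] += a[i - d];
        }
    }

    // Clear the root, then down-sweep to distribute the partial sums.
    a[n - 1] = 0;
    for (std::size_t d = n / 2; d >= 1; d /= 2) {
        for (std::size_t i = 2 * d - 1; i < n; i += 2 * d) {
            int left = a[i - d];
            a[i - d] = a[i];  // left child receives the parent's prefix
            a[i] += left;     // right child receives parent prefix + left sum
        }
    }
    return a;
}
```

For the input `{3, 1, 7, 0, 4, 1, 6, 3}`, the naive version returns the inclusive scan `{3, 4, 11, 11, 15, 16, 22, 25}`, and the Blelloch version returns the exclusive scan `{0, 3, 4, 11, 11, 15, 16, 22}`. The trade-off is the usual one. Hillis-Steele needs fewer steps but performs O(n log n) additions. Blelloch performs only O(n) additions at the cost of a second pass over the tree. In a real CUDA kernel, each outer iteration would be separated by `__syncthreads()` and operate on shared memory.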
June 11, 2014 by hgpu