Count Sort for GPU Computing
Sch. of Comput., Shenyang Inst. of Aeronaut. Eng., Shenyang, China
15th International Conference on Parallel and Distributed Systems (ICPADS), 2009
@conference{sun2009count,
title={Count Sort for GPU Computing},
author={Sun, W. and Ma, Z.},
booktitle={2009 15th International Conference on Parallel and Distributed Systems},
pages={919--924},
issn={1521-9097},
year={2009},
organization={IEEE}
}
Counting sort is a simple, stable, and efficient sorting algorithm with linear running time, and it is a fundamental building block for many applications. This paper describes the design issues of a data-parallel implementation of the count sort algorithm on a commodity multiprocessor GPU using the Compute Unified Device Architecture (CUDA) platform, both from NVIDIA Corporation. The fully parallel version runs much faster than any serial implementation on the CPU, but it loses stability because of the limitations of the massively threaded parallel model. The thread-level parallel implementation nevertheless provides an efficient parallel sort primitive for the many applications that do not require a stable sort or can be adapted to use unstable subroutines.
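For readers unfamiliar with the algorithm, the following is a minimal serial sketch of a stable counting sort in C. It is the kind of CPU baseline the abstract compares against, not the paper's CUDA kernel. The names `record_t`, `counting_sort`, and `num_keys` are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record type (not from the paper): a small integer key
 * plus a payload, so that stability is observable in the output. */
typedef struct {
    unsigned key;   /* must be < num_keys */
    int payload;
} record_t;

/* Stable serial counting sort of n records from `in` into `out`.
 * Runs in O(n + num_keys) time.
 * Returns 0 on success, -1 on allocation failure or an out-of-range key. */
int counting_sort(const record_t *in, record_t *out, size_t n, unsigned num_keys)
{
    size_t *count = calloc((size_t)num_keys + 1, sizeof *count);
    if (count == NULL)
        return -1;

    /* Step 1: histogram. count[k + 1] holds the number of records with key k. */
    for (size_t i = 0; i < n; ++i) {
        if (in[i].key >= num_keys) {
            free(count);
            return -1;
        }
        count[in[i].key + 1]++;
    }

    /* Step 2: prefix sum. Afterwards count[k] is the first output index
     * for key k, i.e. the number of records with a smaller key. */
    for (unsigned k = 0; k < num_keys; ++k)
        count[k + 1] += count[k];

    /* Step 3: scatter in input order. Because equal keys are placed in the
     * order they are read, the sort is stable. */
    for (size_t i = 0; i < n; ++i)
        out[count[in[i].key]++] = in[i];

    free(count);
    return 0;
}
```

Stability here comes entirely from step 3 visiting the input sequentially. If many threads were to claim output slots concurrently instead (for example with an atomic increment on a per-key counter, which is an assumption about one possible approach rather than the paper's stated design), the relative order of equal keys would depend on thread scheduling, which is consistent with the loss of stability the abstract reports.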
April 14, 2011 by hgpu