Fast in-place sorting with CUDA based on bitonic sort
Research Group for Communication Systems, Department of Computer Science, Christian-Albrechts-University Kiel, Germany
Parallel Processing and Applied Mathematics, Lecture Notes in Computer Science, 2010, Volume 6067/2010, 403-410, Proceedings of the 8th international conference on Parallel processing and applied mathematics, PPAM’09: Part I
@conference{peters2009fast,
title={Fast in-place sorting with {CUDA} based on bitonic sort},
author={Peters, H. and Schulz-Hildebrandt, O. and Luttenberger, N.},
booktitle={Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I},
pages={403--410},
isbn={364214389X},
year={2009},
organization={Springer-Verlag}
}
State-of-the-art graphics processors provide high processing power, and the programmability offered by frameworks like CUDA makes GPUs increasingly usable as high-performance coprocessors for general-purpose computing. Sorting is well investigated in computer science, but this new field of application for GPUs creates a demand for high-performance parallel sorting algorithms that fit the characteristics of modern GPU architectures. We present a high-performance in-place implementation of Batcher’s bitonic sorting networks for CUDA-enabled GPUs. We adapted bitonic sort to arbitrary input lengths and assigned compare/exchange operations to threads in a way that reduces slow global-memory accesses, which greatly increases the performance of the implementation.
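To make the compare/exchange structure concrete, here is a minimal sequential sketch of an in-place bitonic sorting network for arbitrary input length. It is not the authors' CUDA implementation. It assumes the common "flip" formulation of bitonic sort (every comparator is ascending), and it handles lengths that are not powers of two by conceptually padding to the next power of two with +infinity: any comparator whose partner index is at or beyond `n` is simply skipped. The paper's own adaptation for arbitrary length may differ. The names `compare_exchange` and `bitonic_sort` are hypothetical.

```python
def compare_exchange(a, lo, hi):
    """Ascending comparator: afterwards a[lo] <= a[hi]."""
    if a[hi] < a[lo]:
        a[lo], a[hi] = a[hi], a[lo]


def bitonic_sort(a):
    """Sort list `a` in place with a bitonic network, for any len(a).

    Assumption: positions n..padded-1 are treated as +infinity, so
    comparators touching them never move anything and are skipped.
    """
    n = len(a)
    padded = 1
    while padded < n:
        padded <<= 1  # smallest power of two >= n

    k = 2  # size of the blocks being merged in this stage
    while k <= padded:
        # Flip step: compare each index in the lower half of its k-block
        # with its mirror position in the upper half (i ^ (k - 1)).
        for i in range(n):
            partner = i ^ (k - 1)
            if (i & (k >> 1)) == 0 and partner < n:
                compare_exchange(a, i, partner)
        # Half-cleaner steps with distances k/4, k/8, ..., 1.
        j = k >> 2
        while j > 0:
            for i in range(n):
                partner = i ^ j
                if i < partner < n:
                    compare_exchange(a, i, partner)
            j >>= 1
        k <<= 1
```

Within one step all comparator pairs are disjoint, so on a GPU each pair could be handled by its own thread; a naive mapping would launch one pass over global memory per step, which is the kind of traffic the abstract says the authors' thread assignment reduces.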
March 24, 2011 by hgpu