High performance comparison-based sorting algorithm on many-core GPUs

Xiaochun Ye, Dongrui Fan, Wei Lin, Nan Yuan, Paolo Ienne
Key Laboratory of Computer System and Architecture, Institute of Computing Technology (ICT), Chinese Academy of Sciences, Beijing, China
IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010


   title={High performance comparison-based sorting algorithm on many-core GPUs},

   author={Ye, X. and Fan, D. and Lin, W. and Yuan, N. and Ienne, P.},

   booktitle={Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on},





Download Download (PDF)   View View   Source Source   



Sorting is a kernel algorithm for a wide range of applications. In this paper, we present a new algorithm, GPU-Warpsort, to perform comparison-based parallel sort on Graphics Processing Units (GPUs). It mainly consists of a bitonic sort followed by a merge sort. Our algorithm achieves high performance by efficiently mapping the sorting tasks to GPU architectures. Firstly, we take advantage of the synchronous execution of threads in a warp to eliminate the barriers in bitonic sorting network. We also provide sufficient homogeneous parallel operations for all the threads within a warp to avoid branch divergence. Furthermore, we implement the merge sort efficiently by assigning each warp independent pairs of sequences to be merged and by exploiting totally coalesced global memory accesses to eliminate the bandwidth bottleneck. Our experimental results indicate that GPU-Warpsort works well on different kinds of input distributions, and it achieves up to 30% higher performance than previous optimized comparison-based GPU sorting algorithm on input sequences with millions of elements.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2021 hgpu.org

All rights belong to the respective authors

Contact us: