An Empirically Optimized Radix Sort for GPU
Electr. & Comput. Eng. Dept., Univ. of Delaware, Newark, DE, USA
IEEE International Symposium on Parallel and Distributed Processing with Applications, 2009
@conference{huang2009empirically,
title={An Empirically Optimized Radix Sort for GPU},
author={Huang, B. and Gao, J. and Li, X.},
booktitle={2009 IEEE International Symposium on Parallel and Distributed Processing with Applications},
pages={234–241},
year={2009},
organization={IEEE}
}
In this paper, we propose an empirical optimization technique for one of the most important sorting routines on GPU, the radix sort, that generates highly efficient code for a number of representative NVIDIA GPUs with a wide variety of architectural specifications. Our study has been focused on the algorithmic parameters of radix sort that can be adapted to different environments and the GPU architectural factors that affect the performance of radix sort. We present a powerful empirical optimization approach that is shown to be able to find highly efficient code for different NVIDIA GPUs. Our results show that such an empirical optimization approach is quite effective at taking into account the complex interactions between architectural characteristics and that the resulting code performs significantly better than two radix sort implementations that have been shown outperforming other GPU sort routines with the maximal speedup of 33.4%.
March 28, 2011 by hgpu