Revisiting sorting for GPGPU stream architectures
Department of Computer Science, University of Virginia
In PACT ’10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques (2010), pp. 545-546
@conference{merrill2010revisiting,
title={Revisiting sorting for GPGPU stream architectures},
author={Merrill, D.G. and Grimshaw, A.S.},
booktitle={Proceedings of the 19th international conference on Parallel architectures and compilation techniques},
pages={545–546},
year={2010},
organization={ACM}
}
This poster presents efficient strategies for sorting large sequences of fixed-length keys (and values) using GPGPU stream processors. Compared to the state-of-the-art, our radix sorting methods exhibit speedup of at least 2x for all generations of NVIDIA GPGPUs, and up to 3.7x for current GT200-based models. Our implementations demonstrate sorting rates of 482 million key-value pairs per second, and 550 million keys per second (32-bit). For this domain of sorting problems, we believe our sorting primitive to be the fastest available for any fully-programmable microarchitecture.
January 5, 2011 by hgpu