GPU Multisplit

Saman Ashkiani, Andrew Davidson, Ulrich Meyer, John D. Owens
University of California, Davis
21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016


   title={GPU Multisplit},

   author={Ashkiani, Saman and Davidson, Andrew and Meyer, Ulrich and Owens, John D},

   booktitle={Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming},



Download Download (PDF)   View View   Source Source   



Multisplit is a broadly useful parallel primitive that permutes its input data into contiguous buckets or bins, where the function that categorizes an element into a bucket is provided by the programmer. Due to the lack of an efficient multisplit on GPUs, programmers often choose to implement multisplit with a sort. However, sort does more work than necessary to implement multisplit, and is thus inefficient. In this work, we provide a parallel model and multiple implementations for the multisplit problem. Our principal focus is multisplit for a small number of buckets. In our implementations, we exploit the computational hierarchy of the GPU to perform most of the work locally, with minimal usage of global operations. We also use warp-synchronous programming models to avoid branch divergence and reduce memory usage, as well as hierarchical reordering of input elements to achieve better coalescing of global memory accesses. On an NVIDIA K40c GPU, for key-only (keyvalue) multisplit, we demonstrate a 3.0-6.7x (4.4-8.0x) speedup over radix sort, and achieve a peak throughput of 10.0 G keys/s.
Rating: 2.5/5. From 4 votes.
Please wait...

* * *

* * *

HGPU group © 2010-2021 hgpu.org

All rights belong to the respective authors

Contact us: