Parallel external sorting for CUDA-enabled GPUs with load balancing and low transfer overhead
Research Group for Communication Systems, Department of Computer Science, Christian-Albrechts-University Kiel, Germany
In 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) (April 2010), pp. 1-8.
@conference{peters2010parallel,
title={Parallel external sorting for CUDA-enabled GPUs with load balancing and low transfer overhead},
author={Peters, H. and Schulz-Hildebrandt, O. and Luttenberger, N.},
booktitle={Parallel \& Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on},
pages={1--8},
year={2010},
organization={IEEE}
}
Sorting is a well-investigated topic in Computer Science in general, and by now many efficient sorting algorithms for CPUs and GPUs have been developed. GPUs offer no swapping, paging, or similar mechanisms that would provide more virtual memory than is physically available; thus, sorting sequences that exceed GPU memory on the GPU becomes a problem of external sorting. In this contribution we present a novel merge-based external sorting algorithm for one or more CUDA-enabled GPUs. We reduce the performance impact of memory transfers to and from the GPU by using an approach similar to regular samplesort and by overlapping memory transfers with GPU computation. We achieve good utilization of the GPUs and load balancing among them by carefully choosing the samples and the amount of GPU memory used for computation. We demonstrate the performance of our algorithm through extensive testing. Using two GTX 280 GPUs, the implementation outperforms the fastest CPU sorting algorithms known to the authors.
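The abstract names an approach "similar to regular samplesort" as the key to bounded transfers and balanced load. The following is a minimal CPU-only sketch of generic regular sampling (in the style of parallel sorting by regular sampling), not the authors' algorithm. It assumes that each fixed-size chunk stands in for a block that fits in GPU memory, that Python lists stand in for host and device buffers, and that no transfers or overlap are modeled. All function names (`sort_chunks`, `regular_samples`, `choose_splitters`, `partition`, `external_samplesort`) are hypothetical.

```python
import bisect
import heapq
import random


def sort_chunks(data, chunk_size):
    # Phase 1 (assumption: each chunk would fit in GPU memory):
    # cut the input into chunks and sort each one independently.
    return [sorted(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def regular_samples(chunk, k):
    # Take k evenly spaced elements from an already sorted chunk.
    step = len(chunk) / k
    return [chunk[int(i * step)] for i in range(k)]


def choose_splitters(chunks, num_buckets):
    # Pool the regular samples of all chunks, sort the pool, and pick
    # num_buckets - 1 evenly spaced splitters from it.
    samples = sorted(s for c in chunks for s in regular_samples(c, num_buckets))
    return [samples[j * len(samples) // num_buckets] for j in range(1, num_buckets)]


def partition(chunk, splitters):
    # Cut a sorted chunk into len(splitters) + 1 contiguous runs;
    # run j holds the elements in (splitters[j-1], splitters[j]].
    cuts = [0] + [bisect.bisect_right(chunk, s) for s in splitters] + [len(chunk)]
    return [chunk[cuts[j]:cuts[j + 1]] for j in range(len(cuts) - 1)]


def external_samplesort(data, chunk_size, num_buckets):
    # Returns the sorted sequence and the size of every bucket.
    if not data:
        return [], [0] * num_buckets
    chunks = sort_chunks(data, chunk_size)
    splitters = choose_splitters(chunks, num_buckets)
    runs = [partition(c, splitters) for c in chunks]
    # Phase 2: bucket j gathers run j of every chunk. The runs are sorted,
    # so a k-way merge suffices; buckets are disjoint and ordered, so
    # concatenating them yields the fully sorted output.
    buckets = [list(heapq.merge(*(r[j] for r in runs))) for j in range(num_buckets)]
    return [x for b in buckets for x in b], [len(b) for b in buckets]


if __name__ == "__main__":
    random.seed(0)
    data = [random.randrange(10**6) for _ in range(10_000)]
    out, sizes = external_samplesort(data, chunk_size=1_000, num_buckets=4)
    print(out == sorted(data), sizes)
```

Because every bucket can be merged independently, one could hypothetically hand each bucket to a different GPU. In the classic analysis of sorting by regular sampling, no bucket grows beyond roughly twice the average bucket size, which is the kind of guarantee that matters when each merge step must fit in a fixed amount of device memory.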
November 2, 2010 by hgpu