high performance computing on graphics processing units: hgpu.org


GPU-ABiSort: Optimal Parallel Sorting on Stream Architectures

A. Greß, G. Zachmann

Institute of Computer Science II, Rhein. Friedr.-Wilh.-Universität Bonn, Bonn, Germany

Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium, pp. 1-10, 2006

DOI:10.1109/IPDPS.2006.1639284


In this paper, we present a novel approach for parallel sorting on stream processing architectures. It is based on adaptive bitonic sorting. For sorting n values using p stream processor units, this approach achieves the optimal time complexity O((n log n)/p). This makes our approach competitive with common sequential sorting algorithms from a theoretical viewpoint, and it is also very fast in practice. The practical speed is achieved by using efficient linear stream memory accesses and by combining the optimal-time approach with algorithms optimized for small input sequences. We present an implementation on modern programmable graphics hardware (GPUs). On GPUs, our optimal parallel sorting approach has been shown to be remarkably faster than sequential sorting on the CPU, and it is also faster than previous non-optimal sorting approaches on the GPU for sufficiently large input sequences. Because our algorithm scales very well with the number of stream processor units p (up to n/log² n or even n/log n units, depending on the stream architecture), our approach profits heavily from the trend toward more fragment processor units on GPUs, so we can expect further speed improvements with upcoming GPU generations.
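For readers unfamiliar with bitonic sorting, the following is a minimal sequential sketch of the classic bitonic sorting network in Python. It is not the adaptive variant the paper builds on, and it is not the authors' GPU implementation: the classic network performs O(n log² n) comparisons, whereas the abstract states an optimal O((n log n)/p) bound for the adaptive approach. The function names `bitonic_sort` and `bitonic_merge` are illustrative, and the sketch assumes the input length is a power of two.

```python
def bitonic_merge(seq, ascending):
    """Sort a bitonic sequence (power-of-two length) in the given direction."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    half = n // 2
    seq = list(seq)
    # Each compare-exchange below touches a distinct pair (i, i + half),
    # so on a parallel machine all of them could run at the same time.
    for i in range(half):
        if (seq[i] > seq[i + half]) == ascending:
            seq[i], seq[i + half] = seq[i + half], seq[i]
    # Both halves are now bitonic; merge each one recursively.
    return bitonic_merge(seq[:half], ascending) + bitonic_merge(seq[half:], ascending)


def bitonic_sort(values, ascending=True):
    """Classic (non-adaptive) bitonic sort; assumes len(values) is a power of two."""
    n = len(values)
    if n <= 1:
        return list(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    half = n // 2
    # Sort the halves in opposite directions so their concatenation is bitonic.
    first = bitonic_sort(values[:half], True)
    second = bitonic_sort(values[half:], False)
    return bitonic_merge(first + second, ascending)
```

Calling `bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5])` returns the values in ascending order. The fixed, data-independent access pattern of the compare-exchange loop is what makes this family of algorithms a natural fit for stream architectures.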

Tags: Algorithms, Computer science, nVidia, nVidia GeForce 6800 Ultra, nVidia GeForce 7800 GTX, Sorting

December 13, 2010 by hgpu
