high performance computing on graphics processing units: hgpu.org

GPU sample sort

Nikolaj Leischner, Vitaly Osipov, Peter Sanders

Universität Karlsruhe (TH), Germany

arXiv:0909.5649v1 [cs.DS] (30 Sep 2009)

@conference{leischner2010gpu,
  title={GPU sample sort},
  author={Leischner, N. and Osipov, V. and Sanders, P.},
  booktitle={Parallel \& Distributed Processing (IPDPS), 2010 IEEE International Symposium on},
  pages={1--10},
  year={2010},
  organization={IEEE}
}

In this paper, we present the design of a sample sort algorithm for manycore GPUs. Although sample sort is one of the most efficient comparison-based sorting algorithms for distributed memory architectures, its performance on GPUs was previously unknown. For uniformly distributed keys, our sample sort is at least 25% and on average 68% faster than the best comparison-based sorting algorithm, GPU Thrust merge sort, and on average more than 2 times faster than GPU quicksort. Moreover, for 64-bit integer keys, it is at least 63% and on average 2 times faster than the highly optimized GPU Thrust radix sort, which directly manipulates the binary representation of keys. Our implementation is robust to different distributions and entropy levels of keys and scales almost linearly with the input size. These results indicate that multi-way techniques in general, and sample sort in particular, achieve substantially better performance than two-way merge sort and quicksort.
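
The abstract does not spell out the algorithm, so the following is only a minimal sequential sketch of the general k-way sample sort idea (sample, choose splitters, distribute into buckets, recurse), not the authors' GPU implementation. The function name `sample_sort` and the parameters `k`, `oversampling`, and `small` are hypothetical choices made for illustration.

```python
import bisect
import random


def sample_sort(keys, k=4, oversampling=8, small=32, rng=None):
    """Sequential sketch of k-way sample sort (illustrative, not the paper's code)."""
    rng = rng or random.Random(0)
    if len(keys) <= small:
        return sorted(keys)  # base case: small inputs use a plain sort

    # 1. Draw a random sample (with replacement) and sort it.
    sample = sorted(rng.choice(keys) for _ in range(k * oversampling))

    # 2. Take k-1 evenly spaced splitters from the sorted sample.
    splitters = [sample[i * oversampling] for i in range(1, k)]

    # 3. Send each key to one of k buckets via binary search over the splitters.
    buckets = [[] for _ in range(k)]
    for x in keys:
        buckets[bisect.bisect_right(splitters, x)].append(x)

    # Guard (an assumption of this sketch): if every key landed in one bucket,
    # e.g. with many duplicates, recursion would not shrink the problem.
    if max(len(b) for b in buckets) == len(keys):
        return sorted(keys)

    # 4. Sort each bucket recursively and concatenate in bucket order.
    out = []
    for b in buckets:
        out.extend(sample_sort(b, k, oversampling, small, rng))
    return out
```

Splitting into k buckets per pass, rather than two as in quicksort, is what makes the approach "multi-way": each key is moved fewer times overall, which is the property the abstract credits for the performance advantage. On a GPU the distribution step would be carried out in parallel, which this sequential sketch does not attempt to show.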

Tags: Computer science, CUDA, Data Structures and Algorithms, nVidia, nVidia GeForce GTX 285, Sorting, Tesla C1060

October 27, 2010 by hgpu
