high performance computing on graphics processing units: hgpu.org

Sorting on GPUs for large scale datasets: A thorough comparison

Gabriele Capannini, Fabrizio Silvestri, Ranieri Baraglia

Information Science and Technology Inst., via G. Moruzzi 1, 56100 Pisa, Italy

Information Processing & Management, 2011

DOI:10.1016/j.ipm.2010.11.010

@article{capannini2011sorting,
  title={Sorting on GPUs for large scale datasets: A thorough comparison},
  author={Capannini, G. and Silvestri, F. and Baraglia, R.},
  journal={Information Processing \& Management},
  year={2011},
  publisher={Elsevier}
}

Although sorting has been extensively studied, it remains a challenge, particularly when we consider the implications of novel processor technologies such as many-cores (e.g., GPUs, Cell/BE, multicore CPUs). In this paper, we compare different algorithms for sorting integers on stream multiprocessors and discuss their viability on large datasets (such as those managed by search engines). To fully exploit the potential of the underlying architecture, we designed an optimized version of a sorting network in the K-model, a novel computational model designed to account for all the important features of many-core architectures. According to the K-model, our mapping of the bitonic sorting network improves the three main aspects of many-core architectures, i.e., processor exploitation, on-chip memory bandwidth utilization, and off-chip memory bandwidth utilization. Furthermore, we attain a space complexity of O(1). We experimentally compare our solution with state-of-the-art ones (namely, Quicksort and Radixsort) on GPUs, and we also compute the complexity of these algorithms in the K-model. The evaluation highlights that our bitonic sorting network is faster than Quicksort and slightly slower than Radixsort; yet, being an in-place solution, it consumes less memory than both.
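The paper's K-model mapping is not reproduced here. As a rough illustration of why a bitonic sorting network can sort in place with O(1) extra space, the following is a minimal sequential sketch of the textbook bitonic sort, assuming the input length is a power of two. The function name `bitonic_sort_inplace` is hypothetical. For n = 2^m elements the network runs m(m+1)/2 stages of n/2 compare-exchanges each, and the pattern of comparisons does not depend on the data; a GPU implementation would typically execute each `(k, j)` stage in parallel, one thread per index `i`.

```python
def bitonic_sort_inplace(a):
    """Sort list `a` in ascending order using a bitonic sorting network.

    Sketch only: assumes len(a) is a power of two. Elements are swapped
    within `a` itself, so no auxiliary array is allocated.
    """
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")

    k = 2
    while k <= n:              # k = size of the bitonic sequences being merged
        j = k // 2
        while j > 0:           # j = distance between compared elements
            # One network stage: on a GPU, each i would map to one thread.
            for i in range(n):
                partner = i ^ j
                if partner > i:                  # handle each pair once
                    ascending = (i & k) == 0     # direction of this block
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]  # in-place swap
            j //= 2
        k *= 2
    return a
```

For example, `bitonic_sort_inplace([3, 7, 4, 8, 6, 2, 1, 5])` returns `[1, 2, 3, 4, 5, 6, 7, 8]`, and the list passed in is sorted in place. The trade-off mirrors the one stated above: the network does O(n log² n) comparisons, more than Radixsort's linear passes, but it needs no extra buffer.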

Tags: Algorithms, Computer science, CUDA, nVidia, nVidia GeForce 8800 GT, Sorting

December 24, 2011 by hgpu
