high performance computing on graphics processing units: hgpu.org

Fast Parallel Sorting Algorithms on GPUs

Bilal Jan, Bartolomeo Montrucchio, Carlo Ragusa, Fiaz Gul Khan, Omar Khan

Dipartimento di Automatica e Informatica, Politecnico di Torino, Torino, I-10129 Italy

International Journal of Distributed and Parallel Systems (IJDPS), Volume 3, Number 6, 2012

DOI:10.5121/ijdps.2012.3609

This paper presents a comparative analysis of three widely used parallel sorting algorithms (odd-even sort, rank sort and bitonic sort) in terms of sorting rate, sorting time and speed-up on a CPU and on different GPU architectures. Alongside these, we have implemented a novel parallel algorithm, the min-max butterfly network, for finding the minimum and maximum in large data sets. All algorithms have been implemented using the data-parallel model, to achieve high performance on multi-core GPUs, with the OpenCL specification. Our results show a minimum speed-up of 19x for bitonic sort over odd-even sort for small queue sizes on the CPU, and a maximum speed-up of 2300x for very large queue sizes on the Nvidia Quadro 6000 GPU architecture. Our implementation of full-butterfly network sorting performs relatively better than all three sorting techniques: bitonic, odd-even and rank sort. For the min-max butterfly network, our findings report a high speed-up on the Nvidia Quadro 6000 GPU for large data set sizes reaching 2^24, with much lower sorting time.
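The abstract names the three compared algorithms without showing them. As a rough illustration only, the sequential Python sketch below simulates each one; it is not the authors' OpenCL code, and the function names are hypothetical. In each function the loop marked "independent" touches disjoint data per iteration, which is the part a GPU kernel would presumably map to one work-item per iteration.

```python
def odd_even_sort(values):
    """Odd-even transposition sort: n phases of neighbour compare-exchanges."""
    a = list(values)
    n = len(a)
    for phase in range(n):
        # Independent: even phases pair (0,1), (2,3), ...; odd phases pair (1,2), (3,4), ...
        for i in range(phase % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


def rank_sort(values):
    """Rank sort: an element's output index is the number of elements that precede it."""
    a = list(values)
    out = [None] * len(a)
    # Independent: every element computes its own rank and writes one output slot.
    for i, x in enumerate(a):
        # Ties are broken by original index so duplicates get distinct ranks.
        rank = sum(1 for j, y in enumerate(a) if y < x or (y == x and j < i))
        out[rank] = x
    return out


def bitonic_sort(values):
    """Bitonic sorting network; the input length must be a power of two."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2  # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2  # distance between compared elements
        while j >= 1:
            # Independent: each (i, partner) pair is disjoint from the others.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


if __name__ == "__main__":
    data = [7, 3, 9, 1, 4, 4, 8, 0]
    print(odd_even_sort(data))
    print(rank_sort(data))
    print(bitonic_sort(data))
```

Odd-even sort needs n phases and rank sort performs n comparisons per element, while the bitonic network finishes in on the order of log² n compare-exchange stages. That difference in step count is consistent with the large speed-ups for bitonic sort reported above, although the measured figures also depend on the hardware.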

Tags: Algorithms, Computer science, Data parallelism, nVidia, nVidia GeForce GT 320 M, nVidia GeForce GTX 260, nVidia Quadro FX 6000, OpenCL, Sorting

December 4, 2012 by hgpu
