high performance computing on graphics processing units: hgpu.org


Two Algorithms for Sorting On Heterogeneous Clusters

Kyle Spafford, Jeremy Meredith, Jeffrey Vetter, Aparna Chandramowlishwaran, David Noble, Richard Vuduc

Future Technologies Group, Oak Ridge National Laboratory

Future Technologies Group, 2012


In the past few years, performance improvements in CPUs and memory technologies have outpaced those of storage systems. When extrapolated to the exascale, this trend places strict limits on the amount of data that can be written to disk for full analysis, resulting in an increased reliance on characterizing in-memory data. Many of these characterizations are simple, but require sorted data. This paper explores variations on two classic algorithms for distributed sorting, radix sort and sample sort, under two novel constraints imposed by the projected requirements of an exascale machine: heterogeneity and limited external storage. The two approaches are evaluated on the GPU-based NSF Keeneland system, including an analysis of data movement and the effects of GPUs on performance and scalability. Results from Keeneland indicate a substantial performance advantage for sample-based approaches on some data distributions, but this advantage comes at the cost of randomized behavior and load imbalance.
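For readers unfamiliar with sample sort, the following is a minimal single-process sketch of the textbook algorithm. It is not the authors' implementation, and it involves no MPI, CUDA, or OpenMP. The names `sample_sort` and `load_imbalance` and the parameters `num_ranks`, `oversample`, and `seed` are hypothetical, chosen only for illustration. The sketch shows where the randomized behavior and load imbalance mentioned in the abstract can come from: splitters are picked from random samples, so bucket sizes depend on both the sample and the data distribution.

```python
import random
from bisect import bisect_right


def sample_sort(data, num_ranks=4, oversample=8, seed=0):
    """Simulate textbook sample sort; returns one sorted bucket per simulated rank.

    Illustrative only: "ranks" are plain Python lists, not MPI processes.
    """
    rng = random.Random(seed)

    # Deal the input out to the simulated ranks (round-robin).
    local = [data[r::num_ranks] for r in range(num_ranks)]

    # Each rank contributes a few random samples of its local data.
    samples = []
    for chunk in local:
        samples.extend(rng.sample(chunk, min(oversample, len(chunk))))
    samples.sort()

    # Pick num_ranks - 1 evenly spaced splitters from the sorted samples.
    splitters = []
    if samples:
        splitters = [samples[(i * len(samples)) // num_ranks]
                     for i in range(1, num_ranks)]

    # Stand-in for the all-to-all exchange: route each key to its bucket.
    buckets = [[] for _ in range(num_ranks)]
    for chunk in local:
        for key in chunk:
            buckets[bisect_right(splitters, key)].append(key)

    # Each rank sorts what it received; concatenating buckets gives sorted output.
    for bucket in buckets:
        bucket.sort()
    return buckets


def load_imbalance(buckets):
    """Largest bucket size divided by the mean bucket size (1.0 is perfectly balanced)."""
    total = sum(len(b) for b in buckets)
    if total == 0:
        return 1.0
    return max(len(b) for b in buckets) / (total / len(buckets))
```

Calling `load_imbalance(sample_sort(keys))` on inputs with many duplicate keys tends to give values well above 1.0, because equal keys all land in the same bucket regardless of how the splitters are chosen. A radix-based approach, by contrast, partitions on fixed digit positions and is deterministic for a given input.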

Tags: Algorithms, Computer science, CUDA, Heterogeneous systems, MPI, nVidia, OpenMP, Sorting, Tesla M2070

June 19, 2012 by hgpu

