high performance computing on graphics processing units: hgpu.org

Fast Sorting Algorithms using AVX-512 on Intel Knights Landing

Berenger Bramas

Max Planck Computing and Data Facility (MPCDF)

arXiv:1704.08579 [cs.MS], (24 Apr 2017)

Package: AVX-512 sort functions

This paper describes fast sorting techniques using the recent AVX-512 instruction set. Our implementations use the new capabilities of AVX-512 to vectorize a two-part hybrid algorithm: we sort small arrays using a branch-free Bitonic variant, and we provide a vectorized partitioning kernel, which is the main component of the well-known Quicksort. Our algorithm sorts in place and is straightforward to implement thanks to the new instructions. We also show how an algorithm can be adapted and implemented with AVX-512. We report a performance study on the Intel KNL, where our approach is faster than the GNU C++ sort algorithm for any size, for both integer and double-precision floating-point values, by a factor of 4 on average.
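To make the two-part structure in the abstract concrete, here is a minimal scalar C++ sketch. It is not the paper's code and uses no AVX-512 intrinsics: the names `compare_exchange`, `bitonic_sort_16`, `small_sort`, and `hybrid_sort`, the 16-element threshold, the median-of-three pivot, and the padding scheme are all assumptions chosen for illustration. It only shows how a data-oblivious Bitonic network for small arrays can be combined with pivot-based partitioning for larger ones. NaN values are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Compare-exchange without a data-dependent branch: afterwards a <= b.
// std::min / std::max usually compile to min/max instructions.
template <class T>
inline void compare_exchange(T& a, T& b) {
    const T lo = std::min(a, b);
    const T hi = std::max(a, b);
    a = lo;
    b = hi;
}

// Bitonic sorting network for exactly 16 elements, ascending order.
// The sequence of comparisons depends only on indices, never on the data.
template <class T>
void bitonic_sort_16(T* v) {
    constexpr std::size_t n = 16;
    for (std::size_t k = 2; k <= n; k *= 2) {          // size of sorted runs
        for (std::size_t j = k / 2; j > 0; j /= 2) {   // compare distance
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner > i) {
                    if ((i & k) == 0) compare_exchange(v[i], v[partner]);  // ascending run
                    else              compare_exchange(v[partner], v[i]);  // descending run
                }
            }
        }
    }
}

// Sort up to 16 elements by padding to 16 with the largest input value
// (assumption: padding with the maximum keeps the first n outputs valid).
template <class T>
void small_sort(T* v, std::size_t n) {
    assert(n <= 16);
    if (n < 2) return;
    T buf[16];
    const T pad = *std::max_element(v, v + n);
    for (std::size_t i = 0; i < 16; ++i) buf[i] = (i < n) ? v[i] : pad;
    bitonic_sort_16(buf);
    std::copy(buf, buf + n, v);
}

// Hybrid sort: partition around a pivot while the range is large,
// then finish each small range with the Bitonic network.
template <class T>
void hybrid_sort(T* v, std::size_t n) {
    if (n <= 16) { small_sort(v, n); return; }
    // Median-of-three pivot (an illustrative choice, not taken from the paper).
    const T a = v[0], b = v[n / 2], c = v[n - 1];
    const T pivot = std::max(std::min(a, b), std::min(std::max(a, b), c));
    // Three-way split: [< pivot][== pivot][> pivot]. The middle part is
    // never empty, so both recursive calls work on strictly smaller ranges.
    T* mid_lo = std::partition(v, v + n, [&](const T& x) { return x < pivot; });
    T* mid_hi = std::partition(mid_lo, v + n, [&](const T& x) { return !(pivot < x); });
    hybrid_sort(v, static_cast<std::size_t>(mid_lo - v));
    hybrid_sort(mid_hi, static_cast<std::size_t>((v + n) - mid_hi));
}
```

A call such as `hybrid_sort(data.data(), data.size())` on a `std::vector<int>` or `std::vector<double>` sorts it in ascending order. In a vectorized version, each compare-exchange layer of the network would map onto vector min/max and permute operations, and the partition step onto masked compare and compress-store operations; those details are left out here because they depend on the specific kernels, which this listing does not reproduce.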

Tags: Algorithms, Computer science, Intel Xeon Phi, OpenMP, Package, Sorting

May 9, 2017 by hgpu
