Fast GPGPU Data Rearrangement Kernels using CUDA
Chair for Scientific Computing, Department of Informatics, Technische Universität München, Munich, Germany
arXiv:1011.3583 [cs.DC] (16 Nov 2010)
@article{2010arXiv1011.3583B,
author={{Bader}, M. and {Bungartz}, H.-J. and {Mudigere}, D. and {Narasimhan}, S. and {Narayanan}, B.},
title="{Fast GPGPU Data Rearrangement Kernels using CUDA}",
journal={ArXiv e-prints},
archivePrefix="arXiv",
eprint={1011.3583},
primaryClass="cs.DC",
keywords={Computer Science – Distributed, Parallel, and Cluster Computing, Computer Science – Graphics, Computer Science – Performance},
year={2010},
month={nov},
adsurl={http://adsabs.harvard.edu/abs/2010arXiv1011.3583B},
adsnote={Provided by the SAO/NASA Astrophysics Data System}
}
Many high-performance computing algorithms are bandwidth-limited, hence the need for optimal data rearrangement kernels as well as their easy integration into the rest of the application. In this work, we have built a CUDA library of fast kernels for a set of data rearrangement operations. In particular, we have built generic kernels for rearranging m-dimensional data into n dimensions, including Permute, Reorder, and Interlace/De-interlace. We have also built kernels for generic stencil computations on two-dimensional data using templates and functors, which allow application developers to rapidly build customized high-performance kernels. All the kernels built achieve or surpass the best-known performance in terms of bandwidth utilization.
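To give a flavor of the template/functor style of kernel customization described in the abstract, the following is a minimal sketch of a 2D stencil kernel parameterized by a functor type. The Laplace5 functor, the stencil2d kernel name, and its signature are illustrative assumptions, not the paper's actual interface; the paper's library presumably adds tiling, shared-memory staging, and boundary handling to reach the reported bandwidth.

#include <cuda_runtime.h>

// Hypothetical functor: a 5-point Laplacian-style stencil on a 2D grid.
// Only illustrates how a functor can customize a generic kernel; the
// paper's real functor interface may differ.
struct Laplace5 {
    __device__ float operator()(float c, float n, float s, float w, float e) const {
        return n + s + w + e - 4.0f * c;
    }
};

// Generic 2D stencil kernel parameterized by a functor type.
// Interior points only; boundary handling is omitted for brevity.
template <typename Op>
__global__ void stencil2d(const float* in, float* out, int nx, int ny, Op op) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x > 0 && x < nx - 1 && y > 0 && y < ny - 1) {
        int i = y * nx + x;
        out[i] = op(in[i], in[i - nx], in[i + nx], in[i - 1], in[i + 1]);
    }
}

int main() {
    const int nx = 1024, ny = 1024;
    size_t bytes = size_t(nx) * ny * sizeof(float);
    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemset(d_in, 0, bytes);

    dim3 block(16, 16);
    dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
    stencil2d<<<grid, block>>>(d_in, d_out, nx, ny, Laplace5());
    cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

Because the functor is a template parameter, the stencil body is inlined at compile time, so an application developer can swap in a different operator without touching the memory-access structure of the kernel; the data rearrangement kernels (Permute, Reorder, Interlace/De-interlace) could be parameterized in a similar way.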
January 18, 2011 by hgpu