high performance computing on graphics processing units: hgpu.org

Using modern C++ to improve CUDA programs

Mythreya Kuricheti

University of California, Davis

University of California, Davis, 2024

The classic style of writing and porting HPC applications to the GPU uses pointers to buffers or data structures as kernel parameters. This style discards type information, leading to "flattening" of CPU-side data structures before they are used as kernel parameters, followed by a need to reconstruct them in GPU code to retain flexibility. In this thesis, we identify several major problems that arise during porting, including the lack of vectors or views into a GPU buffer, bounds checking, iterator support, macro-dependent function specialization on the GPU, and GPU allocators for arbitrary types. CUDA already supports all of these features in kernel code, but programmers are generally unable to use them because data structures decay to pointers in kernel invocations. We demonstrate these problems and present techniques to overcome them in an implementation in C++ and CUDA. We use modern C++ features to make CPU-side features (such as iterators, ranged-for loops, and bounds checking) first-class citizens in GPU kernel code while maintaining interoperability with existing libraries. The result is a new ability to use CPU-style coding patterns in GPU kernel code. We demonstrate that our abstractions generate assembly as good as that of the classical implementations. As a case study, we use the library to simplify porting the shallow-water simulation framework "HEC-RAS" to the GPU.
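The abstract's central idea, passing a typed, non-owning view to a kernel instead of a bare pointer so that bounds checking, iterators and ranged-for loops keep working in device code, can be sketched roughly as follows. This is not the thesis's implementation: `device_span`, `scale_all`, `scale_kernel` and the `HD` macro are hypothetical names chosen for illustration, and the sketch assumes C++14 or later (nvcc when building the kernel, any host compiler otherwise).

```cpp
// Hypothetical sketch: a non-owning, bounds-checked view that can be passed
// to a CUDA kernel by value. Compiles as plain C++ when nvcc is not used.
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical macro: mark functions callable from both host and device
// under nvcc, and expand to nothing for an ordinary host compiler.
#if defined(__CUDACC__)
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical view type (not the thesis's actual API).
template <typename T>
class device_span {
public:
    HD device_span(T* data, std::size_t size) : data_(data), size_(size) {}

    HD std::size_t size() const { return size_; }

    // Bounds-checked element access; assert() is also usable in CUDA
    // device code and is compiled out when NDEBUG is defined.
    HD T& operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

    // Raw pointers are valid iterators, so ranged-for loops work in both
    // host code and kernel code.
    HD T* begin() const { return data_; }
    HD T* end() const { return data_ + size_; }

private:
    T* data_;
    std::size_t size_;
};

// Kernel arguments are copied byte-wise, so the view must stay trivially copyable.
static_assert(std::is_trivially_copyable<device_span<float>>::value,
              "device_span must be trivially copyable to be a kernel argument");

// Written once, callable from host or device code: a ranged-for over the view.
template <typename T>
HD void scale_all(device_span<T> v, T factor) {
    for (T& x : v) x *= factor;
}

#if defined(__CUDACC__)
// The kernel takes the typed view instead of a raw pointer plus a length.
template <typename T>
__global__ void scale_kernel(device_span<T> v, T factor) {
    std::size_t i = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < v.size()) v[i] *= factor;  // operator[] still checks bounds in debug builds
}
#endif
```

On the host, a buffer obtained from `cudaMalloc` would be wrapped as `device_span<float>(d_ptr, n)` and passed directly as `scale_kernel<<<blocks, threads>>>(view, 2.0f)`. Because the view is copied into the kernel's parameters by value, the kernel receives the same size information the host had rather than a decayed pointer; defining `NDEBUG` removes the `assert`, so release builds pay nothing for the check.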

Tags: Computer science, CUDA, nVidia, nVidia Quadro RTX 5000, Performance, PTX, Thesis

October 27, 2024 by hgpu
