Analysis of High Level implementations for Recursive Methods on GPUs
University of California, Berkeley, CA
CS262a: Fall 2021 Final Projects, 2021
@article{cao2021analysis,
title={Analysis of High Level implementations for Recursive Methods on GPUs},
author={Cao, Cheng and Kalloor, Justin},
journal={CS262a: Fall 2021 Final Projects},
year={2021}
}
Higher-level DSLs have enabled performant computation on GPUs while providing enough abstraction to spare the user significant deployment overhead. However, the SIMD/SIMT programming model can still suffer unexpected performance drops when CPU code is translated naively. One example is branch divergence, which is especially exacerbated by recursive methods, since recursion depth can vary greatly between threads. This paper investigates ways to enable recursion and task-oriented programming using the Taichi DSL. We first present different methods of accomplishing this task and benchmark each. Utilizing Taichi's multiple code-generation backends, we investigate the performance of recursive tasks on each backend. We compile these results into a final model that automatically chooses the best implementation for a given user program. In our benchmarks, we see a massive improvement in throughput over the naive implementation, up to 500%.
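(Not part of the paper.) One standard way to express recursion on GPU backends that lack a general call stack, and to reduce the divergence the abstract describes, is to rewrite the recursive method as a loop over an explicit work stack. A minimal plain-Python sketch of that transformation, with hypothetical function names (the paper's actual Taichi implementations are not shown here):

```python
# Hypothetical sketch: a recursive tree traversal and its explicit-stack
# equivalent. On SIMT hardware, threads running the recursive form diverge
# when their recursion depths differ; the iterative form turns the pending
# calls into data, which maps onto a GPU kernel as a bounded loop.

def count_nodes_recursive(tree):
    # tree is a nested tuple (left, right), or None for a leaf
    if tree is None:
        return 1
    left, right = tree
    return 1 + count_nodes_recursive(left) + count_nodes_recursive(right)

def count_nodes_iterative(tree):
    # Same traversal with an explicit work stack: no call-stack recursion.
    stack = [tree]
    count = 0
    while stack:
        node = stack.pop()
        count += 1
        if node is not None:
            left, right = node
            stack.append(left)
            stack.append(right)
    return count
```

Because the iterative form's stack is ordinary data, its size can be bounded and allocated per thread, which is what makes this style feasible inside a DSL kernel.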
January 9, 2022 by hgpu