Analysis of High Level implementations for Recursive Methods on GPUs
University of California, Berkeley, CA
CS262a: Fall 2021 Final Projects, 2021
@article{cao2021analysis,
title={Analysis of High Level implementations for Recursive Methods on GPUs},
author={Cao, Cheng and Kalloor, Justin},
journal={CS262a: Fall 2021 Final Projects},
year={2021}
}
Higher-level DSLs have enabled performant computation on GPUs while providing enough abstraction to spare the user significant deployment overhead. However, the SIMD/SIMT programming model can still suffer unexpected performance drops when CPU code is translated naively. One example is branch divergence, which is especially exacerbated by recursive methods, since recursion depth can vary greatly between threads. This paper investigates ways to enable recursion and task-oriented programming using the Taichi DSL. We first present different methods of accomplishing this task and benchmark each. Utilizing Taichi's multiple code-generation backends, we investigate the performance of recursive tasks on each backend. We compile these results into a final model that automatically chooses the best implementation for a given user program. In our benchmarks, we see a massive improvement in throughput over the naive implementation, up to 500%.
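(Not part of the paper.) One standard way to express recursion on GPU backends that lack a general call stack, and to reduce the divergence the abstract describes, is to rewrite the recursive method as a loop over an explicit work stack. A minimal plain-Python sketch of that transformation, with hypothetical function names (the paper's actual Taichi implementations are not shown here):

```python
# Hypothetical sketch: a recursive tree traversal and its explicit-stack
# equivalent. On SIMT hardware, threads running the recursive form diverge
# when their recursion depths differ; the iterative form turns the pending
# calls into data, which maps onto a GPU kernel as a bounded loop.

def count_nodes_recursive(tree):
    # tree is a nested tuple (left, right), or None for a leaf
    if tree is None:
        return 1
    left, right = tree
    return 1 + count_nodes_recursive(left) + count_nodes_recursive(right)

def count_nodes_iterative(tree):
    # Same traversal with an explicit work stack: no call-stack recursion.
    stack = [tree]
    count = 0
    while stack:
        node = stack.pop()
        count += 1
        if node is not None:
            left, right = node
            stack.append(left)
            stack.append(right)
    return count
```

Because the iterative form's stack is ordinary data, its size can be bounded and allocated per thread, which is what makes this style feasible inside a DSL kernel.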
January 9, 2022 by hgpu