Predicting the Execution Time of a kernel on a specific GPU using PTX code
Courant Institute of Mathematical Sciences, New York University, New York, US
New York University, 2023
@misc{roberto2023implementation,
title={Implementation of a motion estimation algorithm for Intel FPGAs using OpenCL},
author={Osorio, Roberto R. and Vilari{\~n}o, David L. and others},
year={2023},
publisher={Springer}
}
Over the last couple of decades, the time and energy required to run workloads on high-performance computing systems, which now rely heavily on GPUs, have grown exponentially. One clear way to reduce the resources these systems consume is to avoid inefficient applications by using prediction models that inform developers of the approximate execution time. In this work, we trained models based on ensemble learning techniques, namely Random Forest and Gradient Boosted Decision Trees, as well as the deep learning architecture TabNet, to predict the execution time of a specific kernel on a specific GPU architecture. As input features, we used data obtained from the PTX code with the CUDA-Flux profiler. Measured by the share of predictions with an error in the 0-10% range, the best-performing model is CatBoost with 91.6%, followed by Random Forest with 85.4% and TabNet with 76.6%.
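The headline metric above, the share of predictions whose error falls in the 0-10% range, can be illustrated with a minimal sketch. It assumes the error is the relative error |predicted - measured| / measured, which the abstract does not state explicitly. The function name `fraction_within_error` and the sample timings are hypothetical and are not taken from the paper.

```python
def fraction_within_error(actual, predicted, max_rel_error=0.10):
    """Return the share of predictions whose relative error is <= max_rel_error.

    Assumption: error is |predicted - actual| / actual; the paper may define it differently.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one measurement is required")
    hits = 0
    for t_true, t_pred in zip(actual, predicted):
        if t_true <= 0:
            raise ValueError("measured execution times must be positive")
        rel_error = abs(t_pred - t_true) / t_true
        if rel_error <= max_rel_error:
            hits += 1
    return hits / len(actual)


# Hypothetical kernel execution times in milliseconds (not data from the paper).
measured = [2.0, 4.0, 10.0, 20.0]
predicted = [2.1, 5.0, 10.5, 19.0]

share = fraction_within_error(measured, predicted)
print(f"{share:.1%} of predictions fall within 10% error")
```

In this made-up sample, three of the four predictions are within 10% of the measured time, so the share is 75%. A figure such as the reported 91.6% for CatBoost would be this kind of ratio computed over the full test set.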
October 22, 2023 by hgpu