Optimization of GPU workloads using natural language processing based on deep learning techniques

Petros Vavaroutsos
National Technical University of Athens
National Technical University of Athens, 2022


   title={Optimization of GPU workloads using natural language processing based on deep learning techniques},

   author={Vavaroutsos, Petros},



Download Download (PDF)   View View   Source Source   



Setting program parameters is challenging due to the abstract relationship between hardware and software. Automatic optimization algorithms that are accurate are required to cope with the complexity and variety of current hardware and software. Autotuning has always relied on time-consuming trial and error approaches. Machine learning (ML) and Natural Language Processing (NLP) has flourished over the last decade with research focusing on deep architectures. In this context, the use of natural language processing techniques to source code in order to conduct autotuning tasks is an emerging field of study. While previous research has successfully completed a variety of different autotuning tasks using a variety of different source code languages, the majority of source code data is CPU-centric, with relatively little GPU code. In this work, we make two contributions. We first utilize the dataset of OpenCL kernels from the work of Cummins et al. [1] to evaluate and compare our proposed six different deep neural networks to the state-of-the-art network. Our best model surpasses that of Cummins et al. work, providing, in total, a 2.65% improvement in prediction accuracy. In our second contribution, we extend our research to CUDA kernels and we create an end-to-end pipeline that incorporates a source-to-source compiler for thread and block coarsening transformations of CUDA kernels, a source rewriter that removes semantically irrelevant information from the kernels, creating train ready sequences, a profiling tool for measuring the performance of the transformed kernels, producing the prediction labels and, finally, our best machine learning model from our aforementioned neural network architecture research on OpenCL kernels. Hence, the pipeline receives a hand-written CUDA kernel and predicts its optimal configuration. To the best of our knowledge, this is the first work that attempts to apply NLP techniques on CUDA written applications for the specific optimizations. We evaluate our methodology on the LS-CAT dataset [2] for five different coarsening factors on NVIDIA V100S high-end GPU, we discover its vulnerabilities and examine the applicability of machine learning for three different prediction problems: thread coarsening binary classification, thread and block coarsening five-class classification. Our model achieves 84% accuracy on the binary classification, while it performs poorly when it comes to five-class classification.
No votes yet.
Please wait...

* * *

* * *

* * *

HGPU group © 2010-2022 hgpu.org

All rights belong to the respective authors

Contact us: