high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Optimization of GPU workloads using natural language processing based on deep learning techniques

Optimization of GPU workloads using natural language processing based on deep learning techniques

Petros Vavaroutsos

National Technical University of Athens

National Technical University of Athens, 2022

@article{vavaroutsos2022optimization,

title={Optimization of GPU workloads using natural language processing based on deep learning techniques},

author={Vavaroutsos, Petros},

year={2022}

}

Download (PDF)

View

Source

1689

views

Setting program parameters is challenging due to the abstract relationship between hardware and software. Automatic optimization algorithms that are accurate are required to cope with the complexity and variety of current hardware and software. Autotuning has always relied on time-consuming trial and error approaches. Machine learning (ML) and Natural Language Processing (NLP) has flourished over the last decade with research focusing on deep architectures. In this context, the use of natural language processing techniques to source code in order to conduct autotuning tasks is an emerging field of study. While previous research has successfully completed a variety of different autotuning tasks using a variety of different source code languages, the majority of source code data is CPU-centric, with relatively little GPU code. In this work, we make two contributions. We first utilize the dataset of OpenCL kernels from the work of Cummins et al. [1] to evaluate and compare our proposed six different deep neural networks to the state-of-the-art network. Our best model surpasses that of Cummins et al. work, providing, in total, a 2.65% improvement in prediction accuracy. In our second contribution, we extend our research to CUDA kernels and we create an end-to-end pipeline that incorporates a source-to-source compiler for thread and block coarsening transformations of CUDA kernels, a source rewriter that removes semantically irrelevant information from the kernels, creating train ready sequences, a profiling tool for measuring the performance of the transformed kernels, producing the prediction labels and, finally, our best machine learning model from our aforementioned neural network architecture research on OpenCL kernels. Hence, the pipeline receives a hand-written CUDA kernel and predicts its optimal configuration. To the best of our knowledge, this is the first work that attempts to apply NLP techniques on CUDA written applications for the specific optimizations. We evaluate our methodology on the LS-CAT dataset [2] for five different coarsening factors on NVIDIA V100S high-end GPU, we discover its vulnerabilities and examine the applicability of machine learning for three different prediction problems: thread coarsening binary classification, thread and block coarsening five-class classification. Our model achieves 84% accuracy on the binary classification, while it performs poorly when it comes to five-class classification.

Tags: ATI, ATI Radeon HD 7970, Computer science, CUDA, Deep learning, Machine learning, Neural networks, nVidia, nVidia GeForce GTX 970, OpenCL, Thesis

August 21, 2022 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

Optimization of GPU workloads using natural language processing based on deep learning techniques

Your response

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

Most viewed papers (last 30 days)

Optimization of GPU workloads using natural language processing based on deep learning techniques

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)