high performance computing on graphics processing units: hgpu.org


Predicting the Execution Time of a kernel on a specific GPU using PTX code

Monika Dagar, Jorge Roldan

Courant Institute of Mathematical Sciences, New York University, New York, US

New York University, 2023


Over the last couple of decades, the time and energy required to run workloads on high-performance computing systems, which nowadays rely heavily on GPUs, have grown exponentially. One clear way to reduce the resources these systems consume is to avoid inefficient applications by using prediction models that tell developers a kernel's approximate execution time. In this work, we train models based on ensemble learning techniques, namely Random Forest and Gradient Boosted Decision Trees, as well as the deep learning architecture TabNet, to predict the execution time of a specific kernel on a specific GPU architecture. As input features we use data extracted from the PTX code with the CUDA-Flux profiler. Measured by the share of predictions whose error falls in the 0-10% range, the best-performing model is CatBoost, a gradient-boosted decision tree implementation (91.6%), followed by Random Forest (85.4%) and TabNet (76.6%).
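The models are compared by the share of predictions whose error lies in the 0-10% band. The abstract does not spell out the error formula, so the sketch below assumes relative absolute error, |predicted - actual| / actual, with the band's upper bound inclusive. The function name `fraction_within_band` and the sample timings are hypothetical and for illustration only; they are not taken from the paper or its data.

```python
from typing import Sequence


def fraction_within_band(actual: Sequence[float],
                         predicted: Sequence[float],
                         upper: float = 0.10) -> float:
    """Return the fraction of predictions whose relative error
    |predicted - actual| / actual is at most `upper`.

    Assumption: this is how the "0-10% error" band is defined;
    the original work may use a different error formula.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("need at least one sample")

    hits = 0
    for a, p in zip(actual, predicted):
        if a <= 0:
            # Relative error is undefined for non-positive execution times.
            raise ValueError("execution times must be positive")
        if abs(p - a) / a <= upper:
            hits += 1
    return hits / len(actual)


if __name__ == "__main__":
    # Hypothetical measured vs. predicted kernel times (milliseconds).
    measured = [1.00, 2.00, 4.00, 8.00]
    estimated = [1.05, 2.50, 3.90, 8.70]
    # Errors: 5%, 25%, 2.5%, 8.75% -> three of four fall in the band.
    print(fraction_within_band(measured, estimated))
```

Under this definition, a reported score such as 91.6% means that 91.6% of the test kernels had a predicted time within 10% of the measured one.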

Tags: Computer science, CUDA, Deep learning, Machine learning, nVidia, nVidia GeForce GTX 1650, nVidia GeForce GTX Titan XP, Performance, PTX, Tesla K20, Tesla K80, Tesla M60, Tesla P100, Tesla T4, Tesla V100

October 22, 2023 by hgpu

