LLMPerf: GPU Performance Modeling meets Large Language Models
FPT Software AI Center, Hanoi, Vietnam
arXiv:2503.11244 [cs.PF] (14 Mar 2025)
Performance modeling, a pivotal domain in program cost analysis, currently relies on manually crafted models that are constrained by various program and hardware limitations, especially in the intricate landscape of GPGPU. Meanwhile, Large Language Models (LLMs) have demonstrated their effectiveness in addressing diverse programming challenges. Our work connects LLMs and performance modeling by employing an LLM as a performance estimator. Through experimental exploration on carefully designed, large-scale OpenCL datasets, we highlight both the potential and the main difficulties of using LLMs for performance modeling of OpenCL device source programs. As the first study in this line of work, our LLM-based performance model achieves a mean absolute percentage error (MAPE) of 24.25% on a large-scale generated validation set, and 46.1% on a set of publicly available OpenCL programs.
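For reference, the MAPE figures above compare model-predicted against measured kernel runtimes. Below is a minimal sketch of how such a metric is typically computed; the runtime values and variable names are hypothetical illustrations, not data from the paper.

# Mean absolute percentage error (MAPE) between measured and
# predicted kernel runtimes. All values here are hypothetical,
# for illustration only.

def mape(measured, predicted):
    # MAPE in percent: (100 / n) * sum(|y - y_hat| / |y|)
    assert len(measured) == len(predicted) and measured
    return 100.0 / len(measured) * sum(
        abs(y - y_hat) / abs(y) for y, y_hat in zip(measured, predicted)
    )

# Hypothetical measured vs. LLM-predicted runtimes (milliseconds).
measured = [1.20, 0.45, 3.10, 0.80]
predicted = [1.05, 0.60, 2.60, 1.00]

print(f"MAPE: {mape(measured, predicted):.2f}%")  # prints MAPE: 21.74%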
March 23, 2025 by hgpu