high performance computing on graphics processing units: hgpu.org

LS-CAT: A Large-Scale CUDA AutoTuning Dataset

Lars Bjertnes, Jacob O. Tørring, Anne C. Elster

Department of Computer Science, Norwegian University of Science and Technology (NTNU), Trondheim, Norway

arXiv:2103.14409 [cs.DC], (26 Mar 2021)

@misc{bjertnes2021lscat,
  title={LS-CAT: A Large-Scale CUDA AutoTuning Dataset},
  author={Lars Bjertnes and Jacob O. Tørring and Anne C. Elster},
  year={2021},
  eprint={2103.14409},
  archivePrefix={arXiv},
  primaryClass={cs.DC}
}

The effectiveness of Machine Learning (ML) methods depends on access to large, suitable datasets. In this article, we present how we built the LS-CAT (Large-Scale CUDA AutoTuning) dataset, sourced from GitHub, for the purpose of training NLP-based ML models. Our dataset includes 19 683 CUDA kernels focused on linear algebra. In addition to the CUDA code, LS-CAT contains 5 028 536 associated runtimes for different combinations of kernels, block sizes and matrix sizes. The runtimes are GPU benchmarks collected on both Nvidia GTX 980 and Nvidia T4 systems. This information creates a foundation upon which NLP-based models can find correlations between source-code features and the optimal choice of thread block size. Several results can be drawn from the LS-CAT database. For example, our experimental results show that choosing the optimal thread block size yields an average performance gain of 6%. We also analyze how much performance increase can be achieved in general, finding that in 10% of the cases the optimal block size gives a performance increase of more than 20%. A description of current and future work is also included.
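The abstract does not state how the per-configuration gain is defined or which block size serves as the reference point. The following is a minimal sketch, assuming each runtime measurement is a (kernel, matrix size, block size, runtime) record and that gain is measured against one fixed baseline block size. The record layout, the function names `optimal_block_gains` and `summarize`, the baseline, and the toy numbers are all hypothetical and do not reflect the actual LS-CAT schema or data. The sketch groups runtimes by (kernel, matrix size), picks the fastest block size in each group, and then reports the mean gain and the fraction of groups whose gain exceeds 20%.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical record layout (NOT the actual LS-CAT schema):
# (kernel_id, matrix_size, (block_x, block_y), runtime_ms)

def optimal_block_gains(records, baseline_block):
    """Map (kernel, matrix_size) -> (best_block, gain), where gain is the
    relative speedup of the fastest block size over the assumed baseline."""
    groups = defaultdict(dict)
    for kernel, size, block, runtime in records:
        groups[(kernel, size)][block] = runtime

    result = {}
    for key, runtimes in groups.items():
        if baseline_block not in runtimes:
            continue  # no baseline measurement, so no gain can be computed
        best_block = min(runtimes, key=runtimes.get)  # lowest runtime wins
        gain = runtimes[baseline_block] / runtimes[best_block] - 1.0
        result[key] = (best_block, gain)
    return result

def summarize(gains, threshold=0.20):
    """Return (mean gain, fraction of configurations with gain > threshold)."""
    values = [gain for _, gain in gains.values()]
    if not values:
        return 0.0, 0.0
    return mean(values), sum(v > threshold for v in values) / len(values)

# Invented toy measurements, for illustration only.
records = [
    ("matmul", 1024, (16, 16), 10.0),
    ("matmul", 1024, (32, 8), 8.0),
    ("matmul", 1024, (8, 8), 12.0),
    ("saxpy", 4096, (16, 16), 5.0),
    ("saxpy", 4096, (32, 8), 5.0),
]

gains = optimal_block_gains(records, baseline_block=(16, 16))
avg_gain, frac_over_20 = summarize(gains)
print(gains)
print(avg_gain, frac_over_20)
```

With these made-up numbers, `matmul` gains 25% from switching to a 32×8 block, `saxpy` gains nothing, so the mean gain is 12.5% and half of the configurations exceed the 20% threshold. A real analysis would substitute the dataset's own baseline definition, and would likely compute the statistics separately for each GPU.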

Tags: Benchmarking, Computer science, CUDA, Databases, Linear Algebra, Machine learning, nVidia, nVidia GeForce GTX 980, Tesla T4

April 5, 2021 by hgpu
