high performance computing on graphics processing units: hgpu.org

STuning-DL: Model-Driven Autotuning of Sparse GPU Kernels for Deep Learning

Roberto L. Castro, Diego Andrade, Basilio B. Fraguela

CITIC, Computer Architecture Group, University of A Coruña, 15071 A Coruña, Spain

IEEE Access, Volume: 12, 2024

DOI:10.1109/ACCESS.2024.3402326

@article{castro2024stuning,
  title={STuning-DL: Model-Driven Autotuning of Sparse GPU Kernels for Deep Learning},
  author={Castro, Roberto L and Andrade, Diego and Fraguela, Basilio B},
  journal={IEEE Access},
  volume={12},
  year={2024},
  publisher={IEEE},
  doi={10.1109/ACCESS.2024.3402326}
}

The relentless growth of modern Machine Learning models has spurred the adoption of sparsification techniques that simplify their architectures and reduce their computational demands. Network pruning has succeeded in preserving the original network accuracy while removing a significant portion of the weights. However, exploiting this sparsity efficiently remains challenging because of the computational irregularities it introduces, particularly in GPU kernels. A recent trend of template-based GPU kernels for semi-structured sparsity shows promise in efficiency, but these kernels lack autotuning capabilities to adapt to input dynamics and often underperform in scenarios for which they have not been meticulously hand-tuned. We present STuning-DL, the first pruning-aware autotuner for third-party template-based implementations. It enables efficient optimization of sparse kernels for Deep Learning, spanning from high-level aspects (CUDA C++ level) down to the specifics of GPU-native instructions (assembly level). STuning-DL tunes and optimizes the performance of sparse kernels at run time for each input problem, yielding speedups of up to 5.42× on an NVIDIA T4-16GB GPU and up to 3.6× on an NVIDIA A100-40GB GPU on sparse matrices from real-world models, compared to existing heuristics from sparse libraries such as cuSPARSE and cuSPARSELt.
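The general idea of model-driven autotuning described in the abstract can be illustrated with a minimal Python sketch. This is not the STuning-DL implementation: the `KernelConfig` fields, the candidate values, the toy cost model in `predicted_cost`, and the `Autotuner` class are all hypothetical names and assumptions introduced here. The sketch only shows the common pattern of ranking template configurations with a cheap model, measuring a short list of them, and caching the winner per input problem.

```python
from dataclasses import dataclass
from itertools import product
from math import ceil
from typing import Callable, Dict, List, Tuple

Problem = Tuple[int, int, int]  # (M, N, K) sizes of the sparse matrix product


@dataclass(frozen=True)
class KernelConfig:
    """Hypothetical template parameters of a sparse GPU kernel."""
    tile_m: int
    tile_n: int
    warps: int


def candidate_configs() -> List[KernelConfig]:
    """Enumerate the (assumed) search space of template instantiations."""
    return [KernelConfig(m, n, w)
            for m, n, w in product((32, 64, 128), (32, 64), (2, 4))]


def predicted_cost(cfg: KernelConfig, problem: Problem) -> float:
    """Toy analytical model (an assumption, not the paper's model).

    It charges every tile in full, so tile shapes that force padding
    look more expensive, and divides the work among the warps.
    """
    m, n, k = problem
    tiles = ceil(m / cfg.tile_m) * ceil(n / cfg.tile_n)
    work_per_tile = cfg.tile_m * cfg.tile_n * k
    return tiles * work_per_tile / cfg.warps


class Autotuner:
    """Rank candidates with the model, measure only the best few, cache the result."""

    def __init__(self, measure: Callable[[KernelConfig, Problem], float],
                 top_k: int = 3) -> None:
        self._measure = measure  # returns a run time; lower is better
        self._top_k = top_k
        self._cache: Dict[Problem, KernelConfig] = {}

    def tune(self, problem: Problem) -> KernelConfig:
        if problem in self._cache:  # each input problem is tuned once
            return self._cache[problem]
        ranked = sorted(candidate_configs(),
                        key=lambda c: predicted_cost(c, problem))
        shortlist = ranked[: self._top_k]
        best = min(shortlist, key=lambda c: self._measure(c, problem))
        self._cache[problem] = best
        return best
```

In a real setting, `measure` would compile and time the kernel instantiated with the given configuration; here it is left as a caller-supplied function so the sketch stays independent of any GPU toolchain. Using the model to prune the search space first keeps the number of costly measurements at `top_k` per new problem.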

Tags: Auto-Tuning, Computer science, CUDA, Deep learning, Machine learning, nVidia, nVidia A100, Tesla T4

May 26, 2024 by hgpu

