Fireiron: A Scheduling Language for High-Performance Linear Algebra on GPUs

Bastian Hagedorn, Archibald Samuel Elliott, Henrik Barthels, Rastislav Bodik, Vinod Grover

University of Münster

arXiv:2003.06324 [cs.PL], (13 Mar 2020)

Achieving high-performance GPU kernels requires optimizing algorithm implementations to the targeted GPU architecture. It is of utmost importance to fully use the compute and memory hierarchy, as well as available specialised hardware. Currently, vendor libraries like cuBLAS and cuDNN provide the best performing implementations of GPU algorithms. However, the task of the library programmer is incredibly challenging: for each provided algorithm, high-performance implementations have to be developed for all commonly used architectures, input sizes, and different storage formats. These implementations are generally provided as optimized assembly code because performance-critical architectural features are only exposed at this level. This prevents reuse between different implementations of even the same algorithm, as simple differences can have major effects on low-level implementation details. In this paper we introduce Fireiron, a DSL and compiler which allows the specification of high-performance GPU implementations as compositions of simple and reusable building blocks. We show how to use Fireiron to optimize matrix multiplication implementations, achieving performance matching hand-coded CUDA kernels, even when using specialised hardware such as NVIDIA Tensor Cores, and outperforming state-of-the-art implementations provided by cuBLAS by more than 2x.
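To make the idea of composing an implementation from simple, reusable building blocks more concrete, here is a minimal sketch in plain Python. It is not Fireiron syntax and does not reflect the paper's API or its generated code; the names `scalar_block`, `tile`, `matmul`, and `two_level` are hypothetical and chosen only for illustration. The sketch shows a matrix multiplication strategy built by nesting a generic tiling block around a base block, loosely analogous to mapping a computation onto successive levels of a hierarchy.

```python
# Illustrative only: NOT Fireiron syntax. All names here are hypothetical.

def scalar_block(A, B, C, i0, j0, k0, m, n, k):
    """Base building block: C += A @ B on an m x n x k sub-problem
    whose top-left corner is at row i0, column j0, reduction index k0."""
    for i in range(i0, i0 + m):
        for j in range(j0, j0 + n):
            acc = C[i][j]
            for p in range(k0, k0 + k):
                acc += A[i][p] * B[p][j]
            C[i][j] = acc


def tile(tm, tn, tk, inner):
    """Reusable building block: split a sub-problem into tiles of at most
    tm x tn x tk and hand each tile to the `inner` strategy."""
    def strategy(A, B, C, i0, j0, k0, m, n, k):
        for i in range(i0, i0 + m, tm):
            for j in range(j0, j0 + n, tn):
                for p in range(k0, k0 + k, tk):
                    inner(A, B, C, i, j, p,
                          min(tm, i0 + m - i),   # clip tiles at the edges
                          min(tn, j0 + n - j),
                          min(tk, k0 + k - p))
    return strategy


def matmul(A, B, strategy):
    """Allocate a zeroed result and run the chosen strategy on the full problem."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    strategy(A, B, C, 0, 0, 0, m, n, k)
    return C


# Compose a two-level strategy: 4x4x4 outer tiles, 2x2x2 inner tiles.
two_level = tile(4, 4, 4, tile(2, 2, 2, scalar_block))
```

Calling `matmul(A, B, two_level)` gives the same result as `matmul(A, B, scalar_block)`; only the order in which the work is decomposed changes. Swapping tile sizes or nesting depth changes the decomposition without touching the base block, which is the kind of reuse the abstract describes, though a real GPU implementation would also have to handle thread mapping, memory placement, and hardware-specific instructions.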

Tags: Computer science, CUBLAS, CUDA, Linear Algebra, Matrix multiplication, nVidia, nVidia GeForce GTX 750 Ti, nVidia GeForce RTX 2080 Ti, nVidia Quadro GV100, Programming Languages

March 22, 2020 by hgpu
