high performance computing on graphics processing units: hgpu.org

Opening the Black Box: Performance Estimation during Code Generation for GPUs

Dominik Ernst, Georg Hager, Markus Holzer, Matthias Knorr, Gerhard Wellein

NHR@FAU, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany

arXiv:2107.01143 [cs.PF], (2 Jul 2021)

Package: pystencils

Automatic code generation is frequently used to create implementations of algorithms specifically tuned to particular hardware and application parameters. The code generation process involves the selection of adequate code transformations, tuning parameters, and parallelization strategies. To cover the huge search space, code generation frameworks may apply time-intensive autotuning, exploit scenario-specific performance models, or treat performance as an intangible black box that must be described via machine learning. This paper addresses the selection problem by identifying the relevant performance-defining mechanisms through a performance model coupled with an analytic hardware metric estimator. This enables a quick exploration of large configuration spaces to identify highly efficient candidates with high accuracy. Our current approach targets memory-intensive GPGPU applications and focuses on the correct modeling of data transfer volumes to all levels of the memory hierarchy. We show how our method can be coupled to the pystencils stencil code generator, which is used to generate kernels for a range-four 3D 25-point stencil and a complex two-phase fluid solver based on the Lattice Boltzmann Method. For both, it delivers a ranking that can be used to select the best-performing candidate. The method is not limited to stencil kernels, but can be integrated into any code generator that can generate the required address expressions.
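
The idea of ranking candidates by modeled data transfer volumes can be illustrated with a minimal sketch. This is not the authors' model or the pystencils API: it assumes a simple bottleneck model in which the predicted time per lattice update is set by the slowest memory level, and every name (`Candidate`, `Machine`, `predicted_time`, `rank`) as well as every byte count and bandwidth figure is a hypothetical placeholder chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical kernel configuration with estimated bytes moved per lattice update."""
    name: str
    dram_bytes: float  # estimated DRAM traffic per update
    l2_bytes: float    # estimated L2 cache traffic per update
    l1_bytes: float    # estimated L1 cache traffic per update


@dataclass(frozen=True)
class Machine:
    """Hypothetical sustained bandwidths in bytes per second (placeholder values)."""
    dram_bw: float
    l2_bw: float
    l1_bw: float


def predicted_time(c: Candidate, m: Machine) -> float:
    """Seconds per update, assuming the slowest memory level is the only bottleneck."""
    return max(c.dram_bytes / m.dram_bw,
               c.l2_bytes / m.l2_bw,
               c.l1_bytes / m.l1_bw)


def rank(candidates: list[Candidate], m: Machine) -> list[Candidate]:
    """Sort candidates from fastest to slowest predicted time."""
    return sorted(candidates, key=lambda c: predicted_time(c, m))


# Placeholder machine and candidates; none of these numbers are measurements.
machine = Machine(dram_bw=800e9, l2_bw=2500e9, l1_bw=14000e9)
candidates = [
    Candidate("block_512x1x1", dram_bytes=40.0, l2_bytes=48.0, l1_bytes=208.0),
    Candidate("block_32x4x4", dram_bytes=16.0, l2_bytes=100.0, l1_bytes=208.0),
    Candidate("block_64x8x1", dram_bytes=24.0, l2_bytes=60.0, l1_bytes=220.0),
]
ranking = [c.name for c in rank(candidates, machine)]
print(ranking)
```

In this sketch the candidate with the lowest DRAM volume does not rank first, because its estimated L2 traffic becomes the limiting term. That is the reason for estimating volumes at every level of the memory hierarchy rather than at DRAM alone.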

Tags: Code generation, Computer science, CUDA, Lattice Boltzmann model, Machine learning, nVidia, Performance, Stencil computation, Tesla V100

July 11, 2021 by hgpu
