high performance computing on graphics processing units: hgpu.org

A Performance Analysis Framework for Identifying Potential Benefits in GPGPU Applications

Jaewoong Sim, Aniruddha Dasgupta, Hyesoon Kim, and Richard Vuduc

Georgia Institute of Technology

17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), New Orleans, LA, February 2012

DOI:10.1145/2145816.2145819

Tuning code for GPGPU and other emerging many-core platforms is a challenge because few models or tools can precisely pinpoint the root cause of performance bottlenecks. In this paper, we present a performance analysis framework that can help shed light on such bottlenecks for GPGPU applications. Although a handful of GPGPU profiling tools exist, most traditional tools simply provide programmers with a variety of measurements and metrics obtained by running applications. It is often difficult to map these metrics to the root causes of slowdowns, much less decide which optimization step to take next to alleviate the bottleneck. In our approach, we first develop an analytical performance model that can precisely predict performance and aims to provide programmer-interpretable metrics. Then, we apply static and dynamic profiling to instantiate our performance model for a particular input code and show how the model can predict the potential performance benefits. We demonstrate our framework on a suite of micro-benchmarks as well as a variety of computations extracted from real codes.
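
To make the idea of predicting a potential performance benefit concrete, here is a minimal sketch of how an analytical model can turn profiled numbers into a "what would this optimization buy me" estimate. It is a toy illustration only and not the model from the paper: the decomposition into compute time, memory time, and an overlap fraction, along with every name in it (KernelProfile, predicted_time_ms, potential_benefit_ms), is an assumption made for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class KernelProfile:
    """Hypothetical per-kernel inputs; a real tool would obtain them by profiling."""
    compute_time_ms: float   # time spent on computation
    memory_time_ms: float    # time spent waiting on memory
    overlap_fraction: float  # share of the shorter phase hidden behind the longer one (0..1)


def predicted_time_ms(p: KernelProfile) -> float:
    """Toy model (an assumption): total = compute + memory - overlapped portion."""
    overlap = p.overlap_fraction * min(p.compute_time_ms, p.memory_time_ms)
    return p.compute_time_ms + p.memory_time_ms - overlap


def potential_benefit_ms(baseline: KernelProfile, optimized: KernelProfile) -> float:
    """Predicted time saved if the kernel behaved like `optimized` instead of `baseline`."""
    return predicted_time_ms(baseline) - predicted_time_ms(optimized)


# Made-up numbers, chosen only to exercise the functions.
baseline = KernelProfile(compute_time_ms=4.0, memory_time_ms=6.0, overlap_fraction=0.5)

# Two hypothetical optimizations to compare before writing any code:
perfect_overlap = replace(baseline, overlap_fraction=1.0)  # hide all of the shorter phase
halved_memory = replace(baseline, memory_time_ms=3.0)      # e.g. better memory access pattern

benefit_overlap = potential_benefit_ms(baseline, perfect_overlap)
benefit_memory = potential_benefit_ms(baseline, halved_memory)
```

With these made-up inputs the model ranks the memory optimization ahead of the overlap optimization. That ranking is the kind of programmer-interpretable guidance the abstract refers to, although the numbers here carry no meaning beyond the example.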

Tags: Analytical model, CUDA, GPGPU architecture, nVidia, Performance benefit prediction, Performance prediction, Tesla C2050

March 30, 2012 by Moaddeli
