high performance computing on graphics processing units: hgpu.org

Patterns of Inefficient Performance Behavior in GPU Applications

Dominic Eschweiler, Daniel Becker, Felix Wolf

Forschungszentrum Jülich, Jülich Supercomputing Centre, 52428 Jülich, Germany

Proc. of the 19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pages 262-266, Ayia Napa, Cyprus. IEEE Computer Society, February 2011

Writing efficient software for heterogeneous architectures equipped with modern accelerator devices presents a serious challenge to programmer productivity, creating a need for powerful performance-analysis tools to adequately support the software development process. To guide the design of such tools, we describe typical patterns of inefficient runtime behavior that may adversely affect the performance of applications that use general-purpose processors along with GPU devices through a CUDA compute engine. To evaluate the general impact of these patterns on application performance, we further present a microbenchmark suite that allows the performance penalty of each pattern to be quantified. Results obtained on NVIDIA Fermi and Tesla architectures demonstrate significant delays. Furthermore, this suite can be used as a default test scenario when adding CUDA support to performance-analysis tools used in high-performance computing.

Tags: Computer science, CUDA, nVidia, nVidia GeForce GTX 480, Optimization, Performance, Tesla T10

March 16, 2011 by hgpu
