high performance computing on graphics processing units: hgpu.org


Performance Degradation Analysis of GPU Kernels

Jinpeng Lv, Guodong Li, Alan Humphrey, Ganesh Gopalakrishnan

Electrical & Computer Engineering, University of Utah, Salt Lake City, UT

Computer Aided Verification (CAV 2011 – EC2 Workshop), 2011

@article{lv2012performance,
  title={Performance Degradation Analysis of GPU Kernels},
  author={Lv, Jinpeng and Li, Guodong and Humphrey, Alan and Gopalakrishnan, Ganesh},
  year={2011}
}


Hardware accelerators (currently Graphics Processing Units, or GPUs) are an important component of many existing high-performance computing solutions [5]. Their growth in variety and usage is expected to skyrocket [1] for several reasons. First, GPUs offer impressive energy efficiency [3]. Second, when properly programmed, they yield impressive speedups by allowing programmers to model their computation around many fine-grained threads whose focus can be rapidly switched during memory stalls. Unfortunately, achieving high memory access efficiency requires well-developed computational thinking to decompose a problem domain properly. Our work currently addresses the needs of the CUDA [5] approach to programming GPUs. Two important classes of rules that govern memory access efficiency in CUDA are bank conflict avoidance rules, which pertain to CUDA shared memory, and coalesced access rules, which pertain to global memory. The former require programmers to generate memory addresses from consecutive threads that fall within separate shared memory banks. The latter require programmers to generate memory addresses that permit coalesced fetches from global memory. In previous work [6], we addressed the former, to some extent, through SMT-based methods. Several other efforts also address bank conflicts [7, 8, 4]. In this work, we address the latter requirement: detecting when coalesced access rules are being violated.
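The two classes of rules named in the abstract can be illustrated with a small counting model. The sketch below is not the paper's analysis and not an SMT-based method. It is a simplified model that, for one warp, counts how many distinct shared memory words land in the same bank and how many aligned global memory segments the warp touches. The hardware parameters (32 threads per warp, 32 banks of 4-byte words, 128-byte segments) and all function names are assumptions chosen for illustration; real values depend on the GPU architecture.

```python
# Simplified model of the two memory access rules. All constants are
# assumptions for illustration and vary between GPU architectures.
WARP_SIZE = 32        # assumed threads per warp
NUM_BANKS = 32        # assumed number of shared memory banks
BANK_WIDTH = 4        # assumed bytes per bank word
SEGMENT_BYTES = 128   # assumed size of one aligned global memory segment


def warp_addresses(stride_in_floats, base=0):
    """Byte addresses for a warp where thread t reads 4-byte element t * stride."""
    return [base + t * stride_in_floats * 4 for t in range(WARP_SIZE)]


def bank_conflict_degree(byte_addresses):
    """Largest number of distinct words that map to a single bank.

    A degree of 1 means no conflict. Threads reading the same word are
    counted once, which models a broadcast rather than a conflict.
    """
    words_per_bank = {}
    for addr in byte_addresses:
        word = addr // BANK_WIDTH
        words_per_bank.setdefault(word % NUM_BANKS, set()).add(word)
    return max(len(words) for words in words_per_bank.values())


def global_transactions(byte_addresses):
    """Number of distinct aligned segments touched by the warp's addresses."""
    return len({addr // SEGMENT_BYTES for addr in byte_addresses})


if __name__ == "__main__":
    for stride in (1, 2, 32):
        addrs = warp_addresses(stride)
        print(stride, bank_conflict_degree(addrs), global_transactions(addrs))
```

Under these assumed parameters, a stride of 1 gives a conflict degree of 1 and a single segment, while a stride of 32 sends every thread to the same bank and to a different segment, giving 32 for both counts. A checker for coalescing violations would flag the second pattern.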

Tags: Computer science, CUDA, nVidia, Performance

April 23, 2012 by hgpu
