high performance computing on graphics processing units: hgpu.org

Microarchitectural Performance Characterization of Irregular GPU Kernels

Molly A. O’Neil, Martin Burtscher

Department of Computer Science, Texas State University, San Marcos, TX

2014 IEEE International Symposium on Workload Characterization, 2014

@article{neil2014microarchitectural,
  title={Microarchitectural Performance Characterization of Irregular GPU Kernels},
  author={O’Neil, Molly A. and Burtscher, Martin},
  year={2014}
}

GPUs are increasingly being used to accelerate general-purpose applications, including applications with data-dependent, irregular memory access patterns and control flow. However, relatively little is known about the behavior of irregular GPU codes, and there has been minimal effort to quantify how they differ from regular GPGPU applications. We examine the behavior of a suite of optimized irregular CUDA applications on a cycle-accurate GPU simulator. We characterize the performance bottlenecks in each program and connect source code with microarchitectural characteristics. We also assess the impact of improvements in cache and DRAM bandwidth and latency and discuss the implications for GPU architecture design. We find that irregular graph codes exhibit significantly more underutilized execution cycles than regular programs because of thread divergence, load imbalance, and synchronization overhead. However, these factors contribute less to performance degradation than we expected. It appears that code optimizations are often able to address these performance hurdles effectively. Insufficient bandwidth and long memory latency are the biggest limiters of performance. Surprisingly, we find that applications with irregular memory access patterns are more sensitive to changes in L2 latency and bandwidth than to changes in DRAM latency and bandwidth.
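The abstract attributes underutilized execution cycles in irregular graph codes partly to thread divergence and load imbalance. The toy model below is a minimal sketch of why that happens, under stated assumptions. It is not the paper's methodology, because the authors use a cycle-accurate simulator. The sketch assumes a one-thread-per-vertex mapping over a CSR graph, 32-thread warps, and a neighbor loop that each warp executes as many times as its highest-degree lane requires. Lanes with fewer neighbors sit masked off for the remaining iterations. The function names `degrees_from_csr` and `warp_simt_efficiency` are hypothetical helpers introduced only for illustration.

```python
from typing import List

WARP_SIZE = 32  # assumed warp width (32 threads on NVIDIA GPUs)


def degrees_from_csr(row_offsets: List[int]) -> List[int]:
    """Hypothetical helper: per-vertex out-degree from a CSR row-offset array."""
    return [row_offsets[i + 1] - row_offsets[i] for i in range(len(row_offsets) - 1)]


def warp_simt_efficiency(degrees: List[int], warp_size: int = WARP_SIZE) -> float:
    """Hypothetical toy model: fraction of issued lane-slots doing useful work.

    Assumes thread i handles vertex i and loops over its neighbors. A warp
    must run its loop max(degree) times, and every one of its warp_size lanes
    occupies a slot on each iteration, whether active or masked off. Lanes past
    the end of a partial final warp are counted as idle.
    """
    useful = 0
    issued = 0
    for start in range(0, len(degrees), warp_size):
        warp = degrees[start:start + warp_size]
        useful += sum(warp)                 # neighbor visits actually performed
        issued += max(warp) * warp_size     # lane-slots the warp occupies
    # If no loop iterations were issued at all, nothing was wasted.
    return useful / issued if issued else 1.0


if __name__ == "__main__":
    regular = [4] * 64                       # uniform degrees: no divergence
    skewed = ([100] + [1] * 31) * 2          # one hub vertex per warp
    print(warp_simt_efficiency(regular))     # 1.0
    print(warp_simt_efficiency(skewed))      # 0.0409375
```

In the skewed case, each warp spends 100 iterations serving a single hub vertex while 31 lanes finish after one iteration. This is the kind of idle-lane waste the abstract describes. The model captures only control-flow divergence from degree imbalance. It ignores memory divergence, synchronization, and latency hiding across warps, which is consistent with the paper's finding that such idle cycles need not dominate overall performance.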

Tags: Computer science, CUDA, GPGPU-sim, nVidia, Performance

October 3, 2014 by hgpu
