high performance computing on graphics processing units: hgpu.org

Divergence Analysis with Affine Constraints

Diogo Sampaio, Rafael Martins, Fernando Magno Quintão Pereira, Sylvain Collange

UFMG – 6627 Antonio Carlos Av., 31.270-010, Belo Horizonte, Brazil

hal-00650235, 2011

@article{sampaio2011divergence,
  title={Divergence Analysis with Affine Constraints},
  author={Sampaio, D. and Martins, R. and Collange, S. and Magno Quint{\~a}o Pereira, F. and others},
  year={2011}
}

The rise of graphics processing units in high-performance computing is bringing renewed interest in code optimization techniques that target SIMD processors. Many of these optimizations rely on divergence analyses, which classify variables as uniform, if they have the same value on every thread, or divergent, if they might not. This paper introduces a new kind of divergence analysis that can represent variables as affine functions of thread identifiers. We have implemented our divergence analysis with affine constraints on top of Ocelot, an open-source compiler, and used it to analyze a suite of 177 CUDA kernels from well-known benchmarks. These experiments show that our algorithm reports 4% fewer divergent variables than the previous state-of-the-art algorithm of Coutinho et al. Furthermore, we can mark about one fourth of all divergent variables as affine functions of thread identifiers. In addition to the novel divergence analysis, we also introduce the notion of a divergence-aware register allocator. This allocator uses information from our analysis either to rematerialize affine variables or to move uniform variables to shared memory. As evidence of its effectiveness, our divergence-aware allocator produces GPU code that is 29.70% faster than the code produced by Ocelot’s register allocator.
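To make the uniform / affine / divergent distinction concrete, here is a minimal Python sketch of an abstract domain in the spirit of the analysis described above. It is not the authors' Ocelot implementation. It assumes straight-line code only (no branches, so it ignores the control dependences a real divergence analysis must track), models every non-divergent value as `a * tid + b` with uniform coefficients `a` and `b`, and uses a hypothetical marker `U` for a coefficient that is uniform but unknown at compile time. It also assumes that a load from a uniform address yields a uniform value. The names `Affine`, `Divergent`, `add`, `mul`, `load` and `classify` are illustrative only.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical marker: a coefficient that is uniform (identical on every
# thread) but whose value is not known at compile time.
U = "U"
Coef = Union[int, str]


@dataclass(frozen=True)
class Affine:
    """Value equal to a * tid + b on every thread, with a and b uniform."""
    a: Coef
    b: Coef


@dataclass(frozen=True)
class Divergent:
    """Value that may differ across threads in a non-affine way."""


DIVERGENT = Divergent()
AbsVal = Union[Affine, Divergent]

TID = Affine(1, 0)  # the thread identifier itself: 1 * tid + 0


def const(k: int) -> Affine:
    """A compile-time constant is uniform: 0 * tid + k."""
    return Affine(0, k)


def add_coef(x: Coef, y: Coef) -> Coef:
    # Sum of two uniform coefficients; unknown if either operand is unknown.
    return U if U in (x, y) else x + y


def mul_coef(x: Coef, y: Coef) -> Coef:
    # Zero absorbs everything, even an unknown uniform coefficient.
    if x == 0 or y == 0:
        return 0
    return U if U in (x, y) else x * y


def add(x: AbsVal, y: AbsVal) -> AbsVal:
    if isinstance(x, Divergent) or isinstance(y, Divergent):
        return DIVERGENT
    # (a1*tid + b1) + (a2*tid + b2) = (a1 + a2)*tid + (b1 + b2)
    return Affine(add_coef(x.a, y.a), add_coef(x.b, y.b))


def mul(x: AbsVal, y: AbsVal) -> AbsVal:
    if isinstance(x, Divergent) or isinstance(y, Divergent):
        return DIVERGENT
    if x.a != 0 and y.a != 0:
        # Both factors depend on tid: the product is quadratic, not affine.
        return DIVERGENT
    if x.a != 0:
        x, y = y, x  # make x the uniform factor
    # x is uniform, so x.b * (a*tid + b) = (x.b*a)*tid + x.b*b
    return Affine(mul_coef(x.b, y.a), mul_coef(x.b, y.b))


def load(addr: AbsVal) -> AbsVal:
    # Assumption: every thread reading the same address sees the same value.
    if isinstance(addr, Affine) and addr.a == 0:
        return Affine(0, U)
    return DIVERGENT  # per-thread addresses may read per-thread data


def classify(v: AbsVal) -> str:
    if isinstance(v, Divergent):
        return "divergent"
    return "uniform" if v.a == 0 else "affine"


# Usage: i = blockIdx.x * blockDim.x + threadIdx.x, then a 4-byte offset.
block_base = mul(Affine(0, U), Affine(0, U))  # uniform: 0*tid + U
i = add(block_base, TID)                      # affine: 1*tid + U
j = mul(const(4), i)                          # affine: 4*tid + U
val = load(j)                                 # divergent
sq = mul(TID, TID)                            # tid * tid: divergent
```

In this sketch, multiplying two thread-dependent values leaves the affine domain, which is why `sq` ends up divergent. A value such as `j` is the kind of affine variable that, according to the abstract, a divergence-aware allocator could rematerialize from the thread identifier, while uniform values such as `block_base` are the candidates for shared memory.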

Tags: Algorithms, Computer science, CUDA, nVidia, nVidia GeForce GTX 570, Optimization, Package, PTX

December 13, 2011 by hgpu