Analyzing CUDA’s Compiler through the Visualization of Decoded GPU Binaries

Cedric Nugteren, Bart Mesman, Henk Corporaal

Eindhoven University of Technology, The Netherlands

ODES-8: Proceedings of the 8th Workshop on Optimizations for DSP and Embedded Systems at CGO ’10, 2010

@inproceedings{nugteren2010analyzing,
  title={Analyzing CUDA’s Compiler through the Visualization of Decoded GPU Binaries},
  author={Nugteren, C. and Mesman, B. and Corporaal, H.},
  booktitle={ODES-8: Proceedings of the 8th Workshop on Optimizations for DSP and Embedded Systems at CGO’10},
  year={2010}
}

As GPU architectures grow in importance thanks to their large number of parallel processors, NVIDIA’s CUDA environment is increasingly used for general-purpose applications. To exploit this parallel processing power, programmers must parallelize their algorithms and map them efficiently onto the hardware. The difficulty of this task motivates an investigation of CUDA’s compiler. Part of the compiler in the CUDA tool-chain is entirely undocumented, as is its output. To draw conclusions about the behaviour of this compiler, the resulting object code is reverse engineered. A visualization tool is introduced that exposes the previously unknown compiler behaviour and helps the programmer improve the mapping process. These improvements focus on register allocation and instruction reordering. This paper describes an extension to the CUDA tool-chain that provides programmers with a visualization of register life ranges. The paper also presents guidelines describing how to apply optimizations to obtain lower register pressure. In one case study, performance increases by 33% compared to already optimized CUDA code, achieved by optimizing the code with the help of the introduced visualization tool. In 11 other case studies, register pressure is reduced by an average of 18%. The presented guidelines could be added to the compiler so that a similar reduction in register pressure is achieved automatically at compile time for new and existing CUDA programs.
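
To make the terms "register life range" and "register pressure" concrete, here is a minimal Python sketch. It is not the tool from the paper and does not read decoded GPU binaries. The instruction format (a destination register plus a list of source registers), the function names, and the two example programs are all hypothetical, and each register is assumed to be written exactly once. The sketch computes a life range per register, counts how many values are live after each instruction, and shows how reordering independent instructions can lower the peak.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction format: (destination register, source registers).
# Assumption: straight-line code, each register written exactly once.
Instr = Tuple[str, List[str]]


def live_ranges(program: List[Instr]) -> Dict[str, Tuple[int, int]]:
    """Map each register to (index of its definition, index of its last use)."""
    ranges: Dict[str, Tuple[int, int]] = {}
    for i, (dst, srcs) in enumerate(program):
        for reg in srcs:
            start, _ = ranges[reg]      # a source must have been defined earlier
            ranges[reg] = (start, i)    # extend the range to this use
        ranges[dst] = (i, i)            # a new value starts here
    return ranges


def max_pressure(program: List[Instr]) -> int:
    """Peak number of values that are live after any single instruction."""
    ranges = live_ranges(program)
    peak = 0
    for i in range(len(program)):
        # A value is live after instruction i if it is already defined
        # and still has a later use.
        live = sum(1 for start, end in ranges.values() if start <= i < end)
        peak = max(peak, live)
    return peak


def render(program: List[Instr]) -> List[str]:
    """Plain-text picture of the life ranges, one row per register."""
    ranges = live_ranges(program)
    n = len(program)
    return [
        reg + " " + "".join("#" if start <= i <= end else "." for i in range(n))
        for reg, (start, end) in ranges.items()
    ]


# All four loads are issued first, so four values are live at once.
loads_first: List[Instr] = [
    ("a", []), ("b", []), ("c", []), ("d", []),
    ("e", ["a", "b"]), ("f", ["c", "d"]), ("g", ["e", "f"]),
]

# Same computation, but each sum is placed right after its own loads.
interleaved: List[Instr] = [
    ("a", []), ("b", []), ("e", ["a", "b"]),
    ("c", []), ("d", []), ("f", ["c", "d"]), ("g", ["e", "f"]),
]

if __name__ == "__main__":
    for name, prog in (("loads_first", loads_first), ("interleaved", interleaved)):
        print(name, "peak pressure:", max_pressure(prog))
        print("\n".join(render(prog)))
```

Under these assumptions the first ordering peaks at four live values and the second at three. A real compiler also has to account for control flow, register reuse, and instruction latencies, none of which this sketch models.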

Tags: Algorithms, Computer science, CUDA, nVidia, Optimization, Performance, Visualization

March 13, 2012 by hgpu
