Analyzing CUDA’s Compiler through the Visualization of Decoded GPU Binaries

Cedric Nugteren, Bart Mesman, Henk Corporaal
Eindhoven University of Technology, The Netherlands
ODES-8: Proceedings of the 8th Workshop on Optimizations for DSP and Embedded Systems at CGO ’10, 2010

@inproceedings{nugteren2010analyzing,
   title={Analyzing CUDA’s Compiler through the Visualization of Decoded GPU Binaries},
   author={Nugteren, C. and Mesman, B. and Corporaal, H.},
   booktitle={ODES-8: Proceedings of the 8th Workshop on Optimizations for DSP and Embedded Systems at CGO’10},
   year={2010}
}


As GPU architectures grow in importance because of their large number of parallel processors, NVIDIA’s CUDA environment is becoming widely used for general-purpose applications. To exploit this parallel processing power, programmers must parallelize their algorithms and map them efficiently onto the hardware. The difficulty of this task motivates an investigation of CUDA’s compiler. Part of the compiler in the CUDA tool-chain is entirely undocumented, as is its output. To draw conclusions about the behaviour of this compiler, the resulting object code is reverse engineered. A visualization tool is introduced that exposes the previously unknown compiler behaviour and helps programmers improve the mapping process, with a focus on register allocation and instruction reordering. This paper describes an extension to the CUDA tool-chain that provides programmers with a visualization of register life ranges. It also presents guidelines describing how to apply optimizations to lower register pressure. In one case study, optimizing the code with the help of the visualization tool increases performance by 33% compared to already optimized CUDA code. In 11 other case studies, register pressure is reduced by an average of 18%. The presented guidelines could be added to the compiler so that a similar reduction in register pressure is achieved automatically at compile time for new and existing CUDA programs.
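To make the notions of register life ranges and register pressure concrete, the following is a minimal Python sketch. It is not the paper's tool. It assumes a hypothetical, already decoded, straight-line instruction stream in which each instruction lists its destination and source registers. It treats a register as live from its first to its last appearance and ignores control flow and register reuse. The names `live_ranges`, `register_pressure`, `render`, and `program` are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical decoded instruction: (destination registers, source registers).
Instr = Tuple[List[str], List[str]]
Range = Tuple[int, int]


def live_ranges(instrs: List[Instr]) -> Dict[str, Range]:
    """Map each register to (first appearance, last appearance).

    Simplifying assumption: straight-line code, one life range per register.
    """
    ranges: Dict[str, Range] = {}
    for i, (dsts, srcs) in enumerate(instrs):
        for reg in dsts + srcs:
            first, _ = ranges.get(reg, (i, i))
            ranges[reg] = (first, i)
    return ranges


def register_pressure(ranges: Dict[str, Range], n: int) -> List[int]:
    """Number of registers whose life range covers each instruction index."""
    return [sum(1 for a, b in ranges.values() if a <= i <= b) for i in range(n)]


def render(ranges: Dict[str, Range], n: int) -> str:
    """Text chart: one row per register, '#' where it is live."""
    rows = []
    for reg, (a, b) in sorted(ranges.items()):
        bar = "".join("#" if a <= i <= b else "." for i in range(n))
        rows.append(f"{reg:>3} {bar}")
    return "\n".join(rows)


# Illustrative instruction stream (assumed, not taken from the paper).
program: List[Instr] = [
    (["r0"], []),            # r0 = load
    (["r1"], []),            # r1 = load
    (["r2"], ["r0", "r1"]),  # r2 = r0 + r1
    (["r3"], ["r2"]),        # r3 = r2 * r2
    ([], ["r3"]),            # store r3
]

if __name__ == "__main__":
    ranges = live_ranges(program)
    print(render(ranges, len(program)))
    print("peak pressure:", max(register_pressure(ranges, len(program))))
```

In this simplified model, reordering instructions so that a value is consumed sooner after it is produced shortens its life range and can lower the peak number of simultaneously live registers.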