Architectural Considerations for Compiler-guided Unroll-and-Jam of CUDA Kernels

Apan Qasem
Department of Computer Science, Texas State University, San Marcos, Texas, USA
American Journal of Computer Architecture, 1(2), 12-20, 2012

@article{qasem2012architectural,
   title={Architectural Considerations for Compiler-guided Unroll-and-Jam of CUDA Kernels},
   author={Qasem, A.},
   journal={American Journal of Computer Architecture},
   volume={1},
   number={2},
   pages={12--20},
   year={2012},
   publisher={Scientific & Academic Publishing}
}

Hundreds of cores per chip and support for fine-grain multithreading have made GPUs a central player in today's HPC world. Much of the responsibility for achieving high performance on these complex systems lies with software, in particular the compiler. This paper describes a compiler-based strategy for the automatic and profitable application of the unroll-and-jam transformation to CUDA kernels. The framework supports specifying unroll factors through source-code annotations. It also implements a heuristic, based on register pressure and occupancy, that recommends unroll factors for improved memory performance. We present experimental results for four CUDA kernels on an NVIDIA GeForce 9800 GT. The results show that the proposed strategy is generally able to select profitable unroll factors. They also indicate that the selected unroll amounts strike the right balance between register pressure and occupancy.
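The following sketch applies unroll-and-jam by hand to a simple matrix-vector loop nest. It illustrates the transformation the abstract refers to. It is not the paper's framework, its annotation syntax, or its heuristic. For illustration, it is written in plain C rather than as a CUDA kernel. The names `matvec`, `matvec_uj2` and the size `N` are hypothetical. The outer loop is unrolled by 2 and the two copies of the inner loop are fused. As a result, each loaded `x[j]` feeds two rows, and two running sums stay in registers.

```c
#include <assert.h>

/* Hypothetical, illustrative problem size; must be a multiple of the
   unroll factor (2) because no cleanup loop is generated below. */
#define N 8

/* Original loop nest: y[i] += A[i][j] * x[j] (matrix-vector product). */
void matvec(double A[N][N], const double x[N], double y[N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            y[i] += A[i][j] * x[j];
}

/* Unroll-and-jam by a factor of 2: the outer i loop is unrolled twice and
   the two resulting copies of the inner j loop are jammed (fused) into one.
   Each x[j] is loaded once and reused for two rows, and the two running
   sums are held in scalars (registers): more reuse, but more live
   registers per iteration. */
void matvec_uj2(double A[N][N], const double x[N], double y[N]) {
    assert(N % 2 == 0); /* unroll factor must divide the trip count */
    for (int i = 0; i < N; i += 2) {
        double acc0 = y[i];
        double acc1 = y[i + 1];
        for (int j = 0; j < N; j++) {
            double xj = x[j];
            acc0 += A[i][j] * xj;
            acc1 += A[i + 1][j] * xj;
        }
        y[i] = acc0;
        y[i + 1] = acc1;
    }
}
```

Both functions should produce the same `y` for the same inputs, because each `y[i]` still accumulates its terms in the same `j` order. In a CUDA kernel, each increase in the unroll factor adds more per-thread accumulators and reused values. More accumulators raise register pressure, and higher register pressure can lower occupancy. This is the trade-off that a register-pressure and occupancy heuristic has to weigh when recommending an unroll factor.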