high performance computing on graphics processing units: hgpu.org

Optimal loop unrolling for GPGPU programs

Giridhar S. Murthy, Mahesh Ravishankar, Muthu M. Baskaran, P. Sadayappan

Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA

In 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) (April 2010), pp. 1-11.

DOI:10.1109/IPDPS.2010.5470423

@conference{murthy2010optimal,
  title={Optimal loop unrolling for GPGPU programs},
  author={Murthy, G.S. and Ravishankar, M. and Baskaran, M.M. and Sadayappan, P.},
  booktitle={Parallel \& Distributed Processing (IPDPS), 2010 IEEE International Symposium on},
  pages={1--11},
  issn={1530-2075},
  year={2010},
  organization={IEEE}
}

Graphics Processing Units (GPUs) are massively parallel, many-core processors with tremendous computational power and very high memory bandwidth. With the advent of general-purpose programming models such as NVIDIA’s CUDA and the new OpenCL standard, general-purpose programming on GPUs (GPGPU) has become very popular. However, the GPU architecture and programming model have brought with them many new challenges and opportunities for compiler optimization. One classical optimization is loop unrolling, yet current GPU compilers perform only limited loop unrolling. In this paper, we attempt to understand the impact of loop unrolling on GPGPU programs. We develop a semi-automatic, compile-time approach for identifying optimal unroll factors for suitable loops in GPGPU programs. In addition, we propose techniques for reducing the number of unroll factors evaluated, based on the characteristics of the program being compiled and the target device. We use these techniques to evaluate the effect of loop unrolling on a range of GPGPU programs and show that we correctly identify the optimal unroll factors. The optimized versions run up to 70 percent faster than the unoptimized versions.
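
To make the idea of pruning unroll factors before evaluating them concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the abstract does not specify the pruning criteria, so the two filters used here (the factor must divide the loop trip count, and an estimated per-thread register count must fit a budget) are assumptions chosen for illustration. The names `candidate_factors` and `best_factor`, the linear register model, and all parameter names are hypothetical.

```python
from typing import Callable, List


def candidate_factors(trip_count: int, max_factor: int, regs_base: int,
                      regs_per_copy: int, reg_budget: int) -> List[int]:
    """Return the unroll factors that survive two illustrative pruning rules.

    Assumed rules (hypothetical, not taken from the paper):
      1. The factor divides the trip count, so no remainder loop is needed.
      2. A linear register estimate stays within the per-thread budget.
    """
    kept = []
    for factor in range(1, max_factor + 1):
        if trip_count % factor != 0:
            continue  # would need a remainder loop; skipped in this sketch
        # Assumed model: each extra copy of the body costs `regs_per_copy`.
        estimated_regs = regs_base + regs_per_copy * (factor - 1)
        if estimated_regs > reg_budget:
            continue  # estimated register pressure exceeds the budget
        kept.append(factor)
    return kept


def best_factor(candidates: List[int], cost: Callable[[int], float]) -> int:
    """Pick the candidate with the lowest cost (e.g. a measured kernel time).

    Ties resolve to the smallest factor because `candidates` is ascending.
    """
    if not candidates:
        raise ValueError("no unroll factor satisfies the constraints")
    return min(candidates, key=cost)


if __name__ == "__main__":
    factors = candidate_factors(trip_count=16, max_factor=16, regs_base=10,
                                regs_per_copy=4, reg_budget=32)
    # Synthetic stand-in for a timing run; a real cost would launch the
    # kernel compiled with each unroll factor and measure its runtime.
    chosen = best_factor(factors, cost=lambda u: abs(u - 4))
    print(factors, chosen)
```

With the example arguments, only a few of the 16 possible factors remain to be timed. In a real workflow the `cost` callback would compile and time the kernel for each surviving factor; in CUDA, a specific factor can be requested on a loop with `#pragma unroll N`.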

Tags: Computer science, CUDA, nVidia, nVidia GeForce 8800 GTX, nVidia GeForce GTX 280, OpenCL, Review

November 3, 2010 by hgpu
