CRINK: Automatic CUDA code generation for affine C programs
Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur
Indian Institute of Technology, 2014
@phdthesis{singh2014crink,
title={CRINK: Automatic CUDA code generation for affine C programs},
author={Singh, Akanksha},
year={2014},
school={Indian Institute of Technology Kanpur}
}
Parallel programming has evolved into an efficient solution for a large number of compute-intensive applications. Graphics Processing Units (GPUs), traditionally designed to process computer graphics, are now widely used to process large chunks of data in parallel in many computationally expensive applications. Developing parallel programs for parallel computing platforms such as CUDA and OpenCL requires knowledge of platform-specific concepts, so it is very convenient if the process of parallelizing the compute-intensive sections of a program can be automated. We develop CRINK, an end-to-end code transformation system that converts sequential C programs to their parallel counterparts in CUDA. CRINK aims to parallelize the expensive sections of the program (sections within loops) while converting C programs to CUDA C programs. It handles both regular and irregular kernels. We use the concepts of Cycle Shrinking and Extended Cycle Shrinking for parallelism extraction and loop transformation. To analyse performance, we run CRINK over the expensive sections taken from the ZERO RC, SPEC, SANDIA RULES, Treepack and Higbie standard benchmarks. The analysis covers 66 varied configurations of benchmarks and datasets, where we observe drastic drops in computation time as the number of threads is increased during execution of the code transformed by CRINK.
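As a rough illustration of the idea behind simple cycle shrinking (a sketch written for this summary, not code taken from CRINK), consider a loop with one loop-carried dependence of constant distance `dist`. Any `dist` consecutive iterations are then independent of each other, so the loop can be split into a sequential outer loop over groups and an inner loop whose iterations could each be given to a separate CUDA thread. The function names `sequential_loop` and `shrunk_loop`, and the assumption of a single constant distance with `dist >= 1`, are hypothetical choices made for the example.

```c
#include <stddef.h>

/* Original loop: iteration i reads a[i - dist], which was written
 * dist iterations earlier. Assumes dist >= 1. */
void sequential_loop(int *a, const int *b, size_t n, size_t dist) {
    for (size_t i = dist; i < n; i++) {
        a[i] = a[i - dist] + b[i];
    }
}

/* Same computation after simple cycle shrinking. Assumes dist >= 1.
 * The outer loop stays sequential and advances one group at a time. */
void shrunk_loop(int *a, const int *b, size_t n, size_t dist) {
    for (size_t start = dist; start < n; start += dist) {
        size_t end = (start + dist < n) ? start + dist : n;

        /* Iterations in [start, end) only read elements below start,
         * which earlier groups (or the initial data) already produced.
         * They do not depend on each other, so this inner loop is the
         * part that could run as parallel threads. */
        for (size_t i = start; i < end; i++) {
            a[i] = a[i - dist] + b[i];
        }
    }
}
```

In this sketch a larger dependence distance gives larger groups and fewer sequential steps, which is where the available parallelism comes from.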
August 10, 2015 by hgpu