Characterization and Transformation of Unstructured Control Flow in Bulk Synchronous GPU Applications
School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
International Journal of High Performance Computing Applications, 1094342011434814, 2012
@article{wu2012characterization,
title={Characterization and transformation of unstructured control flow in bulk synchronous {GPU} applications},
author={Wu, H. and Diamos, G. and Wang, J. and Li, S. and Yalamanchili, S.},
journal={International Journal of High Performance Computing Applications},
year={2012},
publisher={SAGE Publications}
}
In this paper we identify important classes of program control flow in applications targeted to commercially available graphics processing units (GPUs) and characterize their presence in real workloads, such as those written in CUDA and OpenCL. Broadly, control flow can be characterized as structured or unstructured. It is shown that most existing techniques for handling divergent control flow in bulk synchronous GPU applications handle structured control flow efficiently, that some are incapable of executing unstructured control flow directly, and that none handles unstructured control flow efficiently. An approach to reduce the impact of this problem is provided. An unstructured-to-structured control flow transformation for CUDA kernels is implemented, and its performance impact on a large class of GPU applications is assessed. The results quantify the importance of improving support for programs with unstructured control flow on GPUs. The transformation can also be used in a JIT compiler pass to execute programs with unstructured control flow on GPU devices that do not support it. This is an important capability for execution portability of applications using GPU accelerators.
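To make the structured/unstructured distinction concrete, the following is a minimal sketch in plain C (host code rather than a CUDA kernel, so it runs anywhere). It assumes a short-circuit condition `c1 || c2` as the source of unstructured control flow, and it structures the graph by duplicating the shared block. The function names `unstructured` and `structured` and the block bodies are hypothetical, and the duplication shown here is one generic way to structure such a graph, not necessarily the transformation implemented in the paper.

```c
#include <stdbool.h>

/* Unstructured form: the way a compiler might lower
 *     if (c1 || c2) { A } else { B }
 * Block A (then_block) is entered by two separate branches, so the
 * graph cannot be written with properly nested if/else alone. */
int unstructured(bool c1, bool c2, int x) {
    if (c1) goto then_block;   /* first edge into A  */
    if (c2) goto then_block;   /* second edge into A */
    x = x - 1;                 /* block B */
    goto join;
then_block:
    x = x * 2;                 /* block A */
join:
    return x;
}

/* Structured form: block A is duplicated so that every region has a
 * single entry and a single exit, at the cost of larger code size. */
int structured(bool c1, bool c2, int x) {
    if (c1) {
        x = x * 2;             /* copy 1 of block A */
    } else {
        if (c2) {
            x = x * 2;         /* copy 2 of block A */
        } else {
            x = x - 1;         /* block B */
        }
    }
    return x;
}
```

Both functions compute the same result for every combination of `c1` and `c2`; only the shape of the control flow graph differs. The duplicated block illustrates why a structuring transformation can increase static code size.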
May 12, 2012 by hgpu