APTCC: Auto Parallelizing Translator From C To CUDA
Department of Computer Science, Graduate School of Information Science, the University of Tokyo
Procedia Computer Science, Volume 4, 2011, Pages 352-361, Proceedings of the International Conference on Computational Science (ICCS 2011)
@article{Nawata2011352,
title={"APTCC: Auto Parallelizing Translator From C To CUDA"},
journal={"Procedia Computer Science"},
volume={"4"},
number={"0"},
pages={"352-361"},
year={2011},
note={"Proceedings of the International Conference on Computational Science"},
issn={"1877-0509"},
doi={"10.1016/j.procs.2011.04.037"},
url={"http://www.sciencedirect.com/science/article/pii/S1877050911000950"},
author={"Takehiko Nawata and Reiji Suda"},
keywords={"Auto-Parallelization"}
}
This paper proposes APTCC (Auto Parallelizing Translator from C to CUDA), a translator from C code to CUDA C that requires no directives. CUDA C is a programming language for general-purpose computing on GPUs (GPGPU), and it requires a programming style different from that of C. Although several research efforts have tried to reduce this difficulty, their results still force the programmer to be aware of the GPU architecture. It is better, however, for the programmer to be able to concentrate on the algorithm. Hence we propose translating C code into CUDA C optimized for the target GPU architecture without directives, so that the complexity of the GPU hardware is transparent to the programmer. Translating C code into CUDA C code raises two questions. The first is how to select the code fragments that should be translated into CUDA C, and the second is how to translate the selected fragments into CUDA C. For the first question, this paper proposes a heuristic selection scheme based on the loop structure of the source code; the current implementation of APTCC selects nested loops as the targets of translation. For the second question, APTCC translates all the statements in the body of the outermost loop into CUDA C. This paper explains the implementation of APTCC in detail and compares the performance results for a few experimental input source codes.
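The two steps named in the abstract (select a nested loop, then translate the body of its outermost loop) can be pictured with a small C sketch. This is only an illustration under assumptions, not APTCC's actual output: the matrix-vector example and the names `matvec_loops`, `matvec_body`, and `matvec_outlined` are hypothetical, and the "kernel" is emulated on the CPU so the sketch compiles as plain C.

```c
#include <assert.h>
#include <stddef.h>

/* Input-style C code: a nested loop computing y = A * x.
   A nested loop is the kind of fragment the abstract says is selected. */
void matvec_loops(const float *a, const float *x, float *y,
                  size_t rows, size_t cols)
{
    assert(a && x && y);
    for (size_t i = 0; i < rows; ++i) {      /* outermost loop */
        float sum = 0.0f;
        for (size_t j = 0; j < cols; ++j)    /* inner loop */
            sum += a[i * cols + j] * x[j];
        y[i] = sum;
    }
}

/* Hypothetical outlined form: every statement in the body of the
   outermost loop moves into one function of the index i.
   In CUDA C this would be a __global__ kernel, with i derived from
   blockIdx.x * blockDim.x + threadIdx.x instead of being a parameter. */
void matvec_body(size_t i, const float *a, const float *x, float *y,
                 size_t rows, size_t cols)
{
    if (i >= rows)   /* guard, as a kernel needs when the grid overshoots */
        return;
    float sum = 0.0f;
    for (size_t j = 0; j < cols; ++j)        /* inner loop stays sequential */
        sum += a[i * cols + j] * x[j];
    y[i] = sum;
}

/* CPU stand-in for a kernel launch: one call per "thread" index.
   The iterations are independent, so their order does not matter. */
void matvec_outlined(const float *a, const float *x, float *y,
                     size_t rows, size_t cols)
{
    assert(a && x && y);
    for (size_t i = 0; i < rows; ++i)
        matvec_body(i, a, x, y, rows, cols);
}
```

Calling `matvec_loops` and `matvec_outlined` on the same inputs should give identical results, since the outlined form only moves the outer-loop body into a per-index function. Whether a given loop is safe to treat this way depends on its iterations being independent, which this sketch simply assumes.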
October 24, 2011 by hgpu