APTCC: Auto Parallelizing Translator From C To CUDA

Takehiko Nawata, Reiji Suda
Department of Computer Science, Graduate School of Information Science, the University of Tokyo
Procedia Computer Science, Volume 4, 2011, Pages 352-361, Proceedings of the International Conference on Computational Science (ICCS 2011), 2011

@article{Nawata2011352,
   title={"APTCC: Auto Parallelizing Translator From C To CUDA"},
   journal={"Procedia Computer Science"},
   volume={"4"},
   number={"0"},
   pages={"352-361"},
   year={2011},
   note={"Proceedings of the International Conference on Computational Science"},
   issn={"1877-0509"},
   doi={"10.1016/j.procs.2011.04.037"},
   url={"http://www.sciencedirect.com/science/article/pii/S1877050911000950"},
   author={"Takehiko Nawata and Reiji Suda"},
   keywords={"Auto-Parallelization"}
}

This paper proposes APTCC (Auto Parallelizing Translator from C to CUDA), which translates C code into CUDA C without any directives. CUDA C is a programming language for general-purpose computing on GPUs (GPGPU), and it requires a programming style that differs from that of C. Although several research efforts have tried to reduce this difficulty, their results still force the programmer to be aware of the GPU architecture. It would be better if programmers could concentrate on the algorithm. Hence we propose translating C code into CUDA C optimized for the target GPU architecture without directives, so that the complexity of the GPU hardware is hidden from the programmer. Translating C code into CUDA C raises two questions: how to select the code fragments that should be translated, and how to translate the selected fragments. For the first question, this paper proposes a heuristic selection scheme based on the loop structure of the source code; the current implementation of APTCC selects nested loops as the targets of translation. For the second question, APTCC translates all the statements in the body of the outermost loop into CUDA C. This paper explains the implementation of APTCC in detail and compares the performance obtained on a few experimental input programs.
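
To make the abstract's description concrete, the following is a minimal hand-written sketch of the kind of transformation it describes: a nested loop in C, and a CUDA C kernel in which the outermost loop is replaced by the thread index while the loop body (including the inner loop) is kept as the kernel body. This is not output of APTCC. The example problem (a row-major matrix-vector product), the names `matvec_cpu`, `matvec_kernel`, and `matvec_gpu`, the block size of 128, and the one-thread-per-outer-iteration mapping are all assumptions chosen for illustration, and the sketch assumes the outer iterations are independent and that `n > 0`.

```cuda
#include <cuda_runtime.h>

// Sequential C input (illustrative): a nested loop whose outer
// iterations do not depend on each other.
void matvec_cpu(const float *a, const float *x, float *y, int n, int m) {
    for (int i = 0; i < n; i++) {          // outermost loop
        float sum = 0.0f;
        for (int j = 0; j < m; j++)        // inner loop
            sum += a[i * m + j] * x[j];
        y[i] = sum;
    }
}

// Hand-written CUDA C counterpart (an assumption, not APTCC output):
// each thread executes the body of the outermost loop for one value of i.
__global__ void matvec_kernel(const float *a, const float *x, float *y,
                              int n, int m) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {                           // guard threads past the last row
        float sum = 0.0f;
        for (int j = 0; j < m; j++)        // inner loop stays sequential
            sum += a[i * m + j] * x[j];
        y[i] = sum;
    }
}

// Host wrapper: copies inputs to the device, launches the kernel, and
// copies the result back. Returns cudaSuccess or the first error seen.
cudaError_t matvec_gpu(const float *a, const float *x, float *y,
                       int n, int m) {
    float *d_a = NULL, *d_x = NULL, *d_y = NULL;
    size_t a_bytes = (size_t)n * m * sizeof(float);
    size_t x_bytes = (size_t)m * sizeof(float);
    size_t y_bytes = (size_t)n * sizeof(float);

    cudaError_t err = cudaMalloc((void **)&d_a, a_bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_x, x_bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_y, y_bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_a, a, a_bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_x, x, x_bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 128;                             // assumed block size
        int blocks = (n + threads - 1) / threads;      // round up
        matvec_kernel<<<blocks, threads>>>(d_a, d_x, d_y, n, m);
        err = cudaGetLastError();                      // launch errors
    }
    if (err == cudaSuccess)                            // also waits for the kernel
        err = cudaMemcpy(y, d_y, y_bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);   // cudaFree(NULL) is a no-op, so cleanup is unconditional
    cudaFree(d_x);
    cudaFree(d_y);
    return err;
}
```

Calling `matvec_cpu` and `matvec_gpu` on the same inputs should give matching results when a CUDA-capable device is present. The sketch also shows the bookkeeping a programmer would otherwise write by hand (index computation, bounds guard, device allocation, and host-device transfers), which is the burden the abstract says a directive-free translator aims to remove.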