APTCC: Auto Parallelizing Translator From C To CUDA


Takehiko Nawata, Reiji Suda

Department of Computer Science, Graduate School of Information Science, the University of Tokyo

Procedia Computer Science, Volume 4, 2011, Pages 352-361, Proceedings of the International Conference on Computational Science (ICCS 2011), 2011

DOI:10.1016/j.procs.2011.04.037

@article{Nawata2011352,
  title={APTCC: Auto Parallelizing Translator From C To CUDA},
  journal={Procedia Computer Science},
  volume={4},
  number={0},
  pages={352-361},
  year={2011},
  note={Proceedings of the International Conference on Computational Science},
  issn={1877-0509},
  doi={10.1016/j.procs.2011.04.037},
  url={http://www.sciencedirect.com/science/article/pii/S1877050911000950},
  author={Takehiko Nawata and Reiji Suda},
  keywords={Auto-Parallelization}
}


This paper proposes APTCC (Auto Parallelizing Translator from C to CUDA), a translator from C code to CUDA C that requires no directives. CUDA C is a programming language for general-purpose computing on GPUs (GPGPU), and it requires a programming style different from that of C. Although several research efforts have tried to reduce this difficulty, their results still force the programmer to be aware of the GPU architecture. It would be better if the programmer could concentrate on the algorithm. Hence we propose translating C code into CUDA C optimized for the target GPU architecture without directives, so that the complexity of the GPU hardware is transparent to the programmer. In translating C code to CUDA C, two questions have to be answered. The first is how to select the code fragments that should be translated into CUDA C, and the second is how to translate the selected fragments into CUDA C. For the first question, this paper proposes a heuristic selection scheme based on the loop structure of the source code; the current implementation of APTCC selects nested loops as the target of translation. For the second question, APTCC translates all the statements in the body of the outermost loop into CUDA C. This paper explains the implementation of APTCC in detail and compares performance results for a few experimental input source codes.
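To make the idea of translating the body of an outermost loop more concrete, here is a minimal sketch in plain C. It is not APTCC's output, and the abstract does not say how loop iterations are mapped to GPU threads. The sketch assumes one outermost iteration per thread, and all function names (matvec_seq, matvec_body, matvec_emulated) are hypothetical. A pair of CPU loops stands in for the kernel launch so that the example runs without a GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential input: a nested loop (matrix-vector product, row-major a).
   A loop nest of this shape is the kind of fragment the abstract says
   is selected for translation. */
void matvec_seq(size_t n, size_t m, const float *a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++) {       /* outermost loop */
        float sum = 0.0f;
        for (size_t j = 0; j < m; j++)     /* inner loop */
            sum += a[i * m + j] * x[j];
        y[i] = sum;
    }
}

/* Assumed translation: the body of the outermost loop becomes a
   per-thread function. In CUDA C this would be a __global__ kernel
   with i = blockIdx.x * blockDim.x + threadIdx.x. */
static void matvec_body(size_t i, size_t n, size_t m,
                        const float *a, const float *x, float *y)
{
    if (i >= n)                            /* guard: more threads than rows */
        return;
    float sum = 0.0f;
    for (size_t j = 0; j < m; j++)         /* inner loop stays sequential */
        sum += a[i * m + j] * x[j];
    y[i] = sum;
}

/* CPU stand-in for a kernel launch: visit every (block, thread) pair. */
void matvec_emulated(size_t n, size_t m, const float *a, const float *x,
                     float *y, size_t block_size)
{
    assert(block_size > 0);
    size_t blocks = (n + block_size - 1) / block_size;  /* round up */
    for (size_t b = 0; b < blocks; b++)
        for (size_t t = 0; t < block_size; t++)
            matvec_body(b * block_size + t, n, m, a, x, y);
}
```

Calling matvec_seq and matvec_emulated on the same inputs should produce the same y. The bounds guard in matvec_body matters because the rounded-up block count can yield more threads than outermost iterations.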

Tags: Code generation, Compilers, Computer science, CUDA, nVidia, nVidia GeForce GTX 470, nVidia GeForce GTX 580, Tesla C2050

October 24, 2011 by hgpu
