
Compiling and Optimizing OpenMP 4.X Programs to OpenCL and SPIR

Marcio M. Pereira, Rafael C. F. Sousa, Guido Araujo
Institute of Computing, University of Campinas, UNICAMP, Brazil
13th International Workshop on OpenMP (IWOMP), 2017

@inproceedings{pereira2017compiling,
   title={Compiling and optimizing OpenMP 4.X programs to OpenCL and SPIR},
   author={Pereira, Marcio M and Sousa, Rafael CF and Araujo, Guido},
   booktitle={International Workshop on OpenMP},
   pages={48--61},
   year={2017},
   organization={Springer}
}

Given their massively parallel computing capabilities, heterogeneous architectures composed of CPUs and accelerators have been increasingly used to speed up scientific and engineering applications. Nevertheless, programming such architectures is a challenging task for most non-expert programmers. Typical accelerator programming languages (e.g. CUDA and OpenCL) demand a thorough understanding of the underlying hardware to enable an effective application speed-up. To achieve that, programmers are usually required to significantly change and adapt program structures and algorithms, thus impacting both performance and productivity. A simpler alternative is to use high-level directive-based programming models like OpenACC and OpenMP. These models allow programmers to insert both directives and runtime calls into existing source code. The directives provide hints to the compiler and runtime to perform certain transformations and optimizations on the annotated code regions. In this paper, we present ACLang, an open-source LLVM/Clang compiler framework (http://www.aclang.org) that implements the recently released OpenMP 4.X Accelerator Programming Model. ACLang automatically converts OpenMP 4.X annotated program regions into OpenCL/SPIR kernels. It also provides a set of polyhedral-based optimizations like tiling and vectorization. OpenCL kernels produced by ACLang can be executed on any OpenCL/SPIR-compatible acceleration device. This includes not only GPUs, but also FPGA accelerators like those found in the Intel HARP architecture. To the best of our knowledge and at the time this paper was written, this is the first LLVM/Clang implementation of the OpenMP 4.X Accelerator Model that provides a source-to-target OpenCL conversion.
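The sketch below illustrates the directive-based offloading described above. It is a minimal example that assumes a simple vector addition, and it is not taken from the paper or from ACLang's output. It shows an OpenMP 4.X `target` region next to a hand-written C emulation of the kind of OpenCL kernel such a region maps to. In that kernel, each loop iteration becomes one work-item. The function names `vec_add_omp`, `vec_add_kernel` and `vec_add_ndrange` and the `local_size` parameter are hypothetical. The explicit `gid` argument stands in for OpenCL's `get_global_id(0)`.

```c
#include <assert.h>
#include <stddef.h>

/* OpenMP 4.X version: the target construct offloads the loop to an
 * accelerator, and the map clauses describe host/device data movement.
 * Without OpenMP support the pragma is ignored and the loop runs
 * sequentially on the host; with OpenMP but no device, execution
 * falls back to the host. */
void vec_add_omp(const float *a, const float *b, float *c, int n) {
    assert(a != NULL && b != NULL && c != NULL && n >= 0);
#pragma omp target teams distribute parallel for \
        map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Hypothetical kernel body. In OpenCL C it would read roughly:
 *   __kernel void vec_add(__global const float *a,
 *                         __global const float *b,
 *                         __global float *c, int n) {
 *       int i = get_global_id(0);
 *       if (i < n) c[i] = a[i] + b[i];
 *   }
 * Here the work-item index is passed explicitly as gid. */
static void vec_add_kernel(int gid, const float *a, const float *b,
                           float *c, int n) {
    if (gid < n) /* guard: the global size may be rounded up past n */
        c[gid] = a[gid] + b[gid];
}

/* Emulates an NDRange launch: the global size is n rounded up to a
 * multiple of local_size, and every work-item runs the kernel body. */
void vec_add_ndrange(const float *a, const float *b, float *c, int n,
                     int local_size) {
    assert(local_size > 0 && n >= 0);
    int global_size = ((n + local_size - 1) / local_size) * local_size;
    for (int gid = 0; gid < global_size; gid++)
        vec_add_kernel(gid, a, b, c, n);
}
```

Both versions compute the same result. The loop bound in the directive version becomes a bounds guard in the kernel version, because the launched work-item count is rounded up to a multiple of the work-group size.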
Experiments using ACLang on the Polybench benchmark suite reveal speed-ups of up to 30x on an Exynos 8890 octa-core CPU with an ARM Mali-T880 MP12 GPU, and up to 62x on a 2.4 GHz dual-core Intel Core i5 processor equipped with an Intel Iris GPU unit. On a 2.1 GHz 32-core Intel Xeon processor equipped with a Tesla K40c GPU, speed-ups reach up to 112x.
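The abstract mentions tiling as one of ACLang's polyhedral-based optimizations. The sketch below shows what loop tiling does on a matrix-multiplication loop nest of the kind found in Polybench (e.g. gemm). It is a hand-written illustration, assuming square row-major matrices and a tile size `T` chosen by the caller. It is not the code ACLang generates. The function names `matmul_naive` and `matmul_tiled` are hypothetical.

```c
#include <assert.h>

/* Reference i-j-k multiply: C = A * B for n x n row-major matrices. */
void matmul_naive(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Tiled version: the i, k and j loops are strip-mined into T-sized
 * blocks so that a small block of A, B and C is reused while it is
 * still in cache (or local memory, on an accelerator). The "&& x < n"
 * tests handle n not being a multiple of T. */
void matmul_tiled(int n, int T, const double *A, const double *B,
                  double *C) {
    assert(T > 0);
    for (int i = 0; i < n * n; i++)
        C[i] = 0.0;
    for (int ii = 0; ii < n; ii += T)
        for (int kk = 0; kk < n; kk += T)
            for (int jj = 0; jj < n; jj += T)
                for (int i = ii; i < ii + T && i < n; i++)
                    for (int k = kk; k < kk + T && k < n; k++) {
                        double a_ik = A[i * n + k];
                        for (int j = jj; j < jj + T && j < n; j++)
                            C[i * n + j] += a_ik * B[k * n + j];
                    }
}
```

For each `C[i][j]`, the tiled nest still accumulates products in increasing `k` order, so it yields the same values as the reference nest. Only the memory access pattern changes. A polyhedral optimizer derives this kind of restructuring automatically and checks that it preserves the loop's dependences.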