high performance computing on graphics processing units: hgpu.org

Compiling and Optimizing OpenMP 4.X Programs to OpenCL and SPIR

Marcio M. Pereira, Rafael C. F. Sousa, Guido Araujo

Institute of Computing, University of Campinas, UNICAMP, Brazil

13th International Workshop on OpenMP (IWOMP), 2017

DOI:10.1007/978-3-319-65578-9_4

Package:

ACLang: open source LLVM Clang based compiler that implements the OpenMP Accelerator Model

Given their massively parallel computing capabilities, heterogeneous architectures comprising CPUs and accelerators have been increasingly used to speed up scientific and engineering applications. Nevertheless, programming such architectures is a challenging task for most non-expert programmers, as typical accelerator programming languages (e.g., CUDA and OpenCL) demand a thorough understanding of the underlying hardware to enable an effective application speed-up. To achieve that, programmers are usually required to significantly change and adapt program structures and algorithms, thus impacting both performance and productivity. A simpler alternative is to use high-level directive-based programming models like OpenACC and OpenMP. These models allow programmers to insert both directives and runtime calls into existing source code, thus providing hints to the compiler and runtime to perform certain transformations and optimizations on the annotated code regions. In this paper, we present ACLang, an open-source LLVM/Clang compiler framework (http://www.aclang.org) that implements the recently released OpenMP 4.X Accelerator Programming Model. ACLang automatically converts OpenMP 4.X annotated program regions into OpenCL/SPIR kernels, while providing a set of polyhedral-based optimizations like tiling and vectorization. OpenCL kernels resulting from ACLang can be executed on any OpenCL/SPIR-compatible acceleration device: not only GPUs but also FPGA accelerators like those found in the Intel HARP architecture. To the best of our knowledge, and at the time this paper was written, this is the first LLVM/Clang implementation of the OpenMP 4.X Accelerator Model that provides a source-to-target OpenCL conversion.
Experiments using ACLang on the Polybench benchmark reveal speed-ups of up to 30x on an Exynos 8890 octa-core CPU with an ARM Mali-T880 MP12 GPU, up to 62x on a 2.4 GHz dual-core Intel Core i5 processor equipped with an Intel Iris GPU, and up to 112x on a 2.1 GHz 32-core Intel Xeon processor equipped with a Tesla K40c GPU.
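
For readers unfamiliar with the OpenMP accelerator model, the following is a minimal sketch of the kind of annotated region the abstract refers to. It is not taken from the paper: the function name `saxpy_offload` and its parameters are hypothetical, and the code uses only standard OpenMP 4.x `target` and `map` syntax. How ACLang itself lowers such a region to an OpenCL/SPIR kernel is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration (not from the paper): y[i] = a * x[i] + y[i].
 * The target construct marks the region to offload; the map clauses say
 * which arrays are copied to the device (x) and copied both ways (y).
 * A compiler built without OpenMP support ignores the pragmas, so the
 * loop then simply runs sequentially on the host with the same result. */
void saxpy_offload(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `saxpy_offload(4, 2.0f, x, y)` on two four-element arrays updates `y` in place. The directives only describe what may be offloaded and which data must move; the loop body itself is unchanged from the sequential version, which is the productivity argument the abstract makes for directive-based models.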

Tags: ARM, Code generation, Compilers, Computer science, FPGA, Heterogeneous systems, LLVM, nVidia, OpenCL, OpenMP, Package, Tesla K40

November 21, 2017 by hgpu
