high performance computing on graphics processing units: hgpu.org

Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems

Zheng Wang, Dominik Grewe, Michael F.P. O’Boyle

Lancaster University

ACM Transactions on Architecture and Code Optimization (TACO), Volume 11, Issue 4, 2015

@article{wang2015automatic,

title={Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems},

author={Wang, Zheng and Grewe, Dominik and O'Boyle, Michael F. P.},

journal={ACM Transactions on Architecture and Code Optimization (TACO)},

volume={11},

number={4},

pages={42},

year={2015},

publisher={ACM}

}

General-purpose GPU-based systems are highly attractive because they offer potentially massive performance at little cost. Realizing that potential is challenging because of the complexity of programming them. This article presents a compiler-based approach that automatically generates optimized OpenCL code for GPUs from data-parallel OpenMP programs. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures, and uses machine learning to automatically build a predictive model that determines whether it is worthwhile to run the OpenCL code on the GPU or the OpenMP code on the multi-core host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on distinct GPU-based systems. We achieved average (up to) speedups of 4.51x and 4.20x (143x and 67x) over a sequential baseline on Core i7/NVIDIA GeForce GTX 580 and Core i7/AMD Radeon 7970 platforms, respectively. On average, our approach achieves speedups of more than 10x over two state-of-the-art automatic GPU code generators.
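
The abstract says that a learned predictive model decides whether a program should run as OpenCL on the GPU or as OpenMP on the multi-core host, but it does not name the features or the kind of model. As a rough illustration of the general idea only, the toy sketch below learns a single threshold (a "decision stump") on one feature. The feature (bytes transferred per arithmetic operation), the training data, and the names `train_stump`, `predict`, `TRAINING`, and `THRESHOLD` are all hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: learn a one-feature threshold ("decision stump") that
# predicts whether a kernel should run on the GPU or stay on the CPU.
# The feature, the training data, and all names are illustrative assumptions;
# none of them come from the paper.


def train_stump(samples):
    """Return the threshold t with the fewest training errors for the rule
    'predict "gpu" if feature_value < t, otherwise "cpu"'.

    samples: list of (feature_value, label), where label is "gpu" or "cpu".
    """
    values = sorted({value for value, _ in samples})
    # Candidate thresholds: one below the minimum, the midpoints between
    # neighbouring feature values, and one above the maximum.
    candidates = [values[0] - 1.0]
    candidates += [(a + b) / 2.0 for a, b in zip(values, values[1:])]
    candidates.append(values[-1] + 1.0)

    def errors(threshold):
        return sum(
            1
            for value, label in samples
            if ("gpu" if value < threshold else "cpu") != label
        )

    return min(candidates, key=errors)


def predict(threshold, feature_value):
    """Apply the learned rule to a new program's feature value."""
    return "gpu" if feature_value < threshold else "cpu"


# Made-up training set: (bytes transferred per arithmetic operation, faster device).
TRAINING = [
    (0.01, "gpu"),
    (0.05, "gpu"),
    (0.10, "gpu"),
    (0.80, "cpu"),
    (1.50, "cpu"),
    (2.00, "cpu"),
]

THRESHOLD = train_stump(TRAINING)
```

With this made-up data, `predict(THRESHOLD, 0.02)` selects the GPU and `predict(THRESHOLD, 1.0)` selects the CPU. A real predictor would be trained on measured runtimes and would most likely use several features.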

Tags: ATI, ATI Radeon HD 7970, Code generation, Compilers, Computer science, Heterogeneous systems, Machine learning, nVidia, nVidia GeForce GTX 580, OpenCL, OpenMP, Performance

February 19, 2016 by hgpu
