high performance computing on graphics processing units: hgpu.org

Performance Portability Evaluation for OpenACC on Intel Knights Corner and Nvidia Kepler

Yichao Wang, Qiang Qin, Simon Chong Wee SEE, James Lin

Center for High Performance Computing, Shanghai Jiao Tong University, Shanghai 200240, China

HPC China, 2013

@article{wang2013performance,
  title={Performance Portability Evaluation for OpenACC on Intel Knights Corner and Nvidia Kepler},
  author={Wang, Yichao and Qin, Qiang and SEE, Simon Chong Wee and Lin, James},
  year={2013}
}

OpenACC is a programming standard designed to simplify heterogeneous parallel programming through compiler directives. Because OpenACC compilers can generate OpenCL and CUDA code, and the CAPS HMPP compiler supports running OpenCL on Intel Knights Corner, it is attractive to use OpenACC on hardware with different underlying microarchitectures. This paper studies how realistic it is to use a single OpenACC source code across such hardware. Intel Knights Corner and Nvidia Kepler products are the targets of the experiment, since they represent the latest architectures and have similar peak performance. The CAPS OpenACC compiler is used to compile the EPCC OpenACC benchmark suite and the Stream and MaxFlops benchmarks from SHOC in order to assess performance. To study performance portability, a roofline model and a relative performance model are built from the experimental data. The results show that specific benchmarks achieve at most 82% of peak performance on Kepler and Knights Corner, but as arithmetic intensity rises, the average performance is approximately 10%. There is also a large performance gap between Intel Knights Corner and Nvidia Kepler on several benchmarks. This study confirms that the performance portability of OpenACC is related to arithmetic intensity and that a large performance gap still exists between hardware platforms on specific benchmarks.
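As a rough illustration of the roofline model mentioned in the abstract, the sketch below computes the attainable performance of a kernel as the minimum of the compute peak and the memory bandwidth multiplied by the arithmetic intensity. The function names and the device figures (1000 GFLOP/s peak, 200 GB/s bandwidth) are hypothetical placeholders, not values from the paper, and the paper's own relative performance model may be defined differently.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Attainable GFLOP/s for a kernel under the basic roofline model.

    intensity is the arithmetic intensity in FLOPs per byte moved.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def fraction_of_roofline(measured_gflops, peak_gflops, bandwidth_gbs, intensity):
    """Measured performance as a fraction of the roofline bound."""
    return measured_gflops / roofline_gflops(peak_gflops, bandwidth_gbs, intensity)


# Hypothetical device: these numbers are assumptions for illustration only.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 200.0

# Low intensity: the kernel is limited by memory bandwidth.
memory_bound = roofline_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, 0.25)

# High intensity: the kernel is limited by the compute peak.
compute_bound = roofline_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, 10.0)

print(memory_bound, compute_bound)
```

With these placeholder figures, the ridge point where the two limits meet is at an arithmetic intensity of 5 FLOPs per byte. Kernels below it are bandwidth-bound and kernels above it are compute-bound, which is why measured efficiency can differ so much between a Stream-like and a MaxFlops-like benchmark.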

Tags: Computer science, Heterogeneous systems, Intel Phi, nVidia, OpenACC, Tesla K20

October 4, 2013 by hgpu
