high performance computing on graphics processing units: hgpu.org

A Comparative Study of OpenACC Implementations

Ruyman Reyes, Ivan Lopez, Juan J. Fumero, Francisco de Sande

Dept. de E. I. O. y Computacion, Universidad de La Laguna, 38271-La Laguna, Spain

XXIII Jornadas de Paralelismo, 2012

GPUs and other accelerators are now available on many different devices, and GPGPU has been widely adopted by the HPC research community. Although a plethora of libraries and applications providing GPU support are available, the need to implement new algorithms from scratch, or to adapt sequential programs to accelerators, will always exist. Writing CUDA or OpenCL code, although easier than using their predecessors, is not trivial. Obtaining good performance is even harder, as it requires a deep understanding of the underlying architecture. Some efforts have been directed toward automatic code generation for GPU devices, with varying results. In this work, we present a comparison between three directive-based programming models: hiCUDA, PGI Accelerator and OpenACC, using our novel accULL implementation for the last of these.

Tags: Code generation, Compilers, Computer science, CUDA, nVidia, OpenACC, OpenCL, OpenMP, Performance, Tesla C2050

July 23, 2012 by hgpu
