high performance computing on graphics processing units: hgpu.org

hgpu.org » Applications » Computer science » Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: programming productivity, performance, and energy consumption

Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: programming productivity, performance, and energy consumption

Suejb Memeti, Lu Li, Sabri Pllana, Joanna Kolodziej, Christoph Kessler

Linnaeus University, 351 95 Vaxjo, Sweden

arXiv:1704.05316 [cs.DC], (18 Apr 2017)

@article{memeti2017benchmarking,

title={Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: programming productivity, performance, and energy consumption},

author={Memeti, Suejb and Li, Lu and Pllana, Sabri and Kolodziej, Joanna and Kessler, Christoph},

year={2017},

month={apr},

archivePrefix={"arXiv"},

primaryClass={cs.DC}

}

Download (PDF)

View

Source

3940

views

Many modern parallel computing systems are heterogeneous at their node level. Such nodes may comprise general purpose CPUs and accelerators (such as, GPU, or Intel Xeon Phi) that provide high performance with suitable energy-consumption characteristics. However, exploiting the available performance of heterogeneous architectures may be challenging. There are various parallel programming frameworks (such as, OpenMP, OpenCL, OpenACC, CUDA) and selecting the one that is suitable for a target context is not straightforward. In this paper, we study empirically the characteristics of OpenMP, OpenACC, OpenCL, and CUDA with respect to programming productivity, performance, and energy. To evaluate the programming productivity we use our homegrown tool CodeStat, which enables us to determine the percentage of code lines that was required to parallelize the code using a specific framework. We use our tool x-MeterPU to evaluate the energy consumption and the performance. Experiments are conducted using the industry-standard SPEC benchmark suite and the Rodinia benchmark suite for accelerated computing on heterogeneous systems that combine Intel Xeon E5 Processors with a GPU accelerator or an Intel Xeon Phi co-processor.

Tags: Benchmarking, Computer science, CUDA, Energy-efficient computing, Heterogeneous systems, Intel Xeon Phi, nVidia, nVidia GeForce GTX Titan X, OpenACC, OpenCL, OpenMP, Performance

April 20, 2017 by hgpu

Rating: 3.7/5. From 9 votes.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: programming productivity, performance, and energy consumption

Your response

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

Most viewed papers (last 30 days)

Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: programming productivity, performance, and energy consumption

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)