high performance computing on graphics processing units: hgpu.org

On the Programmability and Performance of Heterogeneous Platforms

Konstantinos Krommydas, Thomas R.W. Scogland, Wu-chun Feng

Department of Computer Science, Virginia Tech

19th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2013), 2013

@article{krommydas2013programmability,
  title={On the Programmability and Performance of Heterogeneous Platforms},
  author={Krommydas, Konstantinos and Scogland, Thomas RW and Feng, Wu-chun},
  year={2013}
}

General-purpose computing on an ever-broadening array of parallel devices has led to an increasingly complex and multi-dimensional landscape with respect to programmability and performance optimization. The growing diversity of parallel architectures presents many challenges to the domain scientist, including the choice of device, the choice of programming model, and the level of investment in optimization. All of these choices influence the balance between programmability and performance. In this paper, we characterize the performance achievable across a range of optimizations, along with their programmability, for multi- and many-core platforms – specifically, an Intel Sandy Bridge CPU, Intel Xeon Phi co-processor, and NVIDIA Kepler K20 GPU – in the context of an n-body, molecular-modeling application called GEM. Our systematic approach to optimization delivers implementations with speedups of 194.98x, 885.18x, and 1020.88x on the CPU, Xeon Phi, and GPU, respectively, over the naive serial version. Beyond the speedups, we characterize the incremental optimization of the code from naive serial to fully hand-tuned on each platform through four distinct phases of increasing complexity to expose the strengths and weaknesses of the programming models offered by each platform.
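To make the "naive serial version" baseline concrete, the following is a minimal sketch of an all-pairs, n-body-style computation in plain Python. It is not the GEM code and does not reproduce GEM's actual formulation: it assumes a simplified Coulomb-style sum (charge divided by distance), and the names `potential_at_points` and `speedup` are hypothetical, introduced only for illustration. The `speedup` helper shows the ratio that figures such as 194.98x express, namely baseline run time divided by optimized run time.

```python
import math


def potential_at_points(points, charges):
    """Naive serial all-pairs sum (illustrative assumption, not GEM itself).

    points  -- list of (x, y, z) evaluation points
    charges -- list of (x, y, z, q) point charges
    Returns one accumulated q / r value per evaluation point.
    """
    result = []
    for (px, py, pz) in points:
        total = 0.0
        for (cx, cy, cz, q) in charges:
            # Euclidean distance between the evaluation point and the charge.
            r = math.sqrt((px - cx) ** 2 + (py - cy) ** 2 + (pz - cz) ** 2)
            # Assumes no evaluation point coincides with a charge (r > 0).
            total += q / r
        result.append(total)
    return result


def speedup(baseline_seconds, optimized_seconds):
    """Speedup of an optimized run relative to a baseline run."""
    return baseline_seconds / optimized_seconds
```

The nested loop performs work proportional to the number of points times the number of charges, and each outer iteration is independent of the others, which is the kind of structure that parallel programming models on CPUs, co-processors, and GPUs can exploit.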

Tags: Computer science, CUDA, Heterogeneous systems, Intel Xeon Phi, N-body simulation, nVidia, OpenACC, Optimization, Tesla K20

January 29, 2014 by hgpu
