high performance computing on graphics processing units: hgpu.org


On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing

Mayank Daga, Ashwin M. Aji, Wu-chun Feng

Dept. of Computer Science, Virginia Tech, Blacksburg, USA

Symposium on Application Accelerators in High-Performance Computing (SAAHPC), 2011

DOI:10.1109/SAAHPC.2011.29

@inproceedings{daga2011efficacy,
  title={On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing},
  author={Daga, M. and Aji, A.M. and Feng, W.},
  booktitle={Application Accelerators in High-Performance Computing (SAAHPC), 2011 Symposium on},
  pages={141--149},
  year={2011},
  organization={IEEE}
}


The graphics processing unit (GPU) has made significant strides as an accelerator in parallel computing. However, because the GPU has resided on the PCIe bus as a discrete device, the performance of GPU applications can be bottlenecked by data transfers between the CPU and GPU over PCIe. Emerging heterogeneous computing architectures that "fuse" the functionality of the CPU and GPU, e.g., AMD Fusion and Intel Knights Ferry, hold the promise of addressing the PCIe bottleneck. In this paper, we empirically characterize and analyze the efficacy of AMD Fusion, an architecture that combines general-purpose x86 cores and programmable accelerator cores on the same silicon die. We characterize its performance via a set of micro-benchmarks (e.g., PCIe data transfer), kernel benchmarks (e.g., reduction), and actual applications (e.g., molecular dynamics). Depending on the benchmark, our results show that Fusion produces a 1.7- to 6.0-fold improvement in data-transfer time compared to a discrete GPU. In turn, this improvement in data-transfer performance can significantly enhance application performance. For example, running a reduction benchmark on AMD Fusion, with its mere 80 GPU cores, improves performance 3.5-fold over the discrete AMD Radeon HD 5870 GPU, with its 1600 more powerful GPU cores.
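The trade-off described in the abstract can be illustrated with a simple cost model in which end-to-end time is the sum of data-transfer time and kernel time. The sketch below is not taken from the paper: the function name, the data size, the bandwidths, and the kernel times are all hypothetical, chosen only to show how a device with a slower kernel can still finish sooner when its transfer cost is much lower.

```python
def end_to_end_time(bytes_moved, bandwidth_bytes_per_s, kernel_time_s):
    """Simplified model: total time = transfer time + kernel execution time."""
    return bytes_moved / bandwidth_bytes_per_s + kernel_time_s


# All numbers below are hypothetical and only illustrate the trade-off;
# they are not measurements from the paper.
data_bytes = 256 * 2**20  # 256 MiB of input data

# Discrete GPU (assumed): fast kernel, but data crosses a slower link.
discrete_s = end_to_end_time(data_bytes, 4 * 2**30, kernel_time_s=0.002)

# Fused CPU+GPU (assumed): kernel is 5x slower, but the transfer path is 4x faster.
fused_s = end_to_end_time(data_bytes, 16 * 2**30, kernel_time_s=0.010)

speedup = discrete_s / fused_s
print(f"discrete: {discrete_s:.4f} s, fused: {fused_s:.4f} s, speedup: {speedup:.2f}x")
```

Under these assumed values the transfer term dominates on the discrete device, so the fused device wins overall even though its kernel is slower. With a compute-bound workload (larger kernel times relative to transfer), the same model would favor the discrete GPU.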

Tags: APU, ATI, ATI Radeon HD 5450, ATI Radeon HD 5870, Benchmarking, Computer science, Heterogeneous systems, OpenCL, Performance

October 11, 2011 by hgpu

