Performance characterization of data-intensive kernels on AMD Fusion architectures

Kenneth Lee, Heshan Lin, Wu-chun Feng

Department of Computer Science, Virginia Tech, Blacksburg, VA, USA

Computer Science – Research and Development, 2012

DOI:10.1007/s00450-012-0209-1

The cost of data movement over the PCI Express bus is one of the biggest performance bottlenecks for accelerating data-intensive applications on traditional discrete GPU architectures. To address this bottleneck, AMD Fusion introduces a fused architecture that tightly integrates the CPU and GPU onto the same die and connects them with a high-speed, on-chip memory controller. This novel architecture incorporates shared memory between the CPU and GPU, thus enabling several techniques for inter-device data transfer that are not available on discrete architectures. For instance, a kernel running on the GPU can now directly access a CPU-resident memory buffer, and vice versa. In this paper, we seek to understand the implications of the fused architecture for CPU-GPU heterogeneous computing by systematically characterizing various memory-access techniques, instantiated with diverse memory-bound kernels, on the latest AMD Fusion system (i.e., Llano A8-3850). Our study reveals that the fused architecture is very promising for accelerating data-intensive applications on heterogeneous platforms in support of supercomputing.
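
The trade-off the abstract describes, copying data across a link before computing versus letting the kernel access a CPU-resident buffer directly, can be illustrated with a back-of-envelope cost model. The sketch below is not the authors' methodology. The names `MemoryPaths`, `copy_then_compute_seconds`, and `zero_copy_seconds` are hypothetical, the kernel is assumed to read its input once and write an equally sized output, and every bandwidth value is a placeholder rather than a measurement from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryPaths:
    """Hypothetical sustained bandwidths in GB/s (placeholders, not measurements)."""
    link: float       # host <-> device copy path (e.g. PCI Express on a discrete GPU)
    device: float     # kernel access to device-local memory
    zero_copy: float  # kernel access directly to a CPU-resident buffer


def copy_then_compute_seconds(nbytes: int, bw: MemoryPaths) -> float:
    """Copy input in, stream it through the kernel, copy output back."""
    gb = nbytes / 1e9
    copy_in = gb / bw.link
    kernel = 2 * gb / bw.device  # one read plus one write of nbytes
    copy_out = gb / bw.link
    return copy_in + kernel + copy_out


def zero_copy_seconds(nbytes: int, bw: MemoryPaths) -> float:
    """Kernel reads and writes the host-resident buffer in place, with no copies."""
    gb = nbytes / 1e9
    return 2 * gb / bw.zero_copy


if __name__ == "__main__":
    # Assumed values chosen only to exercise the model.
    placeholder = MemoryPaths(link=1.0, device=10.0, zero_copy=2.0)
    n = 1_000_000_000
    print("copy then compute:", copy_then_compute_seconds(n, placeholder), "s")
    print("zero copy:        ", zero_copy_seconds(n, placeholder), "s")
```

Under this simplified model, direct access wins whenever the time saved by skipping both copies exceeds the penalty of the slower access path. Which side of that line a real kernel falls on depends on measured bandwidths and access patterns, which is the kind of question the paper's characterization addresses.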

Tags: AMD Fusion, ATI, ATI Radeon HD 5870, Computer science, Heterogeneous systems, OpenCL

September 30, 2012 by hgpu