Performance characterization of data-intensive kernels on AMD Fusion architectures
Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
Computer Science – Research and Development, 2012
@article{springerlink:10.1007/s00450-012-0209-1,
author={Lee, Kenneth and Lin, Heshan and Feng, Wu-chun},
affiliation={Department of Computer Science, Virginia Tech, Blacksburg, VA, USA},
title={Performance characterization of data-intensive kernels on AMD Fusion architectures},
journal={Computer Science – Research and Development},
publisher={Springer Berlin / Heidelberg},
issn={1865-2034},
keyword={Computer Science},
pages={1--10},
url={http://dx.doi.org/10.1007/s00450-012-0209-1},
doi={10.1007/s00450-012-0209-1},
year={2012}
}
The cost of data movement over the PCI Express bus is one of the biggest performance bottlenecks for accelerating data-intensive applications on traditional discrete GPU architectures. To address this bottleneck, AMD Fusion introduces a fused architecture that tightly integrates the CPU and GPU onto the same die and connects them with a high-speed, on-chip memory controller. This novel architecture incorporates shared memory between the CPU and GPU, thus enabling several techniques for inter-device data transfer that are not available on discrete architectures. For instance, a kernel running on the GPU can now directly access a CPU-resident memory buffer and vice versa. In this paper, we seek to understand the implications of the fused architecture on CPU-GPU heterogeneous computing by systematically characterizing various memory-access techniques instantiated with diverse memory-bound kernels on the latest AMD Fusion system (i.e., Llano A8-3850). Our study reveals that the fused architecture is very promising for accelerating data-intensive applications on heterogeneous platforms in support of supercomputing.
September 30, 2012 by hgpu