
A Heterogeneous Accelerated Matrix Multiplication: OpenCL + APU + GPU+ Fast Matrix Multiply

Paolo D’Alberto
FastMMW, CA, USA
arXiv:1205.2927v1 [cs.MS] (14 May 2012)

@article{2012arXiv1205.2927D,

   author={{D'Alberto}, P.},

   title={{A Heterogeneous Accelerated Matrix Multiplication: OpenCL + APU + GPU+ Fast Matrix Multiply}},

   journal={ArXiv e-prints},

   archivePrefix={arXiv},

   eprint={1205.2927},

   primaryClass={cs.MS},

   keywords={Computer Science – Mathematical Software, G.4},

   year={2012},

   month={may},

   adsurl={http://adsabs.harvard.edu/abs/2012arXiv1205.2927D},

   adsnote={Provided by the SAO/NASA Astrophysics Data System}

}



As users and developers, we are witnessing the opening of a new computing scenario: the integration of hybrid processors on a single die, such as accelerated processing units (APUs), and the plug-and-play addition of graphics processing units (GPUs) to a single motherboard. These APUs provide multiple symmetric cores with their memory hierarchies and an integrated GPU. Moreover, they are designed to work with external GPUs that can push peak performance towards the TeraFLOPS boundary. We present a case study for the development of dense Matrix Multiplication (MM) codes for matrix sizes up to 19K×19K, using all of the above computational engines, with an achievable peak performance of 200 GFLOPS for, literally, a made-at-home build. We present the results of our experience: the quirks, the pitfalls, the achieved performance, and the achievable peak performance.
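
To put the figures quoted in the abstract in perspective, the following is a rough back-of-the-envelope sketch, not the author's code. It assumes that "19K" means 19,000, that the classic algorithm costs about 2n^3 floating-point operations, and that elements are single precision (4 bytes); none of these is stated in the abstract. The helper names are hypothetical.

```python
# Rough estimate only. Assumptions (not from the abstract): 19K = 19,000,
# classic O(n^3) multiply counted as 2*n^3 flops, single-precision elements.

def mm_flops(n: int) -> int:
    """Approximate flop count of a classic n x n matrix multiply."""
    return 2 * n ** 3

def mm_seconds(n: int, gflops: float) -> float:
    """Run time in seconds at a sustained rate given in GFLOPS."""
    return mm_flops(n) / (gflops * 1e9)

def operand_bytes(n: int, bytes_per_element: int = 4) -> int:
    """Memory for the three n x n operands A, B and C."""
    return 3 * n * n * bytes_per_element

if __name__ == "__main__":
    n = 19_000
    print(f"flops:   {mm_flops(n):.3e}")
    print(f"seconds: {mm_seconds(n, 200.0):.2f} at 200 GFLOPS")
    print(f"memory:  {operand_bytes(n) / 1e9:.2f} GB for A, B and C")
```

Under these assumptions a single 19K×19K product is on the order of 10^13 operations and takes roughly a minute at 200 GFLOPS, while the three operands occupy a few gigabytes. A fast (Strassen-like) algorithm would lower the operation count, so the 2n^3 figure is only an upper reference point.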
