Dense Matrix Computation on a Heterogeneous Architecture: A Block Synchronous Approach
The University of Texas at Austin, Austin, TX 78712
TACC Technical Report TR-12-04, 2012
@TechReport{FLAWN63,
number={TR-12-04},
title={Dense Matrix Computation on a Heterogeneous Architecture: A Block Synchronous Approach},
author={Kyungjoo Kim and Victor Eijkhout and Robert A. van de Geijn},
institution={Texas Advanced Computing Center, The University of Texas at Austin},
year={2012},
owner={eijkhout},
timestamp={2012.08.05}
}
We present a strategy for making efficient use of all components of a heterogeneous compute node in a typical current-generation cluster. Such nodes often comprise multiple sockets, each with a multicore processor, and one or more accelerators, possibly of different generations and/or types. Our strategy differs from schedulers such as QUARK or SuperMatrix in that it does not rely on a Directed Acyclic Graph, but rather uses a bulk-synchronous model. It also uses dynamic task division, rather than aggregation, to deal with the heterogeneous components of a node. Practical experiments show the merits of our approach.
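The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It simulates one bulk-synchronous step in which independent blocks are handed out from a shared pool and each idle worker carves off a chunk sized in proportion to its speed. The names `Worker`, `run_superstep`, `speed`, and `min_chunk`, as well as the halving rule used to size chunks, are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    speed: float                 # hypothetical relative throughput (blocks per time unit)
    busy_until: float = 0.0      # simulated time at which the worker becomes idle
    done: list = field(default_factory=list)  # half-open (start, stop) block ranges


def run_superstep(workers, n_blocks, min_chunk=1):
    """Simulate one bulk-synchronous step over n_blocks independent blocks.

    Returns the simulated time at which every worker reaches the barrier.
    """
    total_speed = sum(w.speed for w in workers)
    # Every worker starts the step together, after the previous barrier.
    start = max(w.busy_until for w in workers)
    for w in workers:
        w.busy_until = start

    next_block = 0
    while next_block < n_blocks:
        # The worker that becomes idle first takes the next chunk.
        w = min(workers, key=lambda x: x.busy_until)
        remaining = n_blocks - next_block
        # Dynamic division (assumed rule): take a share of the remaining work
        # proportional to this worker's speed, halved so that chunks shrink
        # as the step nears its end and the barrier wait stays short.
        share = int(remaining * w.speed / (2 * total_speed))
        share = min(max(min_chunk, share), remaining)
        w.done.append((next_block, next_block + share))
        w.busy_until += share / w.speed
        next_block += share

    # Barrier: nobody proceeds until the slowest worker has finished.
    barrier = max(w.busy_until for w in workers)
    for w in workers:
        w.busy_until = barrier
    return barrier


if __name__ == "__main__":
    node = [Worker("gpu", 8.0), Worker("cpu0", 1.0), Worker("cpu1", 1.0)]
    t = run_superstep(node, n_blocks=100)
    for w in node:
        print(w.name, sum(stop - start for start, stop in w.done), "blocks")
    print("barrier reached at simulated time", t)
```

In this sketch no task graph is built: the only synchronization point is the barrier at the end of the step, and load balance comes from sizing each chunk at the moment a worker asks for work.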
August 13, 2012 by hgpu