Dataflow-driven GPU performance projection for multi-kernel transformations
Argonne National Laboratory
International Conference on High Performance Computing, Networking, Storage and Analysis (SC ’12), 2012
@inproceedings{meng2012dataflow,
  title     = {Dataflow-driven GPU performance projection for multi-kernel transformations},
  author    = {Meng, J. and Morozov, V.A. and Vishwanath, V. and Kumaran, K.},
  booktitle = {Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis},
  pages     = {82},
  year      = {2012},
  organization = {IEEE Computer Society Press}
}
Applications often have a sequence of parallel operations to be offloaded to graphics processors; each operation can become an individual GPU kernel. Developers typically explore a variety of transformations for each kernel. Furthermore, it is well known that efficient data management is critical in achieving high GPU performance and that "fusing" multiple kernels into one may greatly improve data locality. Doing so, however, requires transformations across multiple, potentially nested, parallel loops; at the same time, the original code semantics and data dependencies must be preserved. Since each kernel may have distinct data access patterns, their combined dataflow can be nontrivial. As a result, the complexity of multi-kernel transformations often demands significant developer effort with no guarantee of a performance benefit. This paper proposes a dataflow-driven analytical framework to project GPU performance for a sequence of parallel operations. Users need only provide CPU code skeletons for a sequence of parallel loops. The framework can then automatically identify opportunities for multi-kernel transformations and data management. It is also able to project the overall performance without implementing GPU code or using physical hardware.
November 20, 2012 by hgpu