An Efficient, Automatic Approach to High Performance Heterogeneous Computing
Circuits and Systems Group, Department of Electrical and Electronic Engineering at Imperial College London
arXiv:1505.04417 [cs.DC], (17 May 2015)
@article{inggs2015efficient,
title={An Efficient, Automatic Approach to High Performance Heterogeneous Computing},
author={Inggs, Gordon and Thomas, David B. and Luk, Wayne},
year={2015},
month={may},
archivePrefix={arXiv},
eprint={1505.04417},
primaryClass={cs.DC}
}
Users of heterogeneous computing systems face two problems: first, understanding the trade-off relationship between the observable characteristics of their applications, such as latency and quality of the result; and second, exploiting knowledge of these characteristics to allocate work to distributed resources efficiently. A domain-specific approach addresses both problems. By considering a subset of operations, models of the observable characteristics, or domain metrics, may be formulated in advance and populated at runtime for particular problem instances. These metric models can then be used to express the allocation of work as a formal integer linear programming problem, which can be solved using heuristics, numerical method-based optimisers or constrained optimisation frameworks. These claims are illustrated using the example domain of derivatives pricing in computational finance, with the domain metrics of workload latency (makespan) and pricing accuracy. For a large, varied workload of 128 Black-Scholes and Heston model-based option pricing tasks, running on a diverse array of 16 multicore CPU, GPU and FPGA platforms, predictions made by models of both the makespan and accuracy are generally within 10% of the performance actually seen at runtime. When these models are used as inputs to numerical optimiser and formal optimisation-based workload partitioning approaches, latency improvements of up to 24 and 270 times, respectively, over a naive heuristic approach are seen.
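The allocation idea in the abstract can be made concrete with a small sketch. The code below is not the authors' implementation, and it does not solve the integer linear programme the abstract mentions; it only assumes that a latency model has already produced a table of predicted run times and compares a naive round-robin allocation with a simple greedy heuristic. The function names (`makespan`, `round_robin`, `greedy`) and the latency figures are hypothetical, chosen purely for illustration.

```python
from typing import List


def makespan(latency: List[List[float]], assignment: List[int]) -> float:
    """Return the finishing time of the busiest platform.

    latency[t][p] is the predicted run time of task t on platform p
    (assumed to come from a previously populated metric model).
    assignment[t] is the platform index chosen for task t.
    """
    load = [0.0] * len(latency[0])
    for task, platform in enumerate(assignment):
        load[platform] += latency[task][platform]
    return max(load)


def round_robin(latency: List[List[float]]) -> List[int]:
    """Naive baseline: deal tasks out to platforms in turn."""
    n_platforms = len(latency[0])
    return [t % n_platforms for t in range(len(latency))]


def greedy(latency: List[List[float]]) -> List[int]:
    """Place each task where it finishes earliest, most expensive tasks first."""
    n_platforms = len(latency[0])
    load = [0.0] * n_platforms
    assignment = [0] * len(latency)
    # Order tasks by their best-case predicted latency, largest first.
    order = sorted(range(len(latency)), key=lambda t: -min(latency[t]))
    for t in order:
        best = min(range(n_platforms), key=lambda p: load[p] + latency[t][p])
        assignment[t] = best
        load[best] += latency[t][best]
    return assignment


# Hypothetical predicted latencies in seconds: 4 tasks on 2 platforms.
latency = [
    [8.0, 2.0],
    [6.0, 3.0],
    [4.0, 1.0],
    [2.0, 2.0],
]

print(makespan(latency, round_robin(latency)))  # 12.0
print(makespan(latency, greedy(latency)))       # 6.0
```

On this toy input the greedy allocation halves the makespan of the round-robin baseline. An exact solver applied to the same latency table would search over all assignments rather than committing to one task at a time.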
May 20, 2015 by hgpu