Mesh Independent Loop Fusion for Unstructured Mesh Applications
Department of Computing, Imperial College London
Proceedings of the 9th conference on Computing Frontiers (CF ’12), 2012
@inproceedings{Bertolli:2012:MIL:2212908.2212917,
author={Bertolli, Carlo and Betts, Adam and Kelly, Paul H.J. and Mudalige, Gihan R. and Giles, Mike B.},
title={Mesh independent loop fusion for unstructured mesh applications},
booktitle={Proceedings of the 9th conference on Computing Frontiers},
series={CF '12},
year={2012},
isbn={978-1-4503-1215-8},
location={Cagliari, Italy},
pages={43--52},
numpages={10},
url={http://doi.acm.org/10.1145/2212908.2212917},
doi={10.1145/2212908.2212917},
acmid={2212917},
publisher={ACM},
address={New York, NY, USA},
keywords={compilers, loop fusion, unstructured mesh applications, whole program control flow analysis}
}
Applications based on unstructured meshes are typically compute-intensive, leading to long running times. In principle, state-of-the-art hardware, such as multi-core CPUs and many-core GPUs, could be used to accelerate them, but these esoteric architectures require specialised knowledge to achieve optimal performance. OP2 is a parallel programming layer that attempts to ease this programming burden by allowing programmers to express parallel iterations over elements in the unstructured mesh through an API call, a so-called OP2-loop. The OP2 compiler infrastructure then uses source-to-source transformations to realise a parallel implementation of each OP2-loop and to discover opportunities for optimisation. In this paper, we describe how several compiler techniques can be used effectively in tandem to increase the performance of unstructured mesh applications. In particular, we show how whole-program analysis, which is often inhibited by the size of the control flow graph, frequently becomes feasible as a result of the OP2 programming model, facilitating aggressive optimisation. We then show how whole-program analysis becomes an enabler of OP2-loop optimisations. Building on this, we show how a classical technique, namely loop fusion, which is typically difficult to apply to unstructured mesh applications, can be defined at compile time. We examine the limits of its application and present experimental results on a computational fluid dynamics application benchmark, assessing the performance gains due to loop fusion.
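Loop fusion is hard to apply to unstructured meshes because loops usually reach mesh data indirectly, through maps such as edge-to-node. The following is a minimal C sketch that illustrates why. It is not the paper's compile-time analysis and does not use the OP2 API. The toy mesh, the map `edge2node`, the kernels `edge_len`, `scatter` and `gather`, and all function names are hypothetical. Fusing two loops over the same edge set is safe when iteration `e` of the second loop reads only what iteration `e` of the first loop wrote directly. It is unsafe when the first loop increments node data through the map and the second loop reads that node data back through the map.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy mesh (not from the paper): 4 nodes on a line joined by
   3 edges. edge2node stands in for an OP2-style edge->node mapping. */
#define NNODES 4
#define NEDGES 3
static const int edge2node[2 * NEDGES] = {0, 1, 1, 2, 2, 3};
static const double coord[NNODES] = {0.0, 1.0, 3.0, 6.0};

/* Per-edge kernels, in the spirit of OP2 user kernels (names hypothetical). */
static void edge_len(const double *xa, const double *xb, double *len) {
    *len = *xb - *xa;                  /* direct write to per-edge data */
}
static void scatter(const double *len, double *ra, double *rb) {
    *ra += *len;                       /* indirect increments of node data */
    *rb -= *len;
}
static void gather(const double *ra, const double *rb, double *flux) {
    *flux = *ra + *rb;                 /* indirect reads of node data */
}

/* Case 1, unfused: two sweeps over the edges with an edge-sized temporary. */
static void residual_unfused(double res[NNODES]) {
    double len[NEDGES];
    for (int e = 0; e < NEDGES; e++)
        edge_len(&coord[edge2node[2 * e]], &coord[edge2node[2 * e + 1]], &len[e]);
    for (int e = 0; e < NEDGES; e++)
        scatter(&len[e], &res[edge2node[2 * e]], &res[edge2node[2 * e + 1]]);
}

/* Case 1, fused: legal, because iteration e of scatter reads only len[e],
   produced by iteration e of edge_len, so the temporary shrinks to a scalar. */
static void residual_fused(double res[NNODES]) {
    for (int e = 0; e < NEDGES; e++) {
        double len;
        edge_len(&coord[edge2node[2 * e]], &coord[edge2node[2 * e + 1]], &len);
        scatter(&len, &res[edge2node[2 * e]], &res[edge2node[2 * e + 1]]);
    }
}

/* Case 2, unfused: all node increments finish before any gather runs. */
static void flux_unfused(double res[NNODES], double flux[NEDGES]) {
    residual_unfused(res);
    for (int e = 0; e < NEDGES; e++)
        gather(&res[edge2node[2 * e]], &res[edge2node[2 * e + 1]], &flux[e]);
}

/* Case 2, naively fused: illegal, because edge e gathers from nodes that
   later edges have not yet incremented. */
static void flux_naive_fused(double res[NNODES], double flux[NEDGES]) {
    for (int e = 0; e < NEDGES; e++) {
        int a = edge2node[2 * e], b = edge2node[2 * e + 1];
        double len;
        edge_len(&coord[a], &coord[b], &len);
        scatter(&len, &res[a], &res[b]);
        gather(&res[a], &res[b], &flux[e]);
    }
}

/* Runs both cases: checks the legal fusion, prints the illegal one. */
void demo(void) {
    double r1[NNODES] = {0}, r2[NNODES] = {0};
    residual_unfused(r1);
    residual_fused(r2);
    for (int i = 0; i < NNODES; i++)
        assert(r1[i] == r2[i]);        /* legal fusion preserves results */

    double s1[NNODES] = {0}, s2[NNODES] = {0}, f1[NEDGES], f2[NEDGES];
    flux_unfused(s1, f1);
    flux_naive_fused(s2, f2);
    for (int e = 0; e < NEDGES; e++)
        printf("edge %d: unfused %g, naively fused %g\n", e, f1[e], f2[e]);
}
```

Calling `demo()` from a `main` function prints `unfused 2, naively fused 0` for edge 0 and `unfused 2, naively fused -1` for edge 1. The two versions agree only on the last edge, which sees every increment to its nodes. A fusion pass therefore has to distinguish directly accessed data from data reached indirectly through maps. In this sketch the only distinction is whether a value is indexed by `e` or through `edge2node`.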
October 13, 2012 by hgpu