Mesh Independent Loop Fusion for Unstructured Mesh Applications

Carlo Bertolli, Adam Betts, Gihan R. Mudalige, Paul H.J. Kelly, Michael B. Giles

Department of Computing, Imperial College London

Proceedings of the 9th conference on Computing Frontiers (CF ’12), 2012

DOI:10.1145/2212908.2212917

@inproceedings{Bertolli:2012:MIL:2212908.2212917,
  author={Bertolli, Carlo and Betts, Adam and Kelly, Paul H.J. and Mudalige, Gihan R. and Giles, Mike B.},
  title={Mesh independent loop fusion for unstructured mesh applications},
  booktitle={Proceedings of the 9th conference on Computing Frontiers},
  series={CF '12},
  year={2012},
  isbn={978-1-4503-1215-8},
  location={Cagliari, Italy},
  pages={43--52},
  numpages={10},
  url={http://doi.acm.org/10.1145/2212908.2212917},
  doi={10.1145/2212908.2212917},
  acmid={2212917},
  publisher={ACM},
  address={New York, NY, USA},
  keywords={compilers, loop fusion, unstructured mesh applications, whole program control flow analysis}
}

Applications based on unstructured meshes are typically compute intensive, leading to long running times. In principle, state-of-the-art hardware, such as multi-core CPUs and many-core GPUs, could be used to accelerate them, but these esoteric architectures require specialised knowledge to achieve optimal performance. OP2 is a parallel programming layer which attempts to ease this programming burden by allowing programmers to express parallel iterations over elements in the unstructured mesh through an API call, a so-called OP2-loop. The OP2 compiler infrastructure then uses source-to-source transformations to realise a parallel implementation of each OP2-loop and discover opportunities for optimisation. In this paper, we describe how several compiler techniques can be effectively utilised in tandem to increase the performance of unstructured mesh applications. In particular, we show how whole-program analysis, which is often inhibited by the size of the control flow graph, often becomes feasible as a result of the OP2 programming model, facilitating aggressive optimisation. We then show how whole-program analysis becomes an enabler of OP2-loop optimisations. Based on this, we show how a classical technique, namely loop fusion, which is typically difficult to apply to unstructured mesh applications, can be defined at compile time. We examine the limits of its application and show experimental results on a computational fluid dynamics application benchmark, assessing the performance gains due to loop fusion.
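To make the idea of fusing two loops over an unstructured mesh concrete, here is a minimal sketch in plain Python. It does not use the OP2 API and is not the paper's algorithm. The mesh, the names `edge_to_nodes`, `node_value`, `unfused` and `fused`, and the flux formula are all hypothetical, chosen only for illustration. Both loops iterate over the same set (the edges), and the second loop reads only the per-edge value that the first loop wrote at the same edge index. Under that assumption the two loops can be merged into one, whatever the connectivity stored in the mapping.

```python
# Hypothetical sketch, not the OP2 API: plain Python loops over a tiny mesh.

# Each edge maps to its two end nodes (an indirection array).
edge_to_nodes = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
# One value per node; neither loop below modifies it.
node_value = [1.0, 2.0, 4.0, 8.0]


def unfused(edge_to_nodes, node_value):
    """Two separate loops over the edges, with a temporary per-edge array."""
    # Loop 1: compute a per-edge flux, reading node data through the mapping.
    flux = [0.0] * len(edge_to_nodes)
    for e, (a, b) in enumerate(edge_to_nodes):
        flux[e] = 0.5 * (node_value[b] - node_value[a])

    # Loop 2: scatter each edge's flux to its two nodes (indirect increment).
    residual = [0.0] * len(node_value)
    for e, (a, b) in enumerate(edge_to_nodes):
        residual[a] += flux[e]
        residual[b] -= flux[e]
    return residual


def fused(edge_to_nodes, node_value):
    """One loop over the edges; the per-edge flux becomes a local scalar."""
    residual = [0.0] * len(node_value)
    for a, b in edge_to_nodes:
        f = 0.5 * (node_value[b] - node_value[a])
        residual[a] += f
        residual[b] -= f
    return residual


if __name__ == "__main__":
    print(unfused(edge_to_nodes, node_value))
    print(fused(edge_to_nodes, node_value))
```

In this sketch the fused version visits the edges once and no longer needs the temporary `flux` array, which is the kind of saving loop fusion aims for. If the second loop instead read data that the first loop wrote through the mapping (for example, node values updated by other edges), merging the loops this way would not be valid without further analysis.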

Tags: Computer science, CUDA, nVidia, Programming techniques, Tesla M2050

October 13, 2012 by hgpu
