
Mesh Independent Loop Fusion for Unstructured Mesh Applications

Carlo Bertolli, Adam Betts, Gihan R. Mudalige, Paul H.J. Kelly, Michael B. Giles
Department of Computing, Imperial College London
Proceedings of the 9th conference on Computing Frontiers (CF ’12), 2012
@inproceedings{Bertolli:2012:MIL:2212908.2212917,
   author={Bertolli, Carlo and Betts, Adam and Kelly, Paul H.J. and Mudalige, Gihan R. and Giles, Mike B.},
   title={Mesh independent loop fusion for unstructured mesh applications},
   booktitle={Proceedings of the 9th conference on Computing Frontiers},
   series={CF ’12},
   year={2012},
   isbn={978-1-4503-1215-8},
   location={Cagliari, Italy},
   pages={43–52},
   numpages={10},
   url={http://doi.acm.org/10.1145/2212908.2212917},
   doi={10.1145/2212908.2212917},
   acmid={2212917},
   publisher={ACM},
   address={New York, NY, USA},
   keywords={compilers, loop fusion, unstructured mesh applications, whole program control flow analysis}
}



Applications based on unstructured meshes are typically compute intensive, leading to long running times. In principle, state-of-the-art hardware, such as multi-core CPUs and many-core GPUs, could be used to accelerate them, but these esoteric architectures require specialised knowledge to achieve optimal performance. OP2 is a parallel programming layer which attempts to ease this programming burden by allowing programmers to express parallel iterations over elements in the unstructured mesh through an API call, a so-called OP2-loop. The OP2 compiler infrastructure then uses source-to-source transformations to realise a parallel implementation of each OP2-loop and discover opportunities for optimisation. In this paper, we describe how several compiler techniques can be effectively utilised in tandem to increase the performance of unstructured mesh applications. In particular, we show how whole-program analysis, which is often inhibited by the size of the control flow graph, becomes feasible as a result of the OP2 programming model, facilitating aggressive optimisation. We then show how whole-program analysis becomes an enabler of OP2-loop optimisations. Based on this, we show how a classical technique, namely loop fusion, which is typically difficult to apply to unstructured mesh applications, can be defined at compile-time. We examine the limits of its application and show experimental results on a computational fluid dynamics application benchmark, assessing the performance gains due to loop fusion.
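
To make the idea of loop fusion over a mesh concrete, the following is a minimal sketch in plain Python. It is not the OP2 API and not the algorithm from the paper; the mesh, the array names (edge_to_nodes, node_value, flux, residual) and both functions are hypothetical and chosen only for illustration. Two loops iterate over the same set (edges). The second loop reads only the value its own iteration produced in the first loop, and the two loops touch different node arrays, so under these assumptions the fusion is legal whatever the connectivity of the mesh is.

```python
# Hypothetical toy mesh (not from the paper): 4 nodes, 4 edges forming a ring.
edge_to_nodes = [(0, 1), (1, 2), (2, 3), (3, 0)]
node_value = [1.0, 2.0, 4.0, 8.0]


def unfused(edge_to_nodes, node_value):
    """Two separate loops over edges, with a temporary per-edge array."""
    flux = [0.0] * len(edge_to_nodes)
    # Loop 1: compute one value per edge, reading node data via the mapping.
    for e, (a, b) in enumerate(edge_to_nodes):
        flux[e] = node_value[b] - node_value[a]

    residual = [0.0] * len(node_value)
    # Loop 2: same iteration set; each iteration reads only its own flux[e]
    # and increments node data via the mapping.
    for e, (a, b) in enumerate(edge_to_nodes):
        residual[a] += flux[e]
        residual[b] -= flux[e]
    return residual


def fused(edge_to_nodes, node_value):
    """Single loop over edges; the per-edge temporary becomes a scalar."""
    residual = [0.0] * len(node_value)
    for a, b in edge_to_nodes:
        f = node_value[b] - node_value[a]  # body of loop 1
        residual[a] += f                   # body of loop 2
        residual[b] -= f
    return residual


if __name__ == "__main__":
    print(unfused(edge_to_nodes, node_value))
    print(fused(edge_to_nodes, node_value))
```

In this sketch the fused version avoids a second pass over the mapping and removes the intermediate per-edge array. If the second loop instead read node data written by the first loop, the legality of fusion would depend on the mapping, which is the kind of case where fusion becomes difficult for unstructured meshes.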


