high performance computing on graphics processing units: hgpu.org

Mesh Independent Loop Fusion for Unstructured Mesh Applications

Carlo Bertolli, Adam Betts, Gihan R. Mudalige, Paul H.J. Kelly, Michael B. Giles

Department of Computing, Imperial College London

Proceedings of the 9th conference on Computing Frontiers (CF ’12), 2012

DOI:10.1145/2212908.2212917

Applications based on unstructured meshes are typically compute intensive, leading to long running times. In principle, state-of-the-art hardware, such as multi-core CPUs and many-core GPUs, could be used to accelerate them, but these esoteric architectures require specialised knowledge to achieve optimal performance. OP2 is a parallel programming layer which attempts to ease this programming burden by allowing programmers to express parallel iterations over elements in the unstructured mesh through an API call, a so-called OP2-loop. The OP2 compiler infrastructure then uses source-to-source transformations to realise a parallel implementation of each OP2-loop and discover opportunities for optimisation. In this paper, we describe how several compiler techniques can be effectively utilised in tandem to increase the performance of unstructured mesh applications. In particular, we show how whole-program analysis, which is often inhibited by the size of the control flow graph, often becomes feasible as a result of the OP2 programming model, facilitating aggressive optimisation. We subsequently show how whole-program analysis then becomes an enabler of OP2-loop optimisations. Based on this, we show how a classical technique, namely loop fusion, which is typically difficult to apply to unstructured mesh applications, can be defined at compile-time. We examine the limits of its application and show experimental results on a computational fluid dynamics application benchmark, assessing the performance gains due to loop fusion.
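As a rough illustration of the loop fusion the abstract refers to, the sketch below fuses two loops over the edges of a toy mesh. It is not the OP2 API and not the paper's compiler transformation: the mesh, the `edge_to_nodes` mapping, and the function names `unfused` and `fused` are all hypothetical, chosen only to show one case where fusion is safe whatever the mesh connectivity is. The first loop writes a per-edge value directly, and the second loop reads only that same edge's value before incrementing node data through the mapping, so each fused iteration depends only on data produced in that same iteration.

```python
import math

# Hypothetical toy mesh (an assumption for illustration): 4 nodes on a unit
# square, joined by 4 edges. edge_to_nodes plays the role of a mesh mapping.
edge_to_nodes = [(0, 1), (1, 2), (2, 3), (3, 0)]
node_x = [0.0, 1.0, 1.0, 0.0]
node_y = [0.0, 0.0, 1.0, 1.0]


def unfused(edge_to_nodes, xs, ys):
    """Two separate loops over the same iteration set (the edges)."""
    length = [0.0] * len(edge_to_nodes)
    acc = [0.0] * len(xs)
    # Loop 1: writes per-edge data directly (one value per iteration).
    for e, (a, b) in enumerate(edge_to_nodes):
        length[e] = math.hypot(xs[b] - xs[a], ys[b] - ys[a])
    # Loop 2: reads length[e] directly, increments node data indirectly.
    for e, (a, b) in enumerate(edge_to_nodes):
        acc[a] += 0.5 * length[e]
        acc[b] += 0.5 * length[e]
    return length, acc


def fused(edge_to_nodes, xs, ys):
    """One loop doing the work of both; the per-edge value stays in a local."""
    length = [0.0] * len(edge_to_nodes)
    acc = [0.0] * len(xs)
    for e, (a, b) in enumerate(edge_to_nodes):
        edge_len = math.hypot(xs[b] - xs[a], ys[b] - ys[a])
        length[e] = edge_len
        acc[a] += 0.5 * edge_len
        acc[b] += 0.5 * edge_len
    return length, acc


result_unfused = unfused(edge_to_nodes, node_x, node_y)
result_fused = fused(edge_to_nodes, node_x, node_y)
```

In this sketch the fused version traverses the edges once instead of twice and reuses the edge length without reloading it from memory. Fusion would not be safe in this simple form if the second loop read node data that the first loop accumulates indirectly, because a node's value is only complete once every edge touching it has been processed.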

Tags: Computer science, CUDA, nVidia, Programming techniques, Tesla M2050

October 13, 2012 by hgpu
