8051

OP2: An Active Library Framework for Solving Unstructured Mesh-based Applications on Multi-Core and Many-Core Architectures

G.R. Mudalige, M.B. Giles, I. Reguly, C. Bertolli, P.H.J Kelly
Oxford eResearch Centre, University of Oxford, United Kingdom
Innovative Parallel Computing conference (InPar ’12), 2012

@article{mudalige2012op2,

   title={OP2: An Active Library Framework for Solving Unstructured Mesh-based Applications on Multi-Core and Many-Core Architectures},

   author={Mudalige, GR and Giles, MB and Reguly, I. and Bertolli, C. and Kelly, PHJ},

   year={2012}

}

Download Download (PDF)   View View   Source Source   

1620

views

OP2 is an "active" library framework for the solution of unstructured mesh-based applications. It utilizes source-to-source translation and compilation so that a single application code written using the OP2 API can be transformed into different parallel implementations for execution on different back-end hardware platforms. In this paper we present the design of the current OP2 library, and investigate its capabilities in achieving performance portability, near-optimal performance, and scaling on modern multi-core and many-core processor based systems. A key feature of this work is OP2’s recent extension facilitating the development and execution of applications on a distributed memory cluster of GPUs. We discuss the main design issues in parallelizing unstructured mesh based applications on heterogeneous platforms. These include handling data dependencies in accessing indirectly referenced data, the impact of unstructured mesh data layouts (array of structs vs. struct of arrays) and design considerations in generating code for execution on a cluster of GPUs. A representative CFD application written using the OP2 framework is utilized to provide a contrasting benchmarking and performance analysis study on a range of multicore/many-core systems. These include multi-core CPUs from Intel (Westmere and Sandy Bridge) and AMD (Magny-Cours), GPUs from NVIDIA (GTX560Ti, Tesla C2070), a distributed memory CPU cluster (Cray XE6) and a distributed memory GPU cluster (Tesla C2050 GPUs with InfiniBand). OP2’s design choices are explored with quantitative insights into their contributions to performance. We demonstrate that an application written once at a high-level using the OP2 API can be easily portable across a wide range of contrasting platforms and is capable of achieving near-optimal performance without the intervention of the domain application programmer.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: