Acceleration of a Full-scale Industrial CFD Application with OP2
Oxford e-Research Centre, University of Oxford
arXiv:1403.7209 [cs.CE], (27 Mar 2014)
@article{2014arXiv1403.7209R,
author={Reguly}, I.~Z. and {Mudalige}, G.~R. and {Bertolli}, C. and {Giles}, M.~B. and {Betts}, A. and {Kelly}, P.~H.~J. and {Radford}, D.},
title={"{Acceleration of a Full-scale Industrial CFD Application with OP2}"},
journal={ArXiv e-prints},
archivePrefix={"arXiv"},
eprint={1403.7209},
primaryClass={"cs.CE"},
keywords={Computer Science – Computational Engineering, Finance, and Science, Computer Science – Performance, C.4},
year={2014},
month={mar},
adsurl={http://adsabs.harvard.edu/abs/2014arXiv1403.7209R},
adsnote={Provided by the SAO/NASA Astrophysics Data System}
}
Hydra is a full-scale industrial CFD application used for the design of turbomachinery at Rolls Royce plc. It consists of over 300 parallel loops with a code base exceeding 50K lines and is capable of performing complex simulations over highly detailed unstructured mesh geometries. Unlike simpler structured-mesh applications, which feature high speed-ups when accelerated by modern processor architectures, such as multi-core and many-core processor systems, Hydra presents major challenges in data organization and movement that need to be overcome for continued high performance on emerging platforms. We present research in achieving this goal through the OP2 domain-specific high-level framework. OP2 targets the domain of unstructured mesh problems and follows the design of an active library using source-to-source translation and compilation to generate multiple parallel implementations from a single high-level application source for execution on a range of back-end hardware platforms. We chart the conversion of Hydra from its original hand-tuned production version to one that utilizes OP2, and map out the key difficulties encountered in the process. To our knowledge this research presents the first application of such a high-level framework to a full scale production code. Specifically we show (1) how different parallel implementations can be achieved with an active library framework, even for a highly complicated industrial application such as Hydra, and (2) how different optimizations targeting contrasting parallel architectures can be applied to the whole application, seamlessly, reducing developer effort and increasing code longevity. Performance results demonstrate that not only the same runtime performance as that of the hand-tuned original production code could be achieved, but it can be significantly improved on conventional processor systems. Additionally, we achieve further acceleration by exploiting many-core parallelism, particularly on GPU systems. Our results provide evidence of how high-level frameworks such as OP2 enable portability across a wide range of contrasting platforms and their significant utility in achieving near-optimal performance without the intervention of the application programmer.
April 7, 2014 by hgpu