Performance Portability Strategies for Computational Fluid Dynamics (CFD) Applications on HPC Systems

Pei-Hung Lin
University of Minnesota
University of Minnesota, 2013









Achieving high computational performance on large-scale high performance computing (HPC) systems demands optimizations that exploit hardware characteristics. Various optimizations and research strategies have been implemented to improve performance, each emphasizing one or more hardware characteristics. Among these approaches, the domain-specific approach, which draws on domain expertise, shows high potential for achieving high performance while maintaining performance portability. Deep memory hierarchies, single instruction multiple data (SIMD) engines, and multiple processing cores in the latest CPUs pose many challenges to programmers seeking significant fractions of peak performance. Programming modern CPUs for high performance has to address thread-level parallelization across multiple cores, data-level parallelization on SIMD engines, and efficient memory utilization across the multi-level memory hierarchy. Scaling up the computation across multiple computational nodes, each with multiple CPUs, without sacrificing performance increases the programming burden significantly. As a result, performance portability has become a major challenge to programmers. It is well known that manually tuned programs can help the compiler deliver the best performance. However, generating such optimized code requires a deep understanding of application design, hardware architecture, and compiler optimizations, as well as knowledge of the specific domain. This manual tuning process has to be repeated for each new hardware design. To address this issue, this dissertation proposes strategies that exploit the advantages of domain-specific optimizations to achieve performance portability. It shows that the combination of the proposed strategies can effectively exploit both the SIMD engine and on-chip memory, and that a high fraction of peak performance can be achieved after such optimizations. The design of the pre-compilation framework makes it possible to automate these optimizations.
Adopting the latest compiler techniques to assist domain-specific optimizations has high potential for implementing sophisticated transformations while guaranteeing their legality. This dissertation provides a preliminary study that uses polyhedral transformations to implement the proposed optimization strategies. Several obstacles need to be removed before this technique can be applied to large-scale scientific applications. With the research presented in this dissertation and the tasks suggested as future work, the ultimate goal of delivering performance portability with automation is feasible for CFD applications.
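
To make the idea of a legal loop transformation concrete, the following minimal sketch applies loop tiling, a standard transformation expressible in the polyhedral model, to a 5-point Jacobi sweep. It is not taken from the dissertation: the function names `jacobi_sweep`, `jacobi_sweep_tiled`, and `make_grid`, the stencil, and the tile size are all assumptions chosen for illustration. Because each output point depends only on the input grid, the iterations carry no dependences, so reordering them into tiles is legal and the two versions produce identical results.

```python
def jacobi_sweep(grid):
    """One untiled 5-point Jacobi sweep over the interior of a 2D grid."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out


def jacobi_sweep_tiled(grid, tile=4):
    """The same sweep with the (i, j) iteration space split into tiles.

    The tile size is a hypothetical tuning parameter; in compiled code it
    would be chosen so that a tile's working set fits in on-chip memory.
    """
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for ii in range(1, n - 1, tile):          # loop over tile origins
        for jj in range(1, m - 1, tile):
            # loop over the points inside one tile, clipped at the boundary
            for i in range(ii, min(ii + tile, n - 1)):
                for j in range(jj, min(jj + tile, m - 1)):
                    out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                        + grid[i][j - 1] + grid[i][j + 1])
    return out


def make_grid(n, m):
    """Build a small deterministic test grid (illustrative data only)."""
    return [[float((i * 7 + j * 3) % 11) for j in range(m)] for i in range(n)]
```

Calling `jacobi_sweep(make_grid(10, 13))` and `jacobi_sweep_tiled(make_grid(10, 13), tile=4)` returns equal grids. The Python version only demonstrates that the reordering preserves the result; any cache or SIMD benefit would come from applying the same restructuring in a compiled language.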