Performance optimizations for scalable CFD applications on hybrid CPU+MIC heterogeneous computing system with millions of cores

Yong-Xian Wang, Li-Lun Zhang, Wei Liu, Xing-Hua Cheng, Yu Zhuang, Anthony T. Chronopoulos
National University of Defense Technology, Changsha, Hu’nan 410073, China
arXiv:1710.09995 [cs.PF], (27 Oct 2017)

@article{wang2017performance,
   title={Performance optimizations for scalable CFD applications on hybrid CPU+MIC heterogeneous computing system with millions of cores},
   author={Wang, Yong-Xian and Zhang, Li-Lun and Liu, Wei and Cheng, Xing-Hua and Zhuang, Yu and Chronopoulos, Anthony T.},
   year={2017},
   month={oct},
   archivePrefix={arXiv},
   primaryClass={cs.PF}
}

For computational fluid dynamics (CFD) applications with a large number of grid points or cells, parallel computing is a common and efficient way to reduce computation time. Achieving the best performance on modern supercomputers, especially those with heterogeneous computing resources such as hybrid CPU+GPU systems or CPUs with Intel Xeon Phi (MIC) coprocessors, remains a great challenge. In this study, an in-house parallel CFD code capable of simulating three-dimensional structured-grid applications is developed and tested. Several methods of parallelization, performance optimization, and code tuning are proposed for both the CPU-only homogeneous system and the heterogeneous system. They are based on identifying the potential parallelism of applications, balancing the workload among all kinds of computing devices, tuning the multithreaded code for better performance within a single node with hundreds of CPU/MIC cores, and optimizing communication between nodes, between cores, and between CPUs and MICs. Benchmark cases from model and/or industrial CFD applications are tested on the Tianhe-1A and Tianhe-2 supercomputers to evaluate performance. Among these cases, the largest has 780 billion grid cells. The tuned solver scales up to half of the entire Tianhe-2 system, with over 1.376 million heterogeneous cores. The test results and performance analysis are discussed in detail.
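The workload balancing mentioned in the abstract can be illustrated with a minimal sketch of a static, proportional split of grid cells across devices. This is not the authors' scheme: the function name `split_cells` and the integer device weights are hypothetical, and a real solver would also have to respect block boundaries and communication costs.

```python
from typing import List


def split_cells(total_cells: int, weights: List[int]) -> List[int]:
    """Split total_cells across devices in proportion to integer weights.

    weights are assumed relative throughputs (hypothetical values, e.g.
    measured cells per second rounded to integers). Integer arithmetic
    keeps the split exact, and the largest-remainder rule hands out the
    cells left over after flooring.
    """
    if total_cells < 0:
        raise ValueError("total_cells must be non-negative")
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive integers")

    weight_sum = sum(weights)
    # Floor share and remainder for each device.
    shares = [total_cells * w // weight_sum for w in weights]
    remainders = [total_cells * w % weight_sum for w in weights]

    # Give one extra cell to the devices with the largest remainders.
    leftover = total_cells - sum(shares)
    order = sorted(range(len(weights)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


if __name__ == "__main__":
    # Assumed example: two CPU sockets with weight 1 each, one coprocessor with weight 2.
    print(split_cells(1000, [1, 1, 2]))
```

With the assumed weights `[1, 1, 2]`, the device with weight 2 receives half of the cells and the shares always sum to the total, so no cell is left unassigned.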
