Parallelization and Performance of the NIM Weather Model for CPU, GPU and MIC Processors
NOAA Earth System Research Laboratory, 325 Broadway, Boulder, Colorado 80305
NOAA Earth System Research Laboratory, 2016
@article{govett2016parallelization,
title={Parallelization and Performance of the {NIM} Weather Model for {CPU}, {GPU} and {MIC} Processors},
author={Govett, Mark and Rosinski, Jim and Middlecoff, Jacques and Henderson, Tom and Lee, Jin and MacDonald, Alexander and Madden, Paul and Schramm, Julie and Duarte, Antonio},
year={2016}
}
The design and performance of the NIM global weather prediction model are described. NIM was designed to run on GPU and MIC processors. It demonstrates efficient parallel performance and scalability to tens of thousands of compute nodes, and has been an effective vehicle for comparing traditional CPUs with emerging fine-grain processors. The design of NIM also serves as a useful guide for fine-grain parallelization of the FV3 and MPAS models, two candidates being considered by the NWS as its next global weather prediction model to replace the operational GFS. The F2C-ACC compiler, co-developed to support running NIM on GPUs, has served as an effective vehicle for gaining substantial improvements in commercial Fortran OpenACC compilers. Performance results comparing F2C-ACC with commercial GPU compilers demonstrate the commercial compilers' increasing maturity and their ability to efficiently parallelize and run next-generation weather and climate prediction models. This paper describes the code structure and parallelization of NIM using F2C-ACC and standards-compliant OpenMP and OpenACC directives. NIM uses the directives to support a single, performance-portable code that runs on CPU, GPU and MIC systems. Performance results are compared for four generations of computer chips. Single-node and multi-node performance and scalability are also shown, along with a cost-benefit comparison based on vendor list prices.
December 6, 2016 by hgpu