Productive Performance Engineering for Weather and Climate Modeling with Python

Tal Ben-Nun, Linus Groner, Florian Deconinck, Tobias Wicky, Eddie Davis, Johann Dahm, Oliver Elbert, Rhea George, Jeremy McGibbon, Lukas Trümper, Elynn Wu, Oliver Fuhrer, Thomas Schulthess, Torsten Hoefler
Department of Computer Science, ETH Zurich, Switzerland
arXiv:2205.04148 [cs.DC], (9 May 2022)

@misc{https://doi.org/10.48550/arxiv.2205.04148,
   doi={10.48550/ARXIV.2205.04148},
   url={https://arxiv.org/abs/2205.04148},
   author={Ben-Nun, Tal and Groner, Linus and Deconinck, Florian and Wicky, Tobias and Davis, Eddie and Dahm, Johann and Elbert, Oliver and George, Rhea and McGibbon, Jeremy and Trümper, Lukas and Wu, Elynn and Fuhrer, Oliver and Schulthess, Thomas and Hoefler, Torsten},
   keywords={Distributed, Parallel, and Cluster Computing (cs.DC), FOS: Computer and information sciences},
   title={Productive Performance Engineering for Weather and Climate Modeling with Python},
   publisher={arXiv},
   year={2022},
   copyright={arXiv.org perpetual, non-exclusive license}
}

Earth system models are developed with a tight coupling to target hardware, often containing highly-specialized code predicated on processor characteristics. This coupling stems from using imperative languages that hard-code computation schedules and layout. In this work, we present a detailed account of optimizing the Finite Volume Cubed-Sphere (FV3) weather model, improving productivity and performance. By using a declarative Python-embedded stencil DSL and data-centric optimization, we abstract hardware-specific details and define a semi-automated workflow for analyzing and optimizing weather and climate applications. The workflow utilizes both local optimization and full-program optimization, as well as user-guided fine-tuning. To prune the infeasible global optimization space, we automatically utilize repeating code motifs via a novel transfer tuning approach. On the Piz Daint supercomputer, we achieve speedups of up to 3.92x using GPUs over the tuned production implementation at a fraction of the original code.
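
To make the contrast between imperative and declarative stencil code concrete, here is a minimal sketch in plain NumPy. It is not the paper's DSL or the FV3 code. The function names and the 5-point Laplacian are assumptions chosen for illustration. The first function hard-codes the loop order and indexing. The second states only what each interior point equals and leaves the iteration schedule to the backend.

```python
import numpy as np


def laplacian_loops(field):
    """Imperative form: the loop order and memory access pattern are fixed in the code."""
    ni, nj = field.shape
    out = np.zeros_like(field)
    for i in range(1, ni - 1):
        for j in range(1, nj - 1):
            out[i, j] = (
                field[i - 1, j] + field[i + 1, j]
                + field[i, j - 1] + field[i, j + 1]
                - 4.0 * field[i, j]
            )
    return out


def laplacian_declarative(field):
    """Whole-array form: describes the result per interior point, with no explicit schedule."""
    out = np.zeros_like(field)
    out[1:-1, 1:-1] = (
        field[:-2, 1:-1] + field[2:, 1:-1]
        + field[1:-1, :-2] + field[1:-1, 2:]
        - 4.0 * field[1:-1, 1:-1]
    )
    return out


# Both forms compute the same values on the interior of the domain.
rng = np.random.default_rng(0)
field = rng.random((8, 8))
assert np.allclose(laplacian_loops(field), laplacian_declarative(field))
```

Because the second form does not fix an iteration order, a compiler or library is free to pick one per target, which is the kind of separation the abstract attributes to a declarative stencil DSL.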