Gyrokinetic Toroidal Simulations on Leading Multi- and Manycore HPC Systems

Kamesh Madduri, Khaled Z. Ibrahim, Samuel Williams, Eun-Jin Im, Stephane Ethier, John Shalf, Leonid Oliker
NERSC/CRD, Lawrence Berkeley National Laboratory, Berkeley, USA
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’11), 2011








The gyrokinetic Particle-in-Cell (PIC) method is a critical computational tool enabling petascale fusion simulation research. In this work, we present novel multi- and manycore-centric optimizations to enhance performance of GTC, a PIC-based production code for studying plasma microturbulence in tokamak devices. Our optimizations encompass all six GTC subroutines and include multi-level particle and grid decompositions designed to improve multi-node parallel scaling, particle binning for improved load balance, GPU acceleration of key subroutines, and memory-centric optimizations to improve single-node scaling and reduce memory utilization. The new hybrid MPI-OpenMP and MPI-OpenMP-CUDA GTC versions achieve up to a 2x speedup over the production Fortran code on four parallel systems: clusters based on the AMD Magny-Cours, Intel Nehalem-EP, IBM BlueGene/P, and NVIDIA Fermi architectures. Finally, strong scaling experiments provide insight into parallel scalability, memory utilization, and programmability trade-offs for large-scale gyrokinetic PIC simulations, while attaining a 1.6x speedup on 49,152 XE6 cores.
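
The abstract names particle binning as one of the optimizations but does not describe how it is done. As a rough illustration only, the sketch below shows one common way to bin particles in a PIC code: a counting sort that groups particle indices by the grid cell they occupy. It is not taken from GTC or from the paper; the function name `bin_particles`, the flat `cell_ids` list, and the `num_cells` parameter are all hypothetical and chosen for this example.

```python
def bin_particles(cell_ids, num_cells):
    """Group particle indices by grid cell using a counting sort.

    Illustrative sketch only (assumed names, not GTC's implementation).
    cell_ids[p] is the cell index of particle p, in range(num_cells).
    Returns (order, offsets): the particles of cell c are
    order[offsets[c]:offsets[c + 1]].
    """
    # Pass 1: count how many particles fall in each cell.
    counts = [0] * num_cells
    for c in cell_ids:
        counts[c] += 1

    # Exclusive prefix sum gives the start offset of each cell's bin.
    offsets = [0] * (num_cells + 1)
    for c in range(num_cells):
        offsets[c + 1] = offsets[c] + counts[c]

    # Pass 2: scatter each particle index into its cell's bin.
    cursor = offsets[:-1].copy()
    order = [0] * len(cell_ids)
    for p, c in enumerate(cell_ids):
        order[cursor[c]] = p
        cursor[c] += 1
    return order, offsets


# Five particles spread over three cells.
cell_ids = [2, 0, 1, 2, 0]
order, offsets = bin_particles(cell_ids, 3)
print(order)    # [1, 4, 2, 0, 3]
print(offsets)  # [0, 2, 3, 5]
```

After binning, particles that share a cell sit next to each other in `order`, and the per-cell counts implied by `offsets` give a direct measure of how evenly work is spread across cells.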
