14987

Embedded Ensemble Propagation for Improving Performance, Portability and Scalability of Uncertainty Quantification on Emerging Computational Architectures

E. Phipps, M. D’Elia, H. C. Edwards, M. Hoemmen, J. Hu, S. Rajamanickam
Center for Computing Research, Sandia National Laboratories, Albuquerque, NM and Livermore, CA
arXiv:1511.03703 [cs.MS], (11 Nov 2015)

@article{phipps2015embedded,

   title={Embedded Ensemble Propagation for Improving Performance, Portability and Scalability of Uncertainty Quantification on Emerging Computational Architectures},

   author={Phipps, E. and D’Elia, M. and Edwards, H. C. and Hoemmen, M. and Hu, J. and Rajamanickam, S.},

   year={2015},

   month={nov},

   archivePrefix={"arXiv"},

   primaryClass={cs.MS}

}

Download Download (PDF)   View View   Source Source   

1297

views

Quantifying simulation uncertainties is a critical component of rigorous predictive simulation. A key component of this is forward propagation of uncertainties in simulation input data to output quantities of interest. Typical approaches involve repeated sampling of the simulation over the uncertain input data, and can require numerous samples when accurately propagating uncertainties from large numbers of sources. Often simulation processes from sample to sample are similar and much of the data generated from each sample evaluation could be reused. We explore a new method for implementing sampling methods that simultaneously propagates groups of samples together in an embedded fashion, which we call embedded ensemble propagation. We show how this approach takes advantage of properties of modern computer architectures to improve performance by enabling reuse between samples, reducing memory bandwidth requirements, improving memory access patterns, improving opportunities for fine-grained parallelization, and reducing communication costs. We describe a software technique for implementing embedded ensemble propagation based on the use of C++ templates and describe its integration with various scientific computing libraries within Trilinos. We demonstrate improved performance, portability and scalability for the approach applied to the simulation of partial differential equations on a variety of CPU, GPU, and accelerator architectures, including up to 131,072 cores on a Cray XK7 (Titan).
Rating: 1.5/5. From 2 votes.
Please wait...

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: