Improving 3D Lattice Boltzmann Method stencil with asynchronous transfers on many-core processors
CNRS, LIG UMR 5217, Grenoble Alps University, F-38058 Grenoble, France
hal-01652614 (30 November 2017)
@inproceedings{ho2017improving,
title={Improving 3D Lattice Boltzmann Method stencil with asynchronous transfers on many-core processors},
author={Ho, Minh-Quan and Obrecht, Christian and Tourancheau, Bernard and de Dinechin, Beno{\^\i}t Dupont and Hascoet, Julien},
booktitle={36th IEEE International Performance Computing and Communications Conference (IPCCC 2017)},
year={2017}
}
CPU-based many-core processors present an alternative to multicore CPUs and GPUs. In particular, the 93-petaflops Sunway supercomputer, built from clustered many-core processors, has opened a new era for high-performance computing that does not rely on GPU acceleration. However, memory bandwidth remains the main challenge for these architectures. This motivates our effort to optimize one of the most data-intensive kinds of stencil computation, namely three-dimensional applications of the lattice Boltzmann method (LBM). Using a representative 3D LBM solver as an example, we propose optimizations for many-core processors based on local memory and asynchronous software prefetching. By actively streaming data from and to local memory, we achieve a 33% performance gain on the Kalray MPPA-256 many-core processor compared with the "passive" OpenCL programming model.
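The abstract's core idea is to stream data actively between global memory and local memory with asynchronous transfers, so that computation on one block overlaps the transfer of the next. The sketch below illustrates that double-buffering (ping-pong) pattern over z-planes of a 3D grid. It is a minimal sketch under stated assumptions, not the authors' MPPA implementation: a `ThreadPoolExecutor` with one worker stands in for an in-order DMA engine, the names `copy_into`, `collide_plane` and `run` are hypothetical, and the kernel is a simplified BGK relaxation toward the rest equilibrium w_i·ρ with D3Q19 weights (an assumed lattice; no streaming step and no velocity terms). Python threads will not reproduce any speedup; the code only shows the ordering of waits and transfers.

```python
from concurrent.futures import ThreadPoolExecutor

NX, NY, NZ, Q = 4, 4, 8, 19  # assumed grid size and D3Q19 lattice
CELLS = NX * NY              # cells per z-plane

# D3Q19 weights: 1 rest direction, 6 face neighbours, 12 edge neighbours.
W = [1 / 3] + [1 / 18] * 6 + [1 / 36] * 12


def collide_plane(plane, omega):
    """Hypothetical stand-in kernel: relax each cell toward w_i * rho.
    Conserves mass per cell because the weights sum to 1."""
    for c in range(CELLS):
        base = c * Q
        rho = sum(plane[base:base + Q])
        for i in range(Q):
            plane[base + i] -= omega * (plane[base + i] - W[i] * rho)


def copy_into(dst, src):
    """Stand-in for one DMA transfer (global <-> local memory)."""
    dst[:] = src


def run(global_f, omega):
    """Double-buffered sweep: prefetch plane z+1 while computing plane z."""
    plane_len = CELLS * Q
    local = [[0.0] * plane_len, [0.0] * plane_len]  # two "local memory" buffers
    with ThreadPoolExecutor(max_workers=1) as dma:  # one in-order transfer engine
        get = dma.submit(copy_into, local[0], global_f[0])  # prologue: fetch plane 0
        put = None
        for z in range(NZ):
            cur, nxt = z % 2, (z + 1) % 2
            get.result()          # plane z is now resident in local[cur]
            if put is not None:
                put.result()      # write-back of plane z-1 from local[nxt] is done
            if z + 1 < NZ:        # safe to refill local[nxt] with plane z+1
                get = dma.submit(copy_into, local[nxt], global_f[z + 1])
            collide_plane(local[cur], omega)  # compute overlaps the prefetch
            put = dma.submit(copy_into, global_f[z], local[cur])  # async write-back
        put.result()              # drain the last write-back


if __name__ == "__main__":
    grid = [[1.0 + 0.01 * ((z + k) % 7) for k in range(CELLS * Q)] for z in range(NZ)]
    before = sum(map(sum, grid))
    run(grid, omega=1.0)
    print("mass drift:", abs(sum(map(sum, grid)) - before))
```

The one non-obvious step is waiting for the previous write-back before issuing the next fetch: both use the buffer that was computed on in the preceding iteration, so refilling it early would overwrite data that has not yet reached global memory.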
December 19, 2017 by hgpu