Prospects for scalable 3D FFTs on heterogeneous exascale systems
Georgia Institute of Technology, Atlanta, GA
International Conference for High Performance Computing, Networking, Storage and Analysis (SC’11), 2011
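@inproceedings{mcclanahan2011prospects,
title={Prospects for scalable 3D FFTs on heterogeneous exascale systems},
author={McClanahan, C. and Czechowski, K. and Battaglino, C. and Iyer, K. and Yeung, P.K. and Vuduc, R.},
booktitle={International Conference for High Performance Computing, Networking, Storage and Analysis (SC'11)},
year={2011}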
}
We consider the problem of implementing scalable three-dimensional fast Fourier transforms (FFTs) with an eye toward future exascale systems composed of graphics co-processors (GPUs) or other similarly high-density compute units. We describe a new software implementation, derive and calibrate a suitable analytical performance model, and use this model to predict potential outcomes at exascale based on current and likely technology trends. We evaluate the scalability of our software and instantiate our models on real systems, including 64 nodes (192 NVIDIA "Fermi" GPUs) of the Keeneland system at Oak Ridge National Laboratory. We use our analytical model to quantify the impact of both the inter-node and the intra-node communication that impede further scalability. Among various observations, a key prediction is that although inter-node all-to-all communication is expected to be the bottleneck of distributed FFTs, intra-node communication may actually play an even more critical role.
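To make the role of the all-to-all step concrete, here is a minimal pure-Python sketch of one common way to distribute a 3D FFT, the slab (1D) decomposition. It is an emulation on hypothetical "ranks" inside a single process, not the authors' GPU implementation, and the function names (`slab_fft3d`, `direct_dft3d`) are illustrative only. Each rank transforms its x-planes locally along y and z, an all-to-all transpose redistributes the data by y, and each rank finishes with 1D transforms along x. The counter shows that the transpose moves a fraction (p-1)/p of the whole cube between ranks. Intra-node traffic (for example, host-to-GPU transfers), which the abstract highlights, is not modeled here.

```python
import cmath
import random


def dft(x):
    """Naive O(n^2) forward 1D DFT: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def slab_fft3d(a, n, p):
    """Emulate a slab-decomposed 3D FFT of an n*n*n cube a[x][y][z] on p ranks.

    Returns (blocks, words_sent), where blocks[r][ky_local][kx][kz] holds the
    output for the ky-slab owned by rank r after the transpose, and words_sent
    counts complex values that cross between distinct ranks.
    """
    assert n % p == 0, "n must be divisible by the number of ranks"
    s = n // p
    # Step 1: rank r owns x-planes r*s .. r*s+s-1; local 2D DFT over (y, z).
    local = []
    for r in range(p):
        planes = []
        for x in range(r * s, (r + 1) * s):
            plane = [dft(a[x][y]) for y in range(n)]                          # along z: [y][kz]
            cols = [dft([plane[y][z] for y in range(n)]) for z in range(n)]   # along y: [kz][ky]
            planes.append([[cols[z][y] for z in range(n)] for y in range(n)])  # back to [ky][kz]
        local.append(planes)
    # Step 2: all-to-all transpose; rank src sends rank dst the part of each plane in dst's ky-slab.
    words_sent = 0
    recv = [[None] * p for _ in range(p)]  # recv[dst][src][x_local][ky_local][kz]
    for src in range(p):
        for dst in range(p):
            recv[dst][src] = [[local[src][xl][y] for y in range(dst * s, (dst + 1) * s)]
                              for xl in range(s)]
            if src != dst:
                words_sent += s * s * n
    # Step 3: each rank now has all x for its ky-slab; finish with 1D DFTs along x.
    blocks = []
    for r in range(p):
        out = []
        for yl in range(s):
            rows = [[recv[r][x // s][x % s][yl][z] for z in range(n)] for x in range(n)]
            tx = [dft([rows[x][z] for x in range(n)]) for z in range(n)]      # [kz][kx]
            out.append([[tx[z][kx] for z in range(n)] for kx in range(n)])    # [kx][kz]
        blocks.append(out)
    return blocks, words_sent


def direct_dft3d(a, n):
    """Reference: direct triple-sum 3D DFT, indexed [kx][ky][kz]."""
    w = lambda k, x: cmath.exp(-2j * cmath.pi * k * x / n)
    return [[[sum(a[x][y][z] * w(kx, x) * w(ky, y) * w(kz, z)
                  for x in range(n) for y in range(n) for z in range(n))
              for kz in range(n)] for ky in range(n)] for kx in range(n)]


if __name__ == "__main__":
    n, p = 4, 2
    random.seed(0)
    a = [[[complex(random.random(), random.random()) for _ in range(n)]
          for _ in range(n)] for _ in range(n)]
    blocks, words = slab_fft3d(a, n, p)
    ref = direct_dft3d(a, n)
    s = n // p
    err = max(abs(blocks[r][yl][kx][kz] - ref[kx][r * s + yl][kz])
              for r in range(p) for yl in range(s) for kx in range(n) for kz in range(n))
    print(f"max abs error {err:.2e}; all-to-all moved {words} of {n**3} values")
```

With p ranks, only 1/p of each rank's data stays local during the transpose, so the exchanged volume approaches the full problem size as p grows. This is one reason the all-to-all is conventionally expected to dominate distributed FFT cost.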
November 16, 2011 by hgpu