Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark
Performance Computing and Visualisation, Department of Computer Science, University of Warwick, UK
ACM SIGMETRICS Performance Evaluation Review – Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10), Volume 38 Issue 4, March 2011
@article{pennycook2011performance,
title={Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark},
author={Pennycook, SJ and Hammond, SD and Jarvis, SA and Mudalige, GR},
journal={ACM SIGMETRICS Performance Evaluation Review},
volume={38},
number={4},
pages={23--29},
year={2011},
publisher={ACM}
}
We present the performance analysis of a port of the LU benchmark from the NAS Parallel Benchmark (NPB) suite to NVIDIA’s Compute Unified Device Architecture (CUDA), and report on the optimisation efforts employed to take advantage of this platform. Execution times are reported for several different GPUs, ranging from low-end consumer-grade products to high-end HPC-grade devices, including the Tesla C2050 built on NVIDIA’s Fermi processor. We also utilise recently developed performance models of LU to facilitate a comparison between future large-scale distributed clusters of GPU devices and existing clusters built on traditional CPU architectures, including a quad-socket, quad-core AMD Opteron cluster and an IBM BlueGene/P.
November 8, 2011 by hgpu