Performance Analysis of a Hybrid MPI/CUDA Implementation of the NAS-LU Benchmark

S.J. Pennycook, S.D. Hammond, S.A. Jarvis, G.R. Mudalige
Performance Computing and Visualisation, Department of Computer Science, University of Warwick, UK
ACM SIGMETRICS Performance Evaluation Review, Volume 38 Issue 4, March 2011

@article{pennycook2011performance,
   title={Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark},
   author={Pennycook, SJ and Hammond, SD and Jarvis, SA and Mudalige, GR},
   journal={ACM SIGMETRICS Performance Evaluation Review},
   volume={38},
   number={4},
   pages={23--29},
   year={2011},
   publisher={ACM}
}

The emergence of Graphics Processing Units (GPUs) as a potential alternative to conventional general-purpose processors has led to significant interest in these architectures from both the academic community and the High Performance Computing (HPC) industry. While GPUs look likely to deliver unparalleled levels of performance, published studies claiming performance improvements in excess of 30,000x are misleading. Significant on-node performance improvements have been demonstrated for code kernels and algorithms amenable to GPU acceleration, but studies demonstrating comparable results for full scientific applications requiring multiple-GPU architectures are rare. In this paper we present an analysis of a port of the NAS-LU benchmark to NVIDIA's Compute Unified Device Architecture (CUDA), the most stable GPU programming model currently available. Our solution is also extended to multiple nodes and multiple GPU devices. Runtime performance on several GPUs is presented, ranging from low-end, consumer-grade cards such as the 8400GS to NVIDIA's flagship Fermi HPC processor found in the recently released C2050. We compare the runtimes of these devices to those of several processors, including ones from Intel, AMD and IBM. In addition, we utilise a recently developed performance model of LU to predict the runtime performance of LU on large-scale distributed GPU clusters, which are predicted to become commonplace in future high-end HPC architectural solutions.