high performance computing on graphics processing units: hgpu.org

Sustainable GPU Computing at Scale

Justin Y. Shi, Moussa Taifi, Abdallah Khreishah, Jie Wu

Dept. of Computer and Info. Sciences, Temple University, Philadelphia, PA 19122

IEEE 14th International Conference on Computational Science and Engineering (CSE), 2011

DOI:10.1109/CSE.2011.55

@article{shi2011sustainable,
  title={Sustainable GPU Computing at Scale},
  author={Shi, J.Y. and Taifi, M. and Khreishah, A. and Wu, J.},
  year={2011}
}

General purpose GPU (GPGPU) computing has produced the fastest running supercomputers in the world. For continued sustainable progress, GPU computing at scale also needs to address two open issues: a) how to increase an application's mean time between failures (MTBF) as the supercomputer's component count grows, and b) how to minimize unnecessary energy consumption. Since energy consumption is determined by the number of components used, we consider a high performance computing (HPC) application sustainable if it can deliver better performance and better reliability at the same time when computing or communication components are added. This paper reports a two-tier semantic statistical multiplexing framework for sustainable HPC at scale. The idea is to leverage the power of statistical multiplexing to tame the nagging HPC scalability challenges. We include the theoretical model, a sustainability analysis, and computational experiments with automatic system-level containment of multiple CPU/GPU failures. Our results show that, assuming a threefold slowdown from the statistical multiplexing layer, for an application using 1024 processors with 35% checkpoint overhead, the two-tier framework will produce sustained time and energy savings for MTBF less than 6 hours. With 5% checkpoint overhead, an MTBF of 1.5 hours would be the break-even point. These results suggest the practical feasibility of the proposed two-tier framework.
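To give a feel for why MTBF becomes a concern as component counts grow, the following is a minimal sketch and not the model from the paper. It assumes that component failures are independent and exponentially distributed and that any single failure stops the application, so failure rates add and the application-level MTBF is the per-component MTBF divided by the component count. The function names are hypothetical and chosen only for illustration.

```python
def system_mtbf_hours(component_mtbf_hours: float, n_components: int) -> float:
    """Application-level MTBF when any single component failure is fatal.

    Assumption (not from the paper): failures are independent and
    exponentially distributed, so the system failure rate is the sum of
    the component rates: n_components / component_mtbf_hours.
    """
    if n_components < 1:
        raise ValueError("n_components must be at least 1")
    if component_mtbf_hours <= 0:
        raise ValueError("component_mtbf_hours must be positive")
    return component_mtbf_hours / n_components


def required_component_mtbf_hours(target_system_mtbf_hours: float,
                                  n_components: int) -> float:
    """Per-component MTBF needed to reach a target application-level MTBF
    under the same independence assumption."""
    if n_components < 1:
        raise ValueError("n_components must be at least 1")
    if target_system_mtbf_hours <= 0:
        raise ValueError("target_system_mtbf_hours must be positive")
    return target_system_mtbf_hours * n_components


if __name__ == "__main__":
    n = 1024  # processor count used in the abstract's example
    for target in (1.5, 6.0):  # MTBF values (hours) quoted in the abstract
        needed = required_component_mtbf_hours(target, n)
        print(f"{n} processors, application MTBF {target} h: "
              f"each processor needs an MTBF of about {needed:.0f} h")
```

Under these assumptions, an application MTBF of 6 hours on 1024 processors corresponds to a per-processor MTBF of 6144 hours, and 1.5 hours corresponds to 1536 hours. Doubling the processor count halves the application MTBF unless failures are contained below the application level.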

Tags: Computer science, CUBLAS, CUDA, Energy-efficient computing, Fault tolerance, nVidia, OpenMPI, Tesla S1070

November 12, 2011 by hgpu
