Distributed-Shared CUDA: Virtualization of Large-Scale GPU Systems for Programmability and Reliability

Atsushi Kawai, Kenji Yasuoka, Kazuyuki Yoshikawa, Tetsu Narumi

Department of Mechanical Engineering, Keio University, Yokohama, Japan

The Fourth International Conference on Future Computational Technologies and Applications (FUTURE COMPUTING), 2012

One of the difficulties for current GPGPU (General-Purpose computing on Graphics Processing Units) users is writing code that uses multiple GPUs. One limiting factor is that only a few GPUs can be attached to a single PC, so MPI (Message Passing Interface) is the common tool for using tens of GPUs or more. However, MPI-based parallel code can be complicated compared with serial code. In this paper, we propose DS-CUDA (Distributed-Shared Compute Unified Device Architecture), a middleware that simplifies the development of code that uses multiple GPUs distributed over a network. DS-CUDA provides a global view of GPUs at the source-code level: it virtualizes a cluster of GPU-equipped PCs so that it appears to be a single PC with many GPUs. It also provides an automated redundant-calculation mechanism to enhance the reliability of GPUs. The performance of Monte Carlo and many-body simulations is measured on a 22-node (64-GPU) fraction of the TSUBAME 2.0 supercomputer. The results indicate that DS-CUDA is a practical solution for using tens of GPUs or more.
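
The abstract does not describe how the redundant calculation is carried out, so the following is only a minimal sketch of the general idea, written in plain Python rather than CUDA. It assumes that the same computation is run on two devices, that the result is accepted only when both copies agree exactly, and that the computation is repeated on a mismatch. The names `run_redundant`, `FlakyDevice`, and `square_all` are hypothetical and are not part of DS-CUDA.

```python
from typing import Callable, List, Sequence

# A "kernel" here is any deterministic function from input data to a result list.
Kernel = Callable[[Sequence[float]], List[float]]


def square_all(xs: Sequence[float]) -> List[float]:
    """Stand-in for a GPU kernel: square every element."""
    return [x * x for x in xs]


class FlakyDevice:
    """Hypothetical device that corrupts its output on selected calls."""

    def __init__(self, kernel: Kernel, faulty_calls: Sequence[int]) -> None:
        self.kernel = kernel
        self.faulty_calls = set(faulty_calls)  # 0-based call indices that fail
        self.calls = 0

    def __call__(self, data: Sequence[float]) -> List[float]:
        result = self.kernel(data)
        if self.calls in self.faulty_calls and result:
            result[0] += 1.0  # simulate a silent data error
        self.calls += 1
        return result


def run_redundant(devices: Sequence[Kernel], data: Sequence[float],
                  max_retries: int = 3) -> List[float]:
    """Run the same computation on two devices; retry until the results agree.

    Assumption: exact equality is a valid check because both devices run the
    same deterministic computation on the same input.
    """
    if len(devices) != 2:
        raise ValueError("this sketch expects exactly two devices")
    for _ in range(max_retries + 1):
        first = devices[0](data)
        second = devices[1](data)
        if first == second:
            return first
    raise RuntimeError("results still disagree after all retries")


if __name__ == "__main__":
    healthy = FlakyDevice(square_all, faulty_calls=[])
    flaky = FlakyDevice(square_all, faulty_calls=[0])  # first call is corrupted
    print(run_redundant([healthy, flaky], [1.0, 2.0, 3.0]))
```

In this sketch the caller sees a single call that either returns an agreed result or raises, which mirrors the idea of hiding the redundancy behind a middleware layer. Doubling the work per call is the price paid for detecting silent errors.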

Tags: Computer science, CUDA, nVidia, nVidia GeForce GTX 580, Tesla M2050, Virtualization

July 29, 2012 by hgpu
