Distributed-Shared CUDA: Virtualization of Large-Scale GPU Systems for Programmability and Reliability

Atsushi Kawai, Kenji Yasuoka, Kazuyuki Yoshikawa, Tetsu Narumi
Department of Mechanical Engineering, Keio University, Yokohama, Japan
The Fourth International Conference on Future Computational Technologies and Applications (FUTURE COMPUTING), 2012

@inproceedings{kawai2012distributed,
   title={Distributed-Shared CUDA: Virtualization of Large-Scale GPU Systems for Programmability and Reliability},
   author={Kawai, A. and Yasuoka, K. and Yoshikawa, K. and Narumi, T.},
   booktitle={FUTURE COMPUTING 2012, The Fourth International Conference on Future Computational Technologies and Applications},
   pages={7--12},
   year={2012}
}

One of the difficulties facing current GPGPU (General-Purpose computing on Graphics Processing Units) users is writing code that uses multiple GPUs. One limiting factor is that only a few GPUs can be attached to a single PC, so MPI (Message Passing Interface) is the usual tool for using tens of GPUs or more. However, MPI-based parallel code is sometimes complicated compared with serial code. In this paper, we propose DS-CUDA (Distributed-Shared Compute Unified Device Architecture), a middleware that simplifies the development of code that uses multiple GPUs distributed over a network. DS-CUDA provides a global view of GPUs at the source-code level: it virtualizes a cluster of GPU-equipped PCs so that it appears to be a single PC with many GPUs. It also provides an automated redundant-calculation mechanism to enhance the reliability of GPUs. The performance of Monte Carlo and many-body simulations is measured on a 22-node (64-GPU) fraction of the TSUBAME 2.0 supercomputer. The results indicate that DS-CUDA is a practical solution for using tens of GPUs or more.
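
The abstract does not describe how the redundant-calculation mechanism works, so the following is only a minimal sketch of the general idea: run the same deterministic task on several devices, accept the result when all copies agree, and recompute otherwise. The compare-and-retry policy and every name here (`run_redundantly`, `healthy_device`, `FlakyDevice`, `monte_carlo_pi`) are assumptions made for illustration and are not part of DS-CUDA's API. Plain Python callables stand in for GPUs.

```python
import random
from typing import Callable, Sequence

Task = Callable[[], float]
Device = Callable[[Task], float]


def monte_carlo_pi(seed: int, samples: int) -> float:
    """Deterministic Monte Carlo estimate of pi for a fixed seed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / samples


def healthy_device(task: Task) -> float:
    """Hypothetical device that always returns the correct result."""
    return task()


class FlakyDevice:
    """Hypothetical device that corrupts its first result, then behaves."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, task: Task) -> float:
        self.calls += 1
        result = task()
        # Simulate a transient fault on the first call only.
        return result + 1.0 if self.calls == 1 else result


def run_redundantly(task: Task, devices: Sequence[Device],
                    max_retries: int = 3) -> float:
    """Run the task on every device; accept only when all results agree.

    Assumed policy (not taken from the paper): on any mismatch, recompute
    on all devices, up to max_retries attempts in total.
    """
    for _ in range(max_retries):
        results = [device(task) for device in devices]
        if all(r == results[0] for r in results):
            return results[0]
    raise RuntimeError("redundant devices kept disagreeing")


if __name__ == "__main__":
    task = lambda: monte_carlo_pi(seed=42, samples=10_000)
    print(run_redundantly(task, [healthy_device, FlakyDevice()]))
```

The exact equality check only makes sense because the task is deterministic for a fixed seed; a real system comparing floating-point results from different hardware might need a tolerance instead.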
