RDMA-Based Job Migration Framework for MPI over InfiniBand

Xiangyong Ouyang, Sonya Marcarelli, Raghunath Rajachandrasekar, Dhabaleswar K. Panda
Department of Computer Science and Engineering, The Ohio State University
IEEE International Conference on Cluster Computing, 2010, pp. 116-125


@inproceedings{ouyang2010rdma,
   title={RDMA-Based Job Migration Framework for MPI over InfiniBand},
   author={Ouyang, X. and Marcarelli, S. and Rajachandrasekar, R. and Panda, D.K.},
   booktitle={2010 IEEE International Conference on Cluster Computing},
   pages={116--125},
   year={2010}
}

Coordinated checkpoint and recovery is a common approach to achieve fault tolerance on large-scale systems. The traditional mechanism dumps the process images of all the processes involved in the parallel job to a local disk or a central storage area. When a failure occurs, the processes are restarted and restored from the latest checkpoint image. However, this kind of approach is unable to provide the scalability required by increasingly large jobs, since it puts a heavy I/O burden on the storage subsystem, and resubmitting a job during the restart phase incurs lengthy queuing delay. In this paper, we enhance the fault tolerance of MVAPICH2 [1], an open-source high-performance MPI-2 implementation, with a proactive job migration scheme. Instead of checkpointing all the processes of the job and saving their process images to stable storage, we transfer the processes running on a health-deteriorating node to a healthy spare node, and resume these processes on the spare node. RDMA-based process image transmission is designed to take advantage of the high-performance communication offered by InfiniBand. Experimental results show that the Job Migration scheme achieves a speedup of 4.49x over the Checkpoint/Restart scheme in handling a node failure for a 64-process application running on 8 compute nodes. To the best of our knowledge, this is the first such job migration design for InfiniBand-based clusters.
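The migration flow sketched in the abstract (suspend the processes on the failing node, capture their images, transfer the images to a spare node, resume there) can be illustrated with a toy model. All names below are hypothetical; the real framework checkpoints MPI processes inside MVAPICH2 (typically via a kernel-level checkpointer such as BLCR) and ships the images over InfiniBand RDMA, whereas this sketch stands in for the RDMA transfer with an in-memory copy to show only the control flow:

```python
"""Illustrative sketch, NOT the paper's implementation: a toy model of
proactive job migration (checkpoint -> transfer -> resume on a spare node)."""

from dataclasses import dataclass, field


@dataclass
class Process:
    rank: int
    state: dict  # application state captured in the checkpoint image


@dataclass
class Node:
    name: str
    processes: list = field(default_factory=list)

    def checkpoint(self):
        # Dump a per-process image into a buffer, rather than writing
        # the whole job's images to central storage.
        return [(p.rank, dict(p.state)) for p in self.processes]

    def resume(self, images):
        # Reconstruct the processes from the transferred images.
        self.processes = [Process(rank, state) for rank, state in images]


def migrate(source: Node, spare: Node) -> Node:
    """Move all processes from a health-deteriorating node to a spare."""
    images = source.checkpoint()                      # local snapshot only
    transferred = [(r, dict(s)) for r, s in images]   # stands in for RDMA write
    source.processes = []                             # drain the failing node
    spare.resume(transferred)                         # restart on the spare
    return spare
```

The key point the sketch mirrors is that only the processes on the failing node are captured and moved, avoiding both the full-job I/O burden and the resubmission queue of a traditional Checkpoint/Restart cycle.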
