GPU peer-to-peer techniques applied to a cluster interconnect

Roberto Ammendola, Massimo Bernaschi, Andrea Biagioni, Mauro Bisson, Massimiliano Fatica, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Enrico Mastrostefano, Pier Stanislao Paolucci, Davide Rossetti, Francesco Simula, Laura Tosoratto, Piero Vicini

Istituto Nazionale di Fisica Nucleare, Sezione Roma Tor Vergata, Rome, Italy

arXiv:1307.8276 [physics.comp-ph], (31 Jul 2013)

@article{2013arXiv1307.8276A,

author={{Ammendola}, R. and {Bernaschi}, M. and {Biagioni}, A. and {Bisson}, M. and {Fatica}, M. and {Frezza}, O. and {Lo Cicero}, F. and {Lonardo}, A. and {Mastrostefano}, E. and {Stanislao Paolucci}, P. and {Rossetti}, D. and {Simula}, F. and {Tosoratto}, L. and {Vicini}, P.},

title={{GPU peer-to-peer techniques applied to a cluster interconnect}},

journal={ArXiv e-prints},

archivePrefix={arXiv},

eprint={1307.8276},

primaryClass={physics.comp-ph},

keywords={Physics – Computational Physics, Computer Science – Distributed, Parallel, and Cluster Computing},

year={2013},

month={jul},

adsurl={http://adsabs.harvard.edu/abs/2013arXiv1307.8276A},

adsnote={Provided by the SAO/NASA Astrophysics Data System}

}

Modern GPUs support special protocols to exchange data directly across the PCI Express bus. While these protocols could reduce GPU data transmission times, mainly by avoiding staging through host memory, they require specific hardware features that are not available on current-generation network adapters. In this paper we describe the architectural modifications required to implement peer-to-peer access to NVIDIA Fermi- and Kepler-class GPUs on an FPGA-based cluster interconnect. We also discuss the current software implementation, which integrates this feature by minimally extending the RDMA programming model, along with some issues raised while employing it in a higher-level API such as MPI. Finally, we study the current limits of the technique by analyzing the performance improvements on low-level benchmarks and on two GPU-accelerated applications, showing when and how they seem to benefit from the GPU peer-to-peer method.
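To make the "staging to host memory" trade-off concrete, here is a minimal sketch of a transfer-time model, assuming each PCI Express hop costs a fixed latency plus size divided by bandwidth. The `Hop` class, the function names, and every number below are hypothetical placeholders chosen to keep the arithmetic easy to follow; none of them are measurements or code from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One bus transfer, modeled as fixed latency plus size / bandwidth."""
    latency_s: float        # per-transfer setup cost, in seconds
    bandwidth_bps: float    # sustained bandwidth, in bytes per second

    def time(self, nbytes: int) -> float:
        return self.latency_s + nbytes / self.bandwidth_bps


def staged_time(nbytes: int, gpu_to_host: Hop, host_to_nic: Hop) -> float:
    """GPU -> host bounce buffer -> network adapter: two sequential hops."""
    return gpu_to_host.time(nbytes) + host_to_nic.time(nbytes)


def peer_to_peer_time(nbytes: int, gpu_to_nic: Hop) -> float:
    """Network adapter accesses GPU memory directly: a single hop."""
    return gpu_to_nic.time(nbytes)


# Placeholder parameters (assumptions, NOT values from the paper).
GPU_TO_HOST = Hop(latency_s=10e-6, bandwidth_bps=5e9)
HOST_TO_NIC = Hop(latency_s=2e-6, bandwidth_bps=2e9)
GPU_TO_NIC = Hop(latency_s=3e-6, bandwidth_bps=1e9)


def faster_path(nbytes: int) -> str:
    """Return which modeled path is faster for a message of nbytes."""
    staged = staged_time(nbytes, GPU_TO_HOST, HOST_TO_NIC)
    direct = peer_to_peer_time(nbytes, GPU_TO_NIC)
    return "peer-to-peer" if direct < staged else "staged"


if __name__ == "__main__":
    for size in (1_000, 10_000, 100_000, 1_000_000):
        print(f"{size:>9} bytes: {faster_path(size)}")
```

In this toy model the direct path saves one hop and its latency, so which path wins depends entirely on the chosen latencies and bandwidths. Real behavior depends on the GPU generation, the adapter, and the benchmarks reported in the paper itself.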

Tags: Computational Physics, CUDA, FPGA, MPI, nVidia, Physics, Tesla S2075

August 1, 2013 by hgpu
