Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems

hgpu.org » Applications » Computer science » Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems

Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems

Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Pier Stanislao Paolucci, Alessandro Lonardo, Davide Rossetti, Francesco Simula, Laura Tosoratto, Piero Vicini

INFN Sezione Roma Tor Vergata

arXiv:1311.1741 [cs.AR], (7 Nov 2013)

@article{2013arXiv1311.1741A,

author={Ammendola}, R. and {Biagioni}, A. and {Frezza}, O. and {Lo Cicero}, F. and {Stanislao Paolucci}, P. and {Lonardo}, A. and {Rossetti}, D. and {Simula}, F. and {Tosoratto}, L. and {Vicini}, P.},

title={"{Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems}"},

journal={ArXiv e-prints},

archivePrefix={"arXiv"},

eprint={1311.1741},

keywords={Computer Science – Hardware Architecture, Computer Science – Distributed, Parallel, and Cluster Computing, Physics – Computational Physics},

year={2013},

month={nov},

adsurl={http://adsabs.harvard.edu/abs/2013arXiv1311.1741A},

adsnote={Provided by the SAO/NASA Astrophysics Data System}

}

Download (PDF)

View

Source

1873

views

Modern Graphics Processing Units (GPUs) are now considered accelerators for general purpose computation. A tight interaction between the GPU and the interconnection network is the strategy to express the full potential on capability computing of a multi-GPU system on large HPC clusters; that is the reason why an efficient and scalable interconnect is a key technology to finally deliver GPUs for scientific HPC. In this paper we show the latest architectural and performance improvement of the APEnet+ network fabric, a FPGA-based PCIe board with 6 fully bidirectional off-board links with 34 Gbps of raw bandwidth per direction, and X8 Gen2 bandwidth towards the host PC. The board implements a Remote Direct Memory Access (RDMA) protocol that leverages upon peer-to-peer (P2P) capabilities of Fermi- and Kepler-class NVIDIA GPUs to obtain real zero-copy, low-latency GPU-to-GPU transfers. Finally, we report on the development activities for 2013 focusing on the adoption of the latest generation 28 nm FPGAs and the preliminary tests performed on this new platform.

Tags: Computer science, FPGA, GPU cluster, Hardware Architecture, nVidia

November 8, 2013 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

high performance computing on graphics processing units: hgpu.org