
APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters

Roberto Ammendola, Andrea Biagioni, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Pier Paolucci, Roberto Petronzio, Davide Rossetti, Andrea Salamon, Gaetano Salina, Francesco Simula, Nazario Tantalo, Laura Tosoratto, Piero Vicini
INFN Roma Tor Vergata
arXiv:1012.0253 [hep-lat] (1 Dec 2010)

@article{2010arXiv1012.0253A,

   author={{Ammendola}, R. and {Biagioni}, A. and {Frezza}, O. and {Lo Cicero}, F. and {Lonardo}, A. and {Paolucci}, P. and {Petronzio}, R. and {Rossetti}, D. and {Salamon}, A. and {Salina}, G. and {Simula}, F. and {Tantalo}, N. and {Tosoratto}, L. and {Vicini}, P.},

   title={{APEnet+: a 3D toroidal network enabling Petaflops scale Lattice QCD simulations on commodity clusters}},

   journal={ArXiv e-prints},

   archivePrefix={arXiv},

   eprint={1012.0253},

   primaryClass={hep-lat},

   keywords={High Energy Physics - Lattice, Computer Science - Distributed, Parallel, and Cluster Computing},

   year={2010},

   month=dec,

   adsurl={http://adsabs.harvard.edu/abs/2010arXiv1012.0253A},

   adsnote={Provided by the SAO/NASA Astrophysics Data System}

}

Many scientific computations need multi-node parallelism to keep pace with ever-increasing requirements in both space (memory) and time (speed). Using GPUs as accelerators adds yet another level of complexity for the programmer and can introduce large overheads because of the complex memory hierarchy. Moreover, the most demanding problems may easily require more than a Petaflops of sustained computing power, that is, thousands of GPUs orchestrated by some parallel programming model. Here we describe APEnet+, the new generation of our interconnect, which scales up to tens of thousands of nodes at linear cost, thus improving the price/performance ratio on large clusters. The project targets the development of the Apelink+ host adapter, featuring a low-latency, high-bandwidth direct network, state-of-the-art wire speeds on the links and a PCIe Gen2 x8 host interface. It provides hardware support for the RDMA programming model and experimental acceleration of GPU networking. A Linux kernel driver, a set of low-level RDMA APIs and an OpenMPI library driver are available, allowing standard applications to be ported painlessly. Finally, we give an insight into future work and intended developments.
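The "linear cost" property follows from the direct 3D torus topology. Each node connects only to its six nearest neighbours (X+, X-, Y+, Y-, Z+, Z-) with wrap-around at the edges, so the number of links grows in proportion to the number of nodes. The following is a minimal, illustrative Python sketch of torus addressing and hop counting. It is not APEnet+ driver or firmware code. The function names, the x-fastest rank ordering, and the requirement that every dimension be at least 3 nodes long are all assumptions made for this example.

```python
"""Illustrative 3D torus addressing sketch (hypothetical helpers, not APEnet+ code)."""
from typing import List, Tuple

Coord = Tuple[int, int, int]


def rank_to_coord(rank: int, dims: Coord) -> Coord:
    """Map a linear node rank to (x, y, z); x varies fastest (assumed ordering)."""
    lx, ly, _ = dims
    return (rank % lx, (rank // lx) % ly, rank // (lx * ly))


def coord_to_rank(c: Coord, dims: Coord) -> int:
    """Inverse of rank_to_coord."""
    lx, ly, _ = dims
    x, y, z = c
    return x + lx * (y + ly * z)


def neighbors(c: Coord, dims: Coord) -> List[Coord]:
    """Six nearest neighbours in the order X+, X-, Y+, Y-, Z+, Z-, with wrap-around."""
    result: List[Coord] = []
    for axis in range(3):
        for step in (+1, -1):
            n = list(c)
            n[axis] = (n[axis] + step) % dims[axis]  # modulo closes each ring
            result.append((n[0], n[1], n[2]))
    return result


def hop_distance(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimal number of link traversals; each ring can be walked either way."""
    total = 0
    for ai, bi, li in zip(a, b, dims):
        d = abs(ai - bi)
        total += min(d, li - d)
    return total


def link_count(dims: Coord) -> int:
    """Bidirectional links: 6 ports per node, each link shared by 2 nodes, so 3N.

    Assumes every dimension has length >= 3, so a node's six neighbours are distinct.
    """
    lx, ly, lz = dims
    return 3 * lx * ly * lz


if __name__ == "__main__":
    dims = (4, 4, 4)
    print(neighbors((0, 0, 0), dims))
    # [(1, 0, 0), (3, 0, 0), (0, 1, 0), (0, 3, 0), (0, 0, 1), (0, 0, 3)]
    print(hop_distance((0, 0, 0), (3, 3, 3), dims))  # 3 (one wrap-around hop per axis)
    print(link_count(dims), link_count((8, 8, 8)))  # 192 1536: 8x the nodes, 8x the links
```

Doubling each side of the torus multiplies both the node count and the link count by eight. The hop distance, by contrast, grows only with the side length. This is the sense in which a torus keeps the per-node network cost constant.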