Parallel Breadth First Search on GPU Clusters
SYSTAP, LLC. 4501 Tower Road, Greensboro, NC 27410, USA
Scientific Computing and Imaging Institute, University of Utah, UUSCI-2014-002, 2014
@article{fu2014parallel,
title={Parallel Breadth First Search on GPU Clusters},
author={Fu, Zhisong and Dasari, Harish Kumar and Berzins, Martin and Thompson, Bryan},
year={2014}
}
Fast, scalable, low-cost, and low-power execution of parallel graph algorithms is important for a wide variety of commercial and public sector applications. Breadth First Search (BFS) imposes an extreme burden on memory bandwidth and network communications and has been proposed as a benchmark that may be used to evaluate current and future parallel computers. Hardware trends and manufacturing limits strongly imply that many core devices, such as NVIDIA GPUs and the Intel Xeon Phi, will become central components of such future systems. GPUs are well known to deliver the highest FLOPS/watt and enjoy a very significant memory bandwidth advantage over CPU architectures. Recent work has demonstrated that GPUs can deliver high performance for parallel graph algorithms and, further, that it is possible to encapsulate that capability in a manner that hides the low level details of the GPU architecture and the CUDA language but preserves the high throughput of the GPU. We extend previous research on GPUs and on scalable graph processing on super-computers and demonstrate that a high-performance parallel graph machine can be created using commodity GPUs and networking hardware.
August 11, 2014 by hgpu