A Multi-Stage CUDA Kernel for Floyd-Warshall
University of Cincinnati, Department of Computer Science, 814 Rhodes Hall, Cincinnati, OH 45221
arXiv:1001.4108 [cs.DC] (25 Feb 2010)
@article{2010arXiv1001.4108L,
author={{Lund}, B. and {Smith}, J.~W.},
title={{A Multi-Stage CUDA Kernel for Floyd-Warshall}},
journal={ArXiv e-prints},
archivePrefix={arXiv},
eprint={1001.4108},
primaryClass={cs.DC},
keywords={Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Performance, D.1.3},
year={2010},
month=jan,
adsurl={http://adsabs.harvard.edu/abs/2010arXiv1001.4108L},
adsnote={Provided by the SAO/NASA Astrophysics Data System}
}
We present a new implementation of the Floyd-Warshall All-Pairs Shortest Paths algorithm on CUDA. Our algorithm runs approximately 5 times faster than the best previously reported algorithm. To achieve this speedup, we applied a new technique that reduces the use of on-chip shared memory and allows the CUDA scheduler to hide instruction latency more effectively.
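For readers unfamiliar with the algorithm, here is a minimal CPU sketch in Python. It is not the authors' CUDA code. `floyd_warshall` is the textbook O(n^3) recurrence d[i][j] = min(d[i][j], d[i][k] + d[k][j]). `blocked_floyd_warshall` reorders the same updates into the widely used three-phase tiled scheme: first the diagonal tile, then the tiles in its row and column, then all remaining tiles. This illustrates the kind of staged data dependence a multi-stage GPU kernel has to respect. The function names, the `tile` parameter, and the use of `math.inf` for missing edges are illustrative assumptions. The sketch also assumes the graph has no negative cycles. The abstract does not say how the paper's stages are organized, and this sketch says nothing about its shared-memory technique.

```python
import math

INF = math.inf  # assumption: a missing edge is represented as +infinity


def floyd_warshall(dist):
    """Textbook Floyd-Warshall; returns a new n x n matrix of shortest distances."""
    n = len(dist)
    d = [row[:] for row in dist]  # copy so the caller's matrix is untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def _relax_tile(d, rows, cols, ks):
    """Apply the update for each k in ks, in order, to the tile rows x cols."""
    for k in ks:
        for i in rows:
            dik = d[i][k]
            if dik == INF:
                continue  # no known path i -> k, so no d[i][j] can improve via k
            for j in cols:
                if dik + d[k][j] < d[i][j]:
                    d[i][j] = dik + d[k][j]


def blocked_floyd_warshall(dist, tile=2):
    """Hypothetical three-phase tiled Floyd-Warshall with square tiles of side `tile`."""
    n = len(dist)
    d = [row[:] for row in dist]
    spans = [range(s, min(s + tile, n)) for s in range(0, n, tile)]
    for kb, K in enumerate(spans):
        # Phase 1: the diagonal tile (kb, kb) depends only on itself.
        _relax_tile(d, K, K, K)
        # Phase 2: tiles in row kb and column kb need the finished diagonal tile.
        for b, B in enumerate(spans):
            if b != kb:
                _relax_tile(d, K, B, K)  # row tile (kb, b)
                _relax_tile(d, B, K, K)  # column tile (b, kb)
        # Phase 3: every other tile (ib, jb) needs tiles (ib, kb) and (kb, jb).
        for ib, I in enumerate(spans):
            for jb, J in enumerate(spans):
                if ib != kb and jb != kb:
                    _relax_tile(d, I, J, K)
    return d


if __name__ == "__main__":
    g = [[0, 4, 10],
         [INF, 0, 1],
         [INF, INF, 0]]
    print(blocked_floyd_warshall(g, tile=2))  # [[0, 4, 5], [inf, 0, 1], [inf, inf, 0]]
```

In a GPU version of this tiled scheme, each phase would typically be launched separately for every block round. This is because phase 2 reads the finished diagonal tile, and phase 3 reads the finished row and column tiles. A tile is also the natural unit to stage in on-chip shared memory.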
January 18, 2011 by hgpu