A Multi-Stage CUDA Kernel for Floyd-Warshall

Ben Lund, Justin W Smith
University of Cincinnati, Department Of Computer Science, 814 Rhodes Hall, Cincinnati, OH 45221
arXiv:1001.4108 [cs.DC] (25 Feb 2010)


   author={{Lund}, B. and {Smith}, J.~W},
   title={A Multi-Stage CUDA Kernel for Floyd-Warshall},
   journal={ArXiv e-prints},
   keywords={Computer Science – Distributed, Parallel, and Cluster Computing, Computer Science – Performance, D.1.3},
   adsnote={Provided by the SAO/NASA Astrophysics Data System}





We present a new implementation of the Floyd-Warshall All-Pairs Shortest Paths algorithm on CUDA. Our algorithm runs approximately 5 times faster than the previously best reported algorithm. In order to achieve this speedup, we applied a new technique to reduce usage of on-chip shared memory and allow the CUDA scheduler to more effectively hide instruction latency.
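
For orientation, the all-pairs shortest-paths recurrence that the abstract refers to can be sketched as a plain CPU reference. This is the textbook Floyd-Warshall triple loop, not the authors' multi-stage CUDA kernel; the function name `floyd_warshall`, the sentinel `kInf`, and the row-major matrix layout are assumptions made for illustration, and the sketch assumes non-negative edge weights.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Sentinel for "no edge" (illustrative choice). It is halved so that
// kInf + kInf still fits in an int.
constexpr int kInf = std::numeric_limits<int>::max() / 2;

// Hypothetical helper: dist is a row-major n*n matrix of edge weights,
// with 0 on the diagonal and kInf where there is no edge.
// On return, dist[i*n + j] holds the shortest-path length from i to j.
void floyd_warshall(std::vector<int>& dist, std::size_t n) {
    // The outer loop over the intermediate vertex k must run in order.
    for (std::size_t k = 0; k < n; ++k) {
        // For a fixed k, row k and column k do not change (no negative
        // cycles), so the (i, j) updates are independent of each other.
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                const int via_k = dist[i * n + k] + dist[k * n + j];
                dist[i * n + j] = std::min(dist[i * n + j], via_k);
            }
        }
    }
}
```

The independence of the inner `(i, j)` updates for a fixed `k` is the source of parallelism that a GPU implementation can exploit; how the paper stages those updates and manages shared memory is not shown here.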
