A Multi-Stage CUDA Kernel for Floyd-Warshall

Ben Lund, Justin W Smith
University of Cincinnati, Department Of Computer Science, 814 Rhodes Hall, Cincinnati, OH 45221
arXiv:1001.4108 [cs.DC] (25 Feb 2010)

@article{2010arXiv1001.4108L,
   author={{Lund}, B. and {Smith}, J.~W},
   title="{A Multi-Stage CUDA Kernel for Floyd-Warshall}",
   journal={ArXiv e-prints},
   archivePrefix="arXiv",
   eprint={1001.4108},
   primaryClass="cs.DC",
   keywords={Computer Science – Distributed, Parallel, and Cluster Computing, Computer Science – Performance, D.1.3},
   year={2010},
   month={jan},
   adsurl={http://adsabs.harvard.edu/abs/2010arXiv1001.4108L},
   adsnote={Provided by the SAO/NASA Astrophysics Data System}
}

We present a new implementation of the Floyd-Warshall All-Pairs Shortest Paths algorithm on CUDA. Our algorithm runs approximately 5 times faster than the best previously reported algorithm. To achieve this speedup, we applied a new technique that reduces usage of on-chip shared memory and allows the CUDA scheduler to hide instruction latency more effectively.
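
For readers unfamiliar with the underlying algorithm, the following is a minimal sequential sketch of textbook Floyd-Warshall in Python. It is a reference baseline only, not the authors' multi-stage CUDA kernel, and it does not model shared memory or scheduling. The function name `floyd_warshall`, the dense list-of-lists layout, and the small example graph are assumptions made for illustration.

```python
import math

INF = math.inf  # marks a missing edge


def floyd_warshall(dist):
    """Return all-pairs shortest path lengths for a dense weight matrix.

    dist[i][j] is the weight of edge i -> j, INF if there is no edge,
    and 0 on the diagonal. Assumes the graph has no negative cycles.
    """
    n = len(dist)
    d = [row[:] for row in dist]  # work on a copy, leave the input intact
    for k in range(n):            # allow vertex k as an intermediate stop
        dk = d[k]
        for i in range(n):
            dik = d[i][k]
            if dik == INF:        # no path i -> k, nothing to relax
                continue
            di = d[i]
            for j in range(n):
                cand = dik + dk[j]
                if cand < di[j]:
                    di[j] = cand
    return d


# Hypothetical 4-vertex directed graph used only as an example.
graph = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
result = floyd_warshall(graph)
```

For a fixed `k`, every `(i, j)` update reads only row `k`, column `k`, and its own cell, so the two inner loops are independent of each other. That independence is what GPU implementations exploit, while the outer loop over `k` must stay sequential.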