A Multi-Stage CUDA Kernel for Floyd-Warshall
University of Cincinnati, Department of Computer Science, 814 Rhodes Hall, Cincinnati, OH 45221
arXiv:1001.4108 [cs.DC] (25 Feb 2010)
We present a new implementation of the Floyd-Warshall All-Pairs Shortest Paths algorithm on CUDA. Our algorithm runs approximately 5 times faster than the best previously reported algorithm. To achieve this speedup, we applied a new technique that reduces usage of on-chip shared memory and allows the CUDA scheduler to hide instruction latency more effectively.
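For orientation, the computation being accelerated can be sketched as the textbook sequential Floyd-Warshall recurrence. This is a minimal CPU reference in C++, not the multi-stage CUDA kernel the abstract describes; the names `floyd_warshall` and `kInf` and the dense row-major layout are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Marker for "no edge"; a hypothetical choice for this sketch.
constexpr float kInf = std::numeric_limits<float>::infinity();

// Textbook Floyd-Warshall on a dense n x n row-major distance matrix.
// On entry, dist[i * n + j] is the weight of edge i -> j
// (kInf if absent, 0 on the diagonal). On exit, it is the
// shortest-path distance from i to j.
void floyd_warshall(std::vector<float>& dist, std::size_t n) {
    for (std::size_t k = 0; k < n; ++k) {          // intermediate vertex
        for (std::size_t i = 0; i < n; ++i) {      // source vertex
            const float dik = dist[i * n + k];
            if (dik == kInf) continue;             // no path i -> k, nothing to relax
            for (std::size_t j = 0; j < n; ++j) {  // destination vertex
                const float via = dik + dist[k * n + j];
                if (via < dist[i * n + j]) dist[i * n + j] = via;
            }
        }
    }
}
```

The outer loop over `k` must run in order, while for a fixed `k` the updates over `i` and `j` are largely independent of one another. That independence is the parallelism a GPU implementation would typically exploit, and a reference like this is useful mainly for checking a kernel's output on small graphs.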
January 18, 2011 by hgpu