high performance computing on graphics processing units: hgpu.org


A Multi-Stage CUDA Kernel for Floyd-Warshall

Ben Lund, Justin W Smith

University of Cincinnati, Department Of Computer Science, 814 Rhodes Hall, Cincinnati, OH 45221

arXiv:1001.4108 [cs.DC] (25 Feb 2010)

@article{2010arXiv1001.4108L,
  author={{Lund}, B. and {Smith}, J.~W},
  title="{A Multi-Stage CUDA Kernel for Floyd-Warshall}",
  journal={ArXiv e-prints},
  archivePrefix="arXiv",
  eprint={1001.4108},
  primaryClass="cs.DC",
  keywords={Computer Science – Distributed, Parallel, and Cluster Computing, Computer Science – Performance, D.1.3},
  year={2010},
  month={jan},
  adsurl={http://adsabs.harvard.edu/abs/2010arXiv1001.4108L},
  adsnote={Provided by the SAO/NASA Astrophysics Data System}
}


We present a new implementation of the Floyd-Warshall All-Pairs Shortest Paths algorithm on CUDA. Our algorithm runs approximately 5 times faster than the best previously reported algorithm. To achieve this speedup, we applied a new technique that reduces usage of on-chip shared memory and allows the CUDA scheduler to hide instruction latency more effectively.
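For readers unfamiliar with the underlying algorithm, the following is a minimal CPU sketch of the textbook Floyd-Warshall recurrence in Python. It is not the authors' multi-stage CUDA kernel, which the abstract does not describe in detail. The function name `floyd_warshall`, the `INF` sentinel, and the dense list-of-lists input are assumptions made for illustration, and the sketch assumes the graph has no negative cycles.

```python
import math

INF = math.inf  # sentinel for "no edge" (an assumption of this sketch)


def floyd_warshall(dist):
    """Return all-pairs shortest path lengths for a dense weight matrix.

    dist[i][j] is the weight of edge i -> j, INF if there is no edge,
    and 0 on the diagonal. The input matrix is not modified.
    """
    n = len(dist)
    d = [row[:] for row in dist]  # work on a copy
    for k in range(n):  # allow vertex k as an intermediate stop
        dk = d[k]
        for i in range(n):
            dik = d[i][k]
            if dik == INF:
                continue  # no path i -> k, so nothing can improve via k
            di = d[i]
            for j in range(n):
                # For a fixed k, each (i, j) cell can be relaxed
                # independently; this is the work a GPU version
                # spreads across threads.
                cand = dik + dk[j]
                if cand < di[j]:
                    di[j] = cand
    return d
```

The outer loop over `k` is inherently sequential, while the two inner loops expose the data parallelism that GPU implementations exploit. How that work is tiled and staged in shared memory is where implementations such as the one presented here differ.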

Tags: Computer science, CUDA, nVidia, Optimization, Performance, Programming techniques, Tesla C1060

January 18, 2011 by hgpu


