A Multi-Stage CUDA Kernel for Floyd-Warshall

Ben Lund, Justin W Smith

University of Cincinnati, Department Of Computer Science, 814 Rhodes Hall, Cincinnati, OH 45221

arXiv:1001.4108 [cs.DC] (25 Feb 2010)

@article{2010arXiv1001.4108L,
  author = {{Lund}, B. and {Smith}, J.~W},
  title = {{A Multi-Stage CUDA Kernel for Floyd-Warshall}},
  journal = {ArXiv e-prints},
  archivePrefix = {arXiv},
  eprint = {1001.4108},
  primaryClass = {cs.DC},
  keywords = {Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Performance, D.1.3},
  year = {2010},
  month = {jan},
  adsurl = {http://adsabs.harvard.edu/abs/2010arXiv1001.4108L},
  adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}

We present a new implementation of the Floyd-Warshall All-Pairs Shortest Paths algorithm on CUDA. Our algorithm runs approximately 5 times faster than the previously best reported algorithm. In order to achieve this speedup, we applied a new technique to reduce usage of on-chip shared memory and allow the CUDA scheduler to more effectively hide instruction latency.
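For orientation, the recurrence that any Floyd-Warshall implementation evaluates can be sketched on the CPU. This is a minimal reference version, not the authors' CUDA kernel: the function name `floyd_warshall`, the sentinel `kInf`, and the row-major layout are illustrative assumptions, and edge weights are assumed to be non-negative. It shows only the baseline computation; the shared-memory technique described in the abstract is not reproduced here.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Assumed sentinel for "no path"; halved so that kInf + kInf cannot overflow int.
constexpr int kInf = std::numeric_limits<int>::max() / 2;

// Hypothetical reference routine (not from the paper).
// dist is a row-major n x n matrix of non-negative edge weights,
// with dist[i*n + i] == 0 and kInf where no edge exists.
// On return, dist[i*n + j] holds the shortest-path length from i to j.
void floyd_warshall(std::vector<int>& dist, std::size_t n) {
    for (std::size_t k = 0; k < n; ++k) {
        // For a fixed k, row k and column k are not changed by this step,
        // so all (i, j) updates are independent of one another. This inner
        // loop nest is the part a GPU implementation would parallelize.
        for (std::size_t i = 0; i < n; ++i) {
            const int dik = dist[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                dist[i * n + j] = std::min(dist[i * n + j], dik + dist[k * n + j]);
            }
        }
    }
}
```

The outer loop over `k` is inherently sequential, which is why GPU versions typically launch work once per `k` (or per block of `k` values) and spend their optimization effort on how the inner updates access memory.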

Tags: Computer science, CUDA, nVidia, Optimization, Performance, Programming techniques, Tesla C1060

January 18, 2011 by hgpu
