high performance computing on graphics processing units: hgpu.org


Accelerating CUDA Graph Algorithms at Maximum Warp

Sungpack Hong, Sang Kyun Kim, Tayo Oguntebi, Kunle Olukotun

Computer Systems Laboratory, Stanford University

Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, 2011, pp. 267-276

@conference{hong2011accelerating,
  title={Accelerating CUDA graph algorithms at maximum warp},
  author={Hong, S. and Kim, S.K. and Oguntebi, T. and Olukotun, K.},
  booktitle={Proceedings of the 16th ACM symposium on Principles and practice of parallel programming},
  pages={267--276},
  year={2011},
  organization={ACM}
}


Graphs are powerful data representations favored in many computational domains. Modern GPUs have recently shown promising results in accelerating computationally challenging graph problems, but their performance suffers heavily when the graph structure is highly irregular, as most real-world graphs tend to be. In this study, we first observe that the poor performance is caused by work imbalance and is an artifact of a discrepancy between the GPU programming model and the underlying GPU architecture. We then propose a novel virtual warp-centric programming method that exposes the traits of underlying GPU architectures to users. Our method significantly improves the performance of applications with heavily imbalanced workloads, and enables trade-offs between workload imbalance and ALU underutilization for fine-tuning the performance. Our evaluation reveals that our method exhibits up to 9x speedup over previous GPU algorithms and 12x over single-threaded CPU execution on irregular graphs. When properly configured, it also yields up to 30% improvement over previous GPU algorithms on regular graphs. In addition to performance gains on graph algorithms, our programming method achieves 1.3x to 15.1x speedup on a set of GPU benchmark applications. Our study also confirms that the performance gap between GPUs and other multi-threaded CPU graph implementations is primarily due to the large difference in memory bandwidth.
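The trade-off between workload imbalance and ALU underutilization mentioned in the abstract can be illustrated with a toy cost model. The sketch below is not taken from the paper: it assumes a 32-lane physical warp that runs in lockstep, that each virtual warp of `vw_size` lanes is assigned one vertex and strides over that vertex's neighbor list, and that every lockstep step costs the same. The function names `simd_steps` and `lane_utilization` are hypothetical, and the model ignores memory behavior entirely.

```python
from math import ceil

WARP_SIZE = 32  # lanes in one physical CUDA warp (assumed lockstep execution)


def simd_steps(degrees, vw_size):
    """Count lockstep steps needed to visit every edge under the toy model.

    degrees  -- list of vertex out-degrees, in the order vertices are assigned
    vw_size  -- virtual warp size; must divide WARP_SIZE (1 = one thread per vertex)
    """
    if WARP_SIZE % vw_size != 0:
        raise ValueError("virtual warp size must divide the physical warp size")
    vertices_per_warp = WARP_SIZE // vw_size
    steps = 0
    for start in range(0, len(degrees), vertices_per_warp):
        group = degrees[start:start + vertices_per_warp]
        # The physical warp is only done when its slowest virtual warp is done.
        steps += max(ceil(d / vw_size) for d in group)
    return steps


def lane_utilization(degrees, vw_size):
    """Fraction of lane-slots that perform useful edge work (0.0 to 1.0)."""
    steps = simd_steps(degrees, vw_size)
    return sum(degrees) / (steps * WARP_SIZE) if steps else 0.0


# Irregular graph: 31 low-degree vertices and one hub with 1000 neighbors.
irregular = [1] * 31 + [1000]
# Regular graph: every vertex has 4 neighbors.
regular = [4] * 32

for name, degs in (("irregular", irregular), ("regular", regular)):
    for k in (1, 4, 32):
        print(name, "vw_size =", k,
              "steps =", simd_steps(degs, k),
              "utilization =", round(lane_utilization(degs, k), 3))
```

In this model, a large virtual warp helps the irregular graph because the hub vertex no longer stalls 31 idle lanes, while the same setting hurts the regular graph because most lanes in each virtual warp have no edge to process. A smaller virtual warp size, such as 4 here, recovers full utilization on the regular graph, which is consistent with the abstract's remark that the method must be "properly configured".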

Tags: Algorithms, Computer science, CUDA, nVidia, nVidia GeForce GTX 260, nVidia GeForce GTX 275, Package, Performance, Programming techniques

February 27, 2011 by hgpu
