high performance computing on graphics processing units: hgpu.org

hgpu.org » Programming » Algorithms » Understanding the SIMD Efficiency of Graph Traversal on GPU

Understanding the SIMD Efficiency of Graph Traversal on GPU

Yichao Cheng, Hong An, Zhitao Chen, Feng Li, Zhaohui Wang, Xia Jiang, Yi Peng

University of Science and Technology of China, Hefei, China

The 14th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2014), 2014

@article{cheng2014understanding,

title={Understanding the SIMD Efficiency of Graph Traversal on GPU},

author={Cheng, Yichao and An, Hong and Chen, Zhitao and Li, Feng and Wang, Zhaohui and Jiang, Xia and Peng, Yi},

year={2014}

}

Download (PDF)

View

Source

3109

views

Graph is a widely used data structure and graph algorithms, such as breadth-first search (BFS), are regarded as key components in a great number of applications. Recent studies have attempted to accelerate graph algorithms on highly parallel graphics processing unit (GPU). Although many graph algorithms based on large graphs exhibit abundant parallelism, their performance on GPU still faces formidable challenges, one of which is to map the irregular computation onto GPU’s vectorized execution model. In this paper, we investigate the link between graph topology and performance of BFS on GPU. We introduce a novel model to analyze the components of SIMD underutilization. We show that SIMD lanes are wasted either due to the workload imbalance between tasks, or to the heterogeneity of each task. We also develop corresponding metrics to quantify the SIMD efficiency for BFS on GPU. Finally, we demonstrate the applicability of the metrics by using them to profile the performance for different mapping strategies.

Tags: Algorithms, Computer science, CUDA, Graph theory, nVidia, Performance, Tesla C2050

July 10, 2014 by hgpu

Rating: 1.8/5. From 3 votes.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

Understanding the SIMD Efficiency of Graph Traversal on GPU

Your response

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

Most viewed papers (last 30 days)

Understanding the SIMD Efficiency of Graph Traversal on GPU

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)