Better speedups using simpler parallel programming for graph connectivity and biconnectivity

James A. Edwards, Uzi Vishkin
University of Maryland
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM ’12), 2012


@inproceedings{edwards2012better,
   title={Better speedups using simpler parallel programming for graph connectivity and biconnectivity},
   author={Edwards, J.A. and Vishkin, U.},
   booktitle={Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores},
   year={2012}
}



We demonstrate speedups of 9x to 33x for finding the biconnected components of a graph on the Explicit Multi-Threading (XMT) many-core computing platform, relative to the best serial algorithm, using a relatively modest silicon budget. Further evidence suggests that speedups of 21x to 48x are possible. For graph connectivity, we demonstrate that XMT outperforms two recent NVIDIA GPUs of similar or greater silicon area. Previous studies of parallel biconnectivity algorithms achieved at most a 4x speedup, but we could not find biconnectivity code for GPUs against which to compare directly.

Ease of programming: the paper suggests that parallel programming for the XMT platform is considerably simpler than for the SMP and GPU platforms. Unlike the quantitative speedup results, this comparison is necessarily more qualitative. Productivity of parallel programming is a central interest of PMAM/PPoPP, which strongly favors ease of programming; we believe the discussion is on par with the state of the art on this relatively underexplored interest.

The results provide new insight into the synergy among algorithms, the practice of parallel programming, and architecture: (1) no single biconnectivity algorithm is dominant for all inputs; (2) XMT provides good performance for each algorithm and better speedups relative to other platforms; (3) the textbook Tarjan-Vishkin (TV) PRAM algorithm was the only one that provided strong speedups on XMT across all inputs considered; and (4) the TV implementation was a direct implementation of a PRAM algorithm, though nontrivial effort was needed to obtain a PRAM version with lower constant factors. Overall, it appears that previous low speedups on other platforms were not caused by inefficient algorithms or their programming; rather, the XMT platform is a better match for these algorithms.
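The listing does not reproduce any of the paper's code. As a purely illustrative sketch of the kind of PRAM-style connectivity algorithm discussed (not the authors' XMT implementation), the classic Shiloach-Vishkin "hooking and pointer jumping" scheme can be simulated serially in Python; the function name and the serial loop structure are our own, standing in for what would be concurrent rounds on a PRAM or XMT:

```python
def connected_components(n, edges):
    """Serial simulation of Shiloach-Vishkin-style connected components.

    parent[] encodes a forest; hooking merges trees across edges, and
    pointer jumping flattens trees toward stars. On a PRAM, each round's
    loops would run in parallel; here they are simulated sequentially.
    """
    parent = list(range(n))
    changed = True
    while changed:
        changed = False
        # Hooking: attach the root of one endpoint's tree to the
        # smaller-labeled endpoint of the other tree.
        for u, v in edges:
            pu, pv = parent[u], parent[v]
            if parent[pu] == pu and pv < pu:
                parent[pu] = pv
                changed = True
            elif parent[pv] == pv and pu < pv:
                parent[pv] = pu
                changed = True
        # Pointer jumping: link each vertex to its grandparent,
        # halving tree height each round.
        for v in range(n):
            if parent[parent[v]] != parent[v]:
                parent[v] = parent[parent[v]]
                changed = True
    return parent  # parent[v] is the minimum label in v's component
```

For example, `connected_components(5, [(0, 1), (1, 2), (3, 4)])` returns `[0, 0, 0, 3, 3]`, identifying the two components {0, 1, 2} and {3, 4}.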
Given the growing interest in adding architectural support for parallel programming to existing multi-cores, our results suggest the following open question: can such added architectural support catch up, in speedups and ease of programming, with a design originally inspired by parallel algorithms, such as XMT? Finally, this work addresses another related interest of PMAM/PPoPP: new parallel workloads that improve synergy with emerging architectures. One variant of the biconnectivity algorithms demonstrated the potential advantage of enhancing XMT with hardware support for more thread contexts, perhaps through context switching between them; this is apparently the first demonstration of this old Cray MTA concept benefiting XMT.
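For reference, the serial baseline that parallel biconnectivity implementations are measured against is typically the DFS-based Hopcroft-Tarjan algorithm. The following is a minimal sketch in that style (again our own illustration, not code from the paper); it returns each biconnected component as a list of edge indices:

```python
def biconnected_components(n, edges):
    """Serial Hopcroft-Tarjan-style biconnected components.

    Runs a DFS keeping discovery times and low-links; when a child's
    low-link cannot reach above the current vertex, the edges on the
    stack since that child form one biconnected component.
    """
    adj = [[] for _ in range(n)]
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    disc = [0] * n          # discovery times; 0 means unvisited
    low = [0] * n
    timer = [1]
    stack = []              # stack of edge indices
    comps = []

    def dfs(u, parent_edge):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        for v, ei in adj[u]:
            if ei == parent_edge:
                continue
            if disc[v] == 0:            # tree edge
                stack.append(ei)
                dfs(v, ei)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:   # u separates v's subtree
                    comp = []
                    while True:
                        e = stack.pop()
                        comp.append(e)
                        if e == ei:
                            break
                    comps.append(sorted(comp))
            elif disc[v] < disc[u]:     # back edge, counted once
                stack.append(ei)
                low[u] = min(low[u], disc[v])

    for s in range(n):
        if disc[s] == 0:
            dfs(s, -1)
    return comps
```

For two triangles sharing vertex 2 (edges (0,1), (1,2), (2,0), (2,3), (3,4), (4,2)), this yields two components, the edge sets {0, 1, 2} and {3, 4, 5}.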
