high performance computing on graphics processing units: hgpu.org

A block-asynchronous relaxation method for graphics processing units

Hartwig Anzt, Stanimire Tomov, Jack Dongarra, Vincent Heuveline

Karlsruhe Institute of Technology, Germany

University of Tennessee, Innovative Computing Laboratory, Technical report, UT-CS-11-687, 2011

@techreport{anzt2011block,
  title={A block-asynchronous relaxation method for graphics processing units},
  author={Anzt, H. and Tomov, S. and Dongarra, J. and Heuveline, V.},
  year={2011},
  institution={Technical report, Innovative Computing Laboratory, University of Tennessee, UT-CS-11-687}
}


In this paper, we analyze the potential of asynchronous relaxation methods on Graphics Processing Units (GPUs). For this purpose, we developed a set of asynchronous iteration algorithms in CUDA and compared them with a parallel implementation of synchronous relaxation methods on CPU-based systems. For a set of test matrices taken from the University of Florida Matrix Collection, we monitor the convergence behavior, the average iteration time, and the total time-to-solution. Analyzing the results, we observe that even our most basic asynchronous relaxation scheme, despite its lower convergence rate compared to Gauss-Seidel relaxation (which we expected), provides solution approximations of a given accuracy on GPUs in considerably shorter time than Gauss-Seidel running on CPUs. By exploiting the scalability of asynchronous schemes and their good fit for highly parallel GPU architectures, it more than compensates for the slower convergence. Further, we enhance the basic asynchronous approach with hybrid schemes that perform multiple iterations within the "subdomain" handled by a GPU thread block and Jacobi-like asynchronous updates across the subdomain "boundaries", subject to tuning various parameters. With these, we not only recover the loss of global convergence but often accelerate convergence by up to a factor of two compared to the effective but difficult-to-parallelize Gauss-Seidel type of schemes, while keeping the execution time of a global iteration practically the same. This shows the high potential of asynchronous methods not only as stand-alone numerical solvers for linear systems of equations that fulfill certain convergence conditions, but, more importantly, as smoothers in multigrid methods. Due to the explosion of parallelism in today's architecture designs, the significance of and the need for asynchronous methods such as the ones described in this work are expected to grow.
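The abstract contrasts three update rules: Jacobi (every component updated from the previous iterate), Gauss-Seidel (each component sees its predecessors' new values, which makes it sequential), and a hybrid in which each thread block runs several local iterations on its subdomain while reading off-block values as they are. The following is a minimal sequential sketch in pure Python, not the authors' CUDA code. It assumes a dense matrix, a hypothetical block size of 4 rows, k = 3 local Jacobi-like sweeps per block, and a made-up tridiagonal test matrix. All function names are hypothetical. Because real asynchronous reads are nondeterministic, the sketch replaces them with a fixed snapshot of the previous global iterate for off-block values.

```python
# Sequential, pure-Python emulation of three relaxation schemes for A x = b.
# Hypothetical helpers for illustration; not the authors' CUDA implementation.
from functools import partial
from typing import Callable, List

Matrix = List[List[float]]
Vector = List[float]


def off_diag_sum(A: Matrix, i: int, x: Vector) -> float:
    """Return the sum over j != i of A[i][j] * x[j]."""
    return sum(A[i][j] * x[j] for j in range(len(x)) if j != i)


def jacobi_sweep(A: Matrix, x: Vector, b: Vector) -> Vector:
    """Every component is computed from the old iterate only (fully parallel)."""
    return [(b[i] - off_diag_sum(A, i, x)) / A[i][i] for i in range(len(b))]


def gauss_seidel_sweep(A: Matrix, x: Vector, b: Vector) -> Vector:
    """Component i already sees the new values of components 0..i-1 (sequential)."""
    x = list(x)
    for i in range(len(b)):
        x[i] = (b[i] - off_diag_sum(A, i, x)) / A[i][i]
    return x


def block_async_sweep(A: Matrix, x: Vector, b: Vector,
                      block: int, k: int) -> Vector:
    """One global iteration of a block-asynchronous scheme, emulated deterministically.

    Each chunk of `block` rows stands in for one GPU thread block. It runs k
    local Jacobi-like sweeps on its own rows, using its own freshest values but
    reading other blocks' values from the snapshot `x` taken at the start of
    the global iteration. (On a GPU, off-block values may be older or newer,
    depending on when the other thread blocks write them.)
    """
    n = len(b)
    x_new = list(x)
    for start in range(0, n, block):
        rows = range(start, min(start + block, n))
        view = list(x)  # snapshot; only this block's rows get overwritten
        for _ in range(k):
            local = [(b[i] - off_diag_sum(A, i, view)) / A[i][i] for i in rows]
            for i, v in zip(rows, local):
                view[i] = v
        for i in rows:
            x_new[i] = view[i]
    return x_new


def tridiag(n: int, diag: float = 4.0, off: float = -1.0) -> Matrix:
    """Strictly diagonally dominant test matrix tridiag(off, diag, off)."""
    return [[diag if i == j else off if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


def run(sweep: Callable[[Matrix, Vector, Vector], Vector],
        iters: int, n: int = 16) -> float:
    """Apply `iters` sweeps starting from x = 0 and return the max-norm error."""
    A = tridiag(n)
    b = [sum(row) for row in A]  # right-hand side chosen so that x* = (1, ..., 1)
    x = [0.0] * n
    for _ in range(iters):
        x = sweep(A, x, b)
    return max(abs(v - 1.0) for v in x)


if __name__ == "__main__":
    schemes = [("Jacobi", jacobi_sweep),
               ("Gauss-Seidel", gauss_seidel_sweep),
               ("block-async (4 rows, k=3)", partial(block_async_sweep, block=4, k=3))]
    for name, sweep in schemes:
        print(f"{name:26s} max error after 5 sweeps: {run(sweep, 5):.2e}")
```

With k = 1, every block performs exactly one Jacobi sweep, so the scheme reduces to plain Jacobi. Larger k spends more local work per global iteration, which is the kind of parameter the hybrid schemes tune. The per-block loop bodies do not depend on one another, which is what would map onto concurrent thread blocks. The Gauss-Seidel loop, by contrast, carries a dependency from row i-1 to row i.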

Tags: Algorithms, Computer science, CUDA, Linear Algebra, nVidia, Tesla C2050

December 22, 2011 by hgpu
