high performance computing on graphics processing units: hgpu.org


Task-based Conjugate-Gradient for multi-GPUs platforms

Emmanuel Agullo, Luc Giraud, Abdou Guermouche, Stojce Nakov, Jean Roman

Université Sciences et Technologies – Bordeaux I

hal-00767368 (19 December 2012)

@article{agullo2012task,
  title={Task-based Conjugate-Gradient for multi-GPUs platforms},
  author={Agullo, Emmanuel and Giraud, Luc and Guermouche, Abdou and Nakov, Stojce and Roman, Jean},
  year={2012}
}


Whereas most of today's parallel High Performance Computing (HPC) software is written as highly tuned code that handles low-level details, the advent of the manycore era forces the community to consider modular programming paradigms and to delegate part of the work to third-party software. The latter approach has been shown to be very productive and efficient for regular algorithms, such as dense linear algebra solvers. In this paper we show that such a model can also be applied efficiently to a much more irregular and less compute-intensive algorithm. We illustrate our discussion with the standard unpreconditioned Conjugate Gradient (CG), which we carefully express as a task-based algorithm. We use the StarPU runtime system to assess the efficiency of the approach on a computational platform consisting of three NVIDIA Fermi GPUs. We show that a nearly optimal speedup of up to 2.89 relative to a single-GPU execution (about 96% parallel efficiency on three GPUs) can be reached when processing large matrices, and that the performance remains portable when the low-level memory transfer mechanism is changed.
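To make the idea of "expressing CG as a task-based algorithm" concrete, the sketch below implements the textbook unpreconditioned CG iteration and splits each vector kernel (matrix-vector product, dot product, axpy) into independent per-row-block calls. It is a minimal illustration under stated assumptions, not the authors' StarPU code: plain Python lists stand in for GPU buffers, a dense matrix stands in for the sparse one, and the names `partition`, `spmv_task`, `dot_task`, `axpy_task` and the default of three blocks (mirroring the three GPUs) are hypothetical.

```python
# Minimal sketch (assumption: NOT the paper's StarPU implementation).
# Standard unpreconditioned Conjugate Gradient where every vector kernel is
# split into per-row-block "tasks". Each block stands in for one device;
# plain Python lists stand in for GPU buffers, and A is dense for simplicity.
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def partition(n: int, nblocks: int) -> List[range]:
    """Split row indices 0..n-1 into at most nblocks contiguous ranges (hypothetical layout)."""
    step = max(1, (n + nblocks - 1) // nblocks)
    return [range(i, min(i + step, n)) for i in range(0, n, step)]


def spmv_task(A: Matrix, x: Vector, rows: range, y: Vector) -> None:
    """Task: y[rows] = A[rows, :] @ x (reads all of x, writes only its own rows)."""
    for i in rows:
        y[i] = sum(a * b for a, b in zip(A[i], x))


def dot_task(u: Vector, v: Vector, rows: range) -> float:
    """Task: partial dot product over one row block."""
    return sum(u[i] * v[i] for i in rows)


def axpy_task(alpha: float, x: Vector, y: Vector, rows: range) -> None:
    """Task: y[rows] += alpha * x[rows]."""
    for i in rows:
        y[i] += alpha * x[i]


def cg(A: Matrix, b: Vector, nblocks: int = 3,
       tol: float = 1e-10, maxiter: int = 1000) -> Vector:
    """Solve A x = b for symmetric positive definite A, starting from x0 = 0."""
    n = len(b)
    blocks = partition(n, nblocks)
    x = [0.0] * n
    r = list(b)            # r0 = b - A x0 = b because x0 = 0
    p = list(r)            # first search direction
    q = [0.0] * n          # holds A p
    # Global reduction: sum the per-block partial dot products.
    rr = sum(dot_task(r, r, blk) for blk in blocks)
    for _ in range(maxiter):
        if rr ** 0.5 < tol:
            break
        for blk in blocks:                      # independent SpMV tasks
            spmv_task(A, p, blk, q)
        alpha = rr / sum(dot_task(p, q, blk) for blk in blocks)  # reduction
        for blk in blocks:                      # independent axpy tasks
            axpy_task(alpha, p, x, blk)         # x += alpha * p
            axpy_task(-alpha, q, r, blk)        # r -= alpha * A p
        rr_new = sum(dot_task(r, r, blk) for blk in blocks)      # reduction
        beta = rr_new / rr
        for blk in blocks:                      # p = r + beta * p
            for i in blk:
                p[i] = r[i] + beta * p[i]
        rr = rr_new
    return x
```

For example, `cg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` should return values very close to the exact solution `[1/11, 7/11]`. The design point the sketch tries to expose is that, in each iteration, the per-block SpMV and axpy calls are mutually independent, whereas the two dot products are global reductions that every block must wait on. In a runtime system such as StarPU, the per-block calls would instead be submitted as tasks with declared data accesses so that the scheduler can place them on GPUs; how the paper actually decomposes and schedules them may differ from this illustration.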

Tags: Algorithms, Computer science, Conjugate gradient solver, CUBLAS, CUDA, Linear Algebra, nVidia, Tesla C2070

December 25, 2012 by hgpu

