high performance computing on graphics processing units: hgpu.org

Efficient implementation of data flow graphs on multi-gpu clusters

Vincent Boulos, Sylvain Huet, Vincent Fristot, Luc Salvo, Dominique Houzet

GIPSA-lab, Image-Signal Department, CNRS UMR 5216, University of Grenoble

Journal of Real-Time Image Processing, 2012

DOI:10.1007/s11554-012-0279-0

@article{boulos2012efficient,
  title={Efficient implementation of data flow graphs on multi-gpu clusters},
  author={Boulos, V. and Huet, S. and Fristot, V. and Salvo, L. and Houzet, D.},
  journal={Journal of Real-Time Image Processing},
  pages={1--16},
  year={2012},
  publisher={Springer}
}

Nowadays, it is possible to build a multi-GPU supercomputer, well suited to implementing digital signal processing algorithms, for a few thousand dollars. However, to achieve the highest performance with this kind of architecture, the programmer has to focus on inter-processor communication and task synchronization. In this paper, we propose a high-level programming model based on a data flow graph (DFG) that allows an efficient implementation of digital signal processing applications on a multi-GPU computer cluster. This DFG-based design flow abstracts the underlying architecture. We focus particularly on the efficient implementation of communications by automating computation-communication overlap, which can lead to significant speedups, as shown in the presented benchmark. The approach is validated in three experiments: a multi-host multi-GPU benchmark, a 3D granulometry application developed for materials research, and an application for computing visual saliency maps.
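To make the idea of computation-communication overlap concrete, here is a minimal sketch in Python. It is not the authors' implementation: `transfer`, `compute`, and `run_overlapped` are hypothetical names, and a background thread stands in for what a GPU program would typically do with asynchronous copies on separate CUDA streams. The sketch only shows the scheduling pattern, in which the copy of block i+1 is started before block i is processed.

```python
from concurrent.futures import ThreadPoolExecutor


def transfer(block):
    """Hypothetical stand-in for a host-to-device or inter-node copy."""
    return list(block)


def compute(block):
    """Hypothetical stand-in for the kernel run by one node of the graph."""
    return [x * x for x in block]


def run_overlapped(blocks):
    """Process blocks so the copy of block i+1 runs while block i is computed."""
    results = []
    if not blocks:
        return results
    # A single worker thread plays the role of the copy engine (assumption).
    with ThreadPoolExecutor(max_workers=1) as copier:
        pending = copier.submit(transfer, blocks[0])  # prefetch the first block
        for nxt in blocks[1:]:
            current = pending.result()                # wait for the copy in flight
            pending = copier.submit(transfer, nxt)    # start the next copy...
            results.append(compute(current))          # ...while computing this block
        results.append(compute(pending.result()))     # drain the last block
    return results


if __name__ == "__main__":
    print(run_overlapped([[1, 2], [3, 4], [5]]))
```

The results match those of a purely sequential copy-then-compute loop. Only the ordering of the work changes, which is why this kind of overlap can in principle be automated from the graph description rather than written by hand.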

Tags: Algorithms, CUDA, GPU cluster, Image processing, nVidia, nVidia GeForce GTX 285, nVidia GeForce GTX 460 M, Signal processing

November 7, 2012 by hgpu
