high performance computing on graphics processing units: hgpu.org

A scheduling and runtime framework for a cluster of heterogeneous machines with multiple accelerators

Tarun Beri, Sorav Bansal, Subodh Kumar

Indian Institute of Technology, Delhi

Indian Institute of Technology, 2014

@article{beri2014scheduling,
  title={A scheduling and runtime framework for a cluster of heterogeneous machines with multiple accelerators},
  author={Beri, Tarun and Bansal, Sorav and Kumar, Subodh},
  year={2014}
}

We present a system that enables simple and intuitive programming of CPU+GPU clusters. This system relieves the programmer of the burden of load balancing, detailed data communication, task mapping, scheduling, etc. Our programming model is based on a bulk synchronous distributed shared memory model, which is suitable for heterogeneous multi-GPU clusters, especially for compute-intensive workloads. We report on prototype applications built with our system. For example, a sequential version of matrix multiplication or 2D FFT requires about 30 additional lines of code to parallelize on a cluster. Distributing the multiplication of two square matrices, with 1 billion elements each, across a small cluster with 120 CPU cores and 20 GPUs, our runtime scheduler achieves more than 140x speedup over the single-core CPU implementation; the single-GPU implementation runs out of memory for this experiment. This performance is possible due to a number of challenging optimizations working in concert. These include prefetching, pipelining, maximizing overlap between computation and communication, and scheduling across devices of vastly different capacities.
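The matrix multiplication experiment can be made more concrete with a rough sketch. The Python below is not the authors' scheduler or API. It only illustrates two points from the abstract under stated assumptions: why three dense matrices of about 1 billion elements each are unlikely to fit on one GPU (assuming single-precision elements, which the abstract does not state, and the 6 GB of a Tesla M2070), and how rows of the output could be divided statically across devices of very different capacities. The function names and the relative device weights are hypothetical.

```python
import math


def matrix_footprint_gib(num_elements, bytes_per_element=4, num_matrices=3):
    """Memory needed to hold num_matrices dense matrices, in GiB.

    Assumes 4-byte (single-precision) elements by default; the abstract
    does not state the precision used.
    """
    return num_elements * bytes_per_element * num_matrices / 2**30


def split_rows(total_rows, weights):
    """Split total_rows into per-device chunk sizes proportional to weights.

    Uses largest-remainder rounding so the chunk sizes sum to total_rows.
    This is a static split for illustration only, not the paper's
    runtime scheduling policy.
    """
    total_weight = sum(weights)
    exact = [total_rows * w / total_weight for w in weights]
    sizes = [math.floor(x) for x in exact]
    leftover = total_rows - sum(sizes)
    # Hand the remaining rows to the devices with the largest fractional parts.
    order = sorted(range(len(weights)),
                   key=lambda i: exact[i] - sizes[i],
                   reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


# Side length of a square matrix with about 1 billion elements.
n = math.isqrt(10**9)

# Two inputs plus one output matrix.
footprint = matrix_footprint_gib(n * n)

# Hypothetical relative throughputs: each of 20 GPUs is weighted 10x
# one of the 120 CPU cores. Real ratios depend on hardware and kernel.
weights = [10.0] * 20 + [1.0] * 120
sizes = split_rows(n, weights)

print(f"matrix side: {n}")
print(f"footprint of A, B, C: {footprint:.1f} GiB")
print(f"rows per GPU: {sizes[0]}, rows per CPU core: {sizes[-1]}")
```

A static split like this ignores data transfer cost and runtime variation, which is presumably why the system described here relies on prefetching, pipelining, and a runtime scheduler rather than fixed proportions.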

Tags: Computer science, CUDA, FFT, GPU cluster, Heterogeneous systems, Matrix multiplication, Memory model, nVidia, Prefetch, Task scheduling, Tesla M2070

February 11, 2014 by hgpu
