high performance computing on graphics processing units: hgpu.org

A scheduling and runtime framework for a cluster of heterogeneous machines with multiple accelerators

Tarun Beri, Sorav Bansal, Subodh Kumar

Indian Institute of Technology, Delhi

Indian Institute of Technology, 2014

@article{beri2014scheduling,
  title={A scheduling and runtime framework for a cluster of heterogeneous machines with multiple accelerators},
  author={Beri, Tarun and Bansal, Sorav and Kumar, Subodh},
  year={2014}
}

We present a system that enables simple and intuitive programming of CPU+GPU clusters. This system relieves the programmer of the burden of load balancing, detailed data communication, task mapping, and scheduling. Our programming model is based on a bulk synchronous distributed shared memory model, which is suitable for heterogeneous multi-GPU clusters, especially for compute-intensive workloads. We report prototype applications using our system. For example, a sequential version of matrix multiplication or 2D FFT requires about 30 additional lines of code to parallelize on a cluster. Distributing the multiplication of two square matrices, with 1 billion elements each, across a small cluster with 120 CPU cores and 20 GPUs, our runtime scheduler achieves more than 140x speedup over the single-core CPU implementation; the single-GPU implementation runs out of memory for this experiment. This performance is possible due to a number of challenging optimizations working in concert. These include prefetching, pipelining, maximizing overlap between computation and communication, and scheduling across devices of vastly different capacities.
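As a rough illustration of what "scheduling across devices of vastly different capacities" can involve, the sketch below splits the rows of a matrix among devices in proportion to an assumed relative throughput. It is not the scheduler described in the paper: the function name `partition_rows` and the throughput figures are hypothetical, and a real runtime would likely adjust the split dynamically rather than fix it up front.

```python
def partition_rows(total_rows, throughputs):
    """Split row indices [0, total_rows) into contiguous half-open ranges,
    one per device, sized in proportion to each device's relative throughput.

    `throughputs` is a hypothetical list of positive numbers (e.g. measured
    GFLOP/s per device); it is an assumption of this sketch.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if not throughputs or any(t <= 0 for t in throughputs):
        raise ValueError("throughputs must be a non-empty list of positive numbers")

    total = sum(throughputs)
    # Ideal (fractional) share of rows for each device.
    exact = [total_rows * t / total for t in throughputs]
    # Start from the rounded-down share.
    counts = [int(x) for x in exact]
    # Hand out the rows lost to rounding, largest fractional part first.
    leftover = total_rows - sum(counts)
    order = sorted(range(len(exact)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1

    # Convert per-device counts into contiguous (start, end) row ranges.
    ranges = []
    start = 0
    for c in counts:
        ranges.append((start, start + c))
        start += c
    return ranges


if __name__ == "__main__":
    # Two slow CPU workers and one GPU assumed to be 8x faster (made-up ratios).
    for device, (lo, hi) in zip(["cpu0", "cpu1", "gpu0"], partition_rows(100, [1, 1, 8])):
        print(device, "rows", lo, "to", hi - 1)
```

A static proportional split like this is only a starting point; the prefetching and pipelining mentioned in the abstract address the separate problem of keeping each device busy while its next block of data is in transit.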

Tags: Computer science, CUDA, FFT, GPU cluster, Heterogeneous systems, Matrix multiplication, Memory model, nVidia, Prefetch, Task scheduling, Tesla M2070

February 11, 2014 by hgpu
