high performance computing on graphics processing units: hgpu.org


Scaling CUDA for Distributed Heterogeneous Processors

Siu Kwan Lam

San Jose State University

San Jose State University, 2012

@article{lam2012scaling,
  title={Scaling CUDA for Distributed Heterogeneous Processors},
  author={Lam, S.K.},
  year={2012}
}


Source codes

Package: Phalanx


The mainstream acceptance of heterogeneous computing and cloud computing points toward a future of distributed heterogeneous systems. With current software development tools, programming such complex systems is difficult and requires extensive knowledge of network and processor architectures. The Message Passing Interface (MPI), which abstracts the underlying network, has been the standard tool for developing distributed applications in the high-performance computing community. The weakness of MPI lies in its message-passing model, which is less expressive than the shared-memory model. Development of heterogeneous programming tools, such as OpenCL, has begun only recently. This thesis presents Phalanx, a framework that extends the virtual architecture of CUDA to distributed heterogeneous systems. Using MPI, Phalanx transparently handles communication among distributed nodes. By using the shared-memory model, Phalanx simplifies the development of distributed applications without sacrificing the advantages of MPI. In one of the case studies, Phalanx achieves a 28x speedup compared with serial execution on a Core i7 processor.
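The general idea in the abstract, a kernel written in the CUDA shared-memory style whose grid is spread over several nodes by a runtime, can be illustrated with a small pure-Python simulation. This is only a sketch under stated assumptions: it is not Phalanx's API, the names `saxpy_kernel`, `blocks_for_rank`, and `launch_distributed` are hypothetical, and the "ranks" are simulated in a single process instead of being launched with MPI.

```python
# Illustrative sketch only; none of these names come from Phalanx.
# Nodes ("ranks") are simulated in-process, not launched with MPI.


def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
    """CUDA-style kernel body: each (block, thread) pair handles one element."""
    i = block_idx * block_dim + thread_idx
    if i < len(x):  # guard against threads past the end of the data
        out[i] = a * x[i] + y[i]


def blocks_for_rank(rank, num_ranks, grid_dim):
    """Assign a contiguous range of block indices to one simulated node."""
    per_rank = (grid_dim + num_ranks - 1) // num_ranks  # ceiling division
    start = min(rank * per_rank, grid_dim)
    stop = min(start + per_rank, grid_dim)
    return range(start, stop)


def launch_distributed(kernel, grid_dim, block_dim, num_ranks, a, x, y):
    """Run the grid across simulated ranks, then merge their partial outputs."""
    result = [None] * len(x)
    for rank in range(num_ranks):
        local_out = {}  # stands in for the node's private memory
        for b in blocks_for_rank(rank, num_ranks, grid_dim):
            for t in range(block_dim):
                kernel(b, t, block_dim, a, x, y, local_out)
        # Stand-in for the gather step a real runtime would perform over MPI.
        for i, value in local_out.items():
            result[i] = value
    return result
```

Calling `launch_distributed(saxpy_kernel, grid_dim=3, block_dim=2, num_ranks=2, a=2.0, x=[1.0, 2.0, 3.0, 4.0, 5.0], y=[10.0] * 5)` returns `[12.0, 14.0, 16.0, 18.0, 20.0]`. The kernel author only indexes by block and thread, while the partitioning and the merge step are hidden in the launcher. That is the kind of separation the abstract attributes to a shared-memory programming model layered on message passing.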

Tags: Computer science, CUDA, Heterogeneous systems, Memory model, MPI, nVidia, Package, PTX, Python, Thesis

July 20, 2012 by hgpu


