high performance computing on graphics processing units: hgpu.org


Scaling CUDA for Distributed Heterogeneous Processors

Siu Kwan Lam

San Jose State University

San Jose State University, 2012

@article{lam2012scaling,
  title={Scaling CUDA for Distributed Heterogeneous Processors},
  author={Lam, S.K.},
  year={2012}
}


Source code package: Phalanx


The mainstream acceptance of heterogeneous computing and cloud computing points toward a future of distributed heterogeneous systems. With current software development tools, programming such complex systems is difficult and requires extensive knowledge of network and processor architectures. The Message Passing Interface (MPI), which abstracts the underlying network, has been the standard tool for developing distributed applications in the high-performance computing community. MPI's weakness lies in its message-passing model, which is less expressive than the shared-memory model. Development of heterogeneous programming tools, such as OpenCL, has begun only recently. This thesis presents Phalanx, a framework that extends the virtual architecture of CUDA to distributed heterogeneous systems. Using MPI, Phalanx transparently handles communication among distributed nodes. By exposing the shared-memory model, Phalanx simplifies the development of distributed applications without sacrificing the advantages of MPI. In one of the case studies, Phalanx achieves a 28x speedup over serial execution on a Core i7 processor.
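To make the contrast between the two programming models concrete, the following is a minimal, single-process Python sketch. It is not Phalanx's API and does not use MPI or CUDA: `Node`, `send`, `recv`, and `GlobalArray` are hypothetical names invented for this illustration, and the "network" is simulated with in-memory queues. The first function spells out every message by hand; the second works through an index-addressed facade that hides the messaging, which is roughly the convenience a shared-memory model offers.

```python
from collections import deque


class Node:
    """Hypothetical compute node: private memory plus an inbox (simulated network)."""

    def __init__(self, rank):
        self.rank = rank
        self.memory = {}      # private memory: global index -> value
        self.inbox = deque()  # messages delivered to this node


def send(dest, message):
    """Deliver a message to a node's inbox."""
    dest.inbox.append(message)


def recv(node):
    """Take the oldest message from a node's inbox."""
    return node.inbox.popleft()


def sum_squares_message_passing(data, nodes):
    """Message-passing style: the programmer scatters data and gathers results."""
    chunk = -(-len(data) // len(nodes))  # ceiling division
    for i, node in enumerate(nodes):
        send(node, data[i * chunk:(i + 1) * chunk])   # explicit scatter
    root = nodes[0]
    for node in nodes:
        part = recv(node)                             # each node reads its chunk
        send(root, sum(x * x for x in part))          # explicit gather to root
    return sum(recv(root) for _ in nodes)


class GlobalArray:
    """Shared-memory style facade: callers index it like one flat array."""

    def __init__(self, data, nodes):
        self.nodes = nodes
        self.size = len(data)
        for i, value in enumerate(data):
            self._owner(i).memory[i] = value          # round-robin placement

    def _owner(self, index):
        return self.nodes[index % len(self.nodes)]

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        owner = self._owner(index)
        send(owner, ("load", index))                  # request hidden from the caller
        _, requested = recv(owner)                    # owner services the request
        return owner.memory[requested]

    def __setitem__(self, index, value):
        owner = self._owner(index)
        send(owner, ("store", index, value))          # request hidden from the caller
        _, requested, new_value = recv(owner)
        owner.memory[requested] = new_value


def sum_squares_shared(array):
    """Same computation, written as if all data lived in one address space."""
    return sum(array[i] * array[i] for i in range(len(array)))


if __name__ == "__main__":
    nodes = [Node(rank) for rank in range(3)]
    data = list(range(10))
    print(sum_squares_message_passing(data, nodes))
    print(sum_squares_shared(GlobalArray(data, nodes)))
```

Both functions compute the same result; the difference is who writes the communication code. In the second version the traffic still happens, but inside `GlobalArray`, so the application logic reads like ordinary array code.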

Tags: Computer science, CUDA, Heterogeneous systems, Memory model, MPI, nVidia, Package, PTX, Python, Thesis

July 20, 2012 by hgpu

