high performance computing on graphics processing units: hgpu.org

Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters

Chao-Tung Yang, Chih-Lin Huang, Cheng-Fang Lin

Department of Computer Science, Tunghai University, Taichung City, 40704, Taiwan

Computer Physics Communications, Volume 182, Issue 1, January 2011, Pages 266-269 (16 July 2010)

DOI:10.1016/j.cpc.2010.06.035

@article{yang2010hybrid,
  title={Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters},
  author={Yang, C.T. and Huang, C.L. and Lin, C.F.},
  journal={Computer Physics Communications},
  issn={0010-4655},
  year={2010},
  publisher={Elsevier}
}

NVIDIA’s CUDA is a general-purpose, scalable parallel programming model for writing highly parallel applications. It provides several key abstractions: a hierarchy of thread blocks, shared memory, and barrier synchronization. This model has proven quite successful at programming multithreaded manycore GPUs and scales transparently to hundreds of cores; scientists throughout industry and academia are already using CUDA to achieve dramatic speedups on production and research codes. In this paper, we propose a parallel programming approach using hybrid CUDA, OpenMP, and MPI programming, which partitions loop iterations according to the number of C1060 GPU nodes in a GPU cluster consisting of one C1060 and one S1070. Loop iterations assigned to one MPI process are processed in parallel by CUDA, run by the processor cores in the same computational node.
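The abstract does not spell out how loop iterations are divided, so the following is only a minimal sketch of one plausible scheme: a static block partition that gives each GPU (or MPI process) a contiguous range of iterations. The function name `partition_iterations` and the figure of five GPUs (one C1060 plus the four GPUs housed in an S1070) are assumptions made for illustration, not details taken from the paper.

```python
def partition_iterations(n_iters, n_parts):
    """Split range(n_iters) into n_parts contiguous half-open (start, stop) chunks.

    Hypothetical helper: the paper's real partitioning code is not shown
    in the abstract. The first (n_iters % n_parts) chunks receive one
    extra iteration so that chunk sizes differ by at most one.
    """
    if n_parts <= 0:
        raise ValueError("n_parts must be positive")
    if n_iters < 0:
        raise ValueError("n_iters must be non-negative")

    base, extra = divmod(n_iters, n_parts)
    chunks = []
    start = 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    # Assumed setup: 5 GPUs, one MPI process per GPU, 1000 loop iterations.
    for rank, (start, stop) in enumerate(partition_iterations(1000, 5)):
        print(f"rank {rank}: iterations [{start}, {stop})")
```

In a hybrid setup of the kind the abstract describes, each MPI process would take its own `(start, stop)` range and hand those iterations to a CUDA kernel, with OpenMP threads on the host driving the GPUs attached to that node.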

Tags: Computer science, CUDA, GPU cluster, Hybrid computing, MPI, nVidia, OpenMP, Tesla C1060, Tesla S1070

November 8, 2010 by hgpu
