high performance computing on graphics processing units: hgpu.org

High Performance Computing Using MPI and OpenMP on Multi-core Parallel Systems

Haoqiang Jin, Dennis Jespersen, Piyush Mehrotra, Rupak Biswas

NAS Division, NASA Ames Research Center, Moffett Field, CA 94035

Parallel Computing (02 March 2011)

DOI:10.1016/j.parco.2011.02.002

The rapidly increasing number of cores in modern microprocessors is pushing the current high performance computing (HPC) systems into the petascale and exascale era. The hybrid nature of these systems – distributed memory across nodes and shared memory with non-uniform memory access within each node – poses a challenge to application developers. In this paper, we study a hybrid approach to programming such systems – a combination of two traditional programming models, MPI and OpenMP. We present the performance of standard benchmarks from the multi-zone NAS Parallel Benchmarks and two full applications using this approach on several multi-core based systems including an SGI Altix 4700, an IBM p575+ and an SGI Altix ICE 8200EX. We also present new data locality extensions to OpenMP to better match the hierarchical memory structure of multi-core architectures.
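
The two-level structure described in the abstract can be illustrated with a small C sketch. This is not code from the paper. It assumes a workload made of independent zones, deals the zones out to ranks in contiguous blocks, and shares the loop inside each zone among threads with an OpenMP reduction. To keep the sketch compilable without an MPI installation, the rank level is modeled as an ordinary serial loop; in a real hybrid code each rank would be a separate MPI process and the final sum would be a call such as MPI_Reduce. The function names (zones_on_rank, first_zone_on_rank, zone_work, hybrid_sum) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many zones `rank` owns when `nzones` zones are
   split as evenly as possible across `nranks` ranks (block distribution). */
size_t zones_on_rank(size_t nzones, size_t nranks, size_t rank) {
    size_t base = nzones / nranks;
    size_t extra = nzones % nranks;   /* the first `extra` ranks get one more zone */
    return base + (rank < extra ? 1 : 0);
}

/* Hypothetical helper: index of the first zone owned by `rank`. */
size_t first_zone_on_rank(size_t nzones, size_t nranks, size_t rank) {
    size_t base = nzones / nranks;
    size_t extra = nzones % nranks;
    return rank * base + (rank < extra ? rank : extra);
}

/* Fine-grained level: the loop over the points of one zone is shared among
   the threads of a node. The stand-in "work" is summing global point indices. */
long zone_work(size_t zone, size_t points_per_zone) {
    long n = (long)points_per_zone;
    long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += (long)zone * n + i;
    }
    return sum;
}

/* Coarse-grained level: each rank processes only its own block of zones.
   ASSUMPTION: the loop over ranks stands in for separate MPI processes, and
   `total += local` stands in for a reduction across those processes. */
long hybrid_sum(size_t nzones, size_t nranks, size_t points_per_zone) {
    assert(nranks > 0);
    long total = 0;
    for (size_t rank = 0; rank < nranks; ++rank) {
        size_t first = first_zone_on_rank(nzones, nranks, rank);
        size_t count = zones_on_rank(nzones, nranks, rank);
        long local = 0;
        for (size_t z = first; z < first + count; ++z) {
            local += zone_work(z, points_per_zone);
        }
        total += local;
    }
    return total;
}
```

Calling hybrid_sum(8, 3, 100) distributes 8 zones over 3 modeled ranks and sums the indices of all 800 points. When compiled without OpenMP support the pragma is ignored and the inner loop runs serially, with the same result.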

Tags: Computer science, MPI, OpenMP

March 28, 2011 by hgpu
