Encapsulated synchronization and load-balance in heterogeneous programming

Yuri Torres, Arturo Gonzalez-Escribano, Diego Llanos

Departamento de Informatica, Universidad de Valladolid

Euro-Par, 2012

Programming models and techniques to exploit parallelism in accelerators, such as GPUs, are different from those used in traditional parallel models for shared- or distributed-memory systems. It is a challenge to blend different programming models to coordinate and exploit devices with very different characteristics and computational power. This paper presents a new extensible framework model to encapsulate runtime decisions related to data partitioning, granularity, load balance, synchronization, and communication for systems that include assorted GPUs. The main parallel code thus becomes independent of these decisions, and the framework uses internal topology and system information to adapt the computation to the system transparently. The programmer can develop specific functions for each architecture, or use existing specialized library functions for different CPU-core or GPU architectures. The high-level coordination is expressed using a programming model built on top of message-passing, providing portability across distributed- or shared-memory systems. We show with an example how to produce a parallel code that runs efficiently on systems ranging from a Beowulf cluster to a machine with mixed GPUs. Our experimental results show how the run-time system, guided by hints about the computational-power ratios of different devices, can automatically partition and distribute large computations across heterogeneous systems, improving the overall performance.
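
The abstract mentions distributing a computation according to hints about the computational-power ratios of the devices. The following is a minimal sketch of that general idea only, not the authors' framework or its API: the function name `partition_by_power`, its parameters, and the largest-remainder rounding rule are all assumptions made for illustration. It splits a one-dimensional index range into contiguous blocks whose sizes are proportional to the given ratios.

```python
from typing import List, Tuple


def partition_by_power(n_items: int, power_ratios: List[float]) -> List[Tuple[int, int]]:
    """Split range(n_items) into contiguous blocks proportional to power_ratios.

    Hypothetical helper for illustration; returns one half-open
    (start, stop) interval per device, in the order the ratios are given.
    """
    if n_items < 0:
        raise ValueError("n_items must be non-negative")
    if not power_ratios or any(r <= 0 for r in power_ratios):
        raise ValueError("power_ratios must be non-empty and strictly positive")

    total = sum(power_ratios)
    # Ideal (fractional) share of the work for each device.
    ideal = [n_items * r / total for r in power_ratios]
    # Round down first; int() floors because the shares are non-negative.
    sizes = [int(x) for x in ideal]

    # Give the leftover items to the devices with the largest fractional
    # remainders (largest-remainder rounding, an assumed choice here).
    leftover = n_items - sum(sizes)
    order = sorted(range(len(sizes)), key=lambda i: ideal[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1

    # Turn block sizes into contiguous half-open intervals.
    blocks = []
    start = 0
    for size in sizes:
        blocks.append((start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    # Two devices, the second assumed to be three times as fast as the first.
    print(partition_by_power(1000, [1.0, 3.0]))
```

In a message-passing setting, each process could call such a function with the same ratios and pick the interval matching its own rank, so that no extra communication is needed to agree on the partition. How the ratios are obtained (benchmarks, device properties, user hints) is left open in this sketch.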

Tags: Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce 8500 GT, nVidia GeForce 9600 GT, Performance

June 9, 2012 by hgpu
