Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing



Tetsuya Odajima, Taisuke Boku, Mitsuhisa Sato, Toshihiro Hanawa, Yuetsu Kodama, Raymond Namyst, Samuel Thibault, Olivier Aumage

University of Tsukuba

The 2013 International Symposium on Advances of Distributed and Parallel Computing (ADPC 2013), hal-00920915, 2013

Package: XcalableMP, a directive-based language for distributed-memory systems


In work sharing among GPUs and CPU cores on GPU-equipped clusters, keeping the load balanced across these heterogeneous computing resources is a critical issue. To address this problem we have been developing a runtime system for a PGAS language, named XcalableMP-dev/StarPU [1]. During its development, we found that adaptive load balancing for GPU/CPU work sharing is necessary to achieve the best performance across a variety of application codes. In this paper, we extend XcalableMP-dev/StarPU with a new feature that dynamically controls, during application execution, the size of the tasks assigned to these heterogeneous resources. Performance evaluation on several benchmarks confirmed that the proposed feature works correctly and that heterogeneous work sharing delivers up to about 40% higher performance than using the GPUs alone, even for relatively small problem sizes.
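The abstract does not spell out the rule used to adjust task sizes. As a hedged illustration of the general idea only, and not the XcalableMP-dev/StarPU mechanism, the sketch below sizes each device's next task in proportion to its measured throughput, so that GPU and CPU tasks tend to finish at about the same time. The names `Worker`, `split_work`, `update_throughput`, and the smoothing factor `alpha` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # Hypothetical model of one computing resource (e.g. a GPU or a group of CPU cores).
    name: str
    throughput: float  # estimated elements processed per second


def split_work(total: int, workers: list[Worker]) -> dict[str, int]:
    """Give each worker a share of `total` elements proportional to its estimated throughput."""
    rate_sum = sum(w.throughput for w in workers)
    if rate_sum <= 0:
        raise ValueError("need at least one worker with positive throughput")
    sizes = {w.name: int(total * w.throughput / rate_sum) for w in workers}
    # Integer truncation can leave a few elements unassigned; hand them to the fastest worker.
    fastest = max(workers, key=lambda w: w.throughput)
    sizes[fastest.name] += total - sum(sizes.values())
    return sizes


def update_throughput(worker: Worker, size: int, elapsed: float, alpha: float = 0.5) -> None:
    """Blend the rate observed in the last step into the estimate (exponential moving average)."""
    if size > 0 and elapsed > 0:
        observed = size / elapsed
        worker.throughput = alpha * observed + (1 - alpha) * worker.throughput
```

For example, with a GPU estimated at four times the CPU's rate, `split_work(1000, [Worker("gpu", 8.0), Worker("cpu", 2.0)])` assigns 800 elements to the GPU and 200 to the CPU. Calling `update_throughput` after each step lets the split follow changes in observed speed; a smaller `alpha` reacts more slowly but is less sensitive to timing noise.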

Tags: Computer science, CUDA, Heterogeneous systems, nVidia, Package, Performance, Task scheduling, Tesla M2090

December 25, 2013 by hgpu


