high performance computing on graphics processing units: hgpu.org

A dynamic scheduling runtime and tuning system for heterogeneous multi and many-core desktop platforms

Alecio Pedro Delazari Binotto

Instituto de Informatica, Universidade Federal do Rio Grande do Sul

Universidade Federal do Rio Grande do Sul, 2011

@article{binotto2011dynamic,

title={A dynamic scheduling runtime and tuning system for heterogeneous multi and many-core desktop platforms},

author={Binotto, A.P.D.},

year={2011},

publisher={Universidade Federal do Rio Grande do Sul. Instituto de Inform{\'a}tica. Programa de P{\'o}s-Gradua{\c{c}}{\~a}o em Computa{\c{c}}{\~a}o.}

}

A modern personal computer can now be regarded as a one-node heterogeneous cluster that processes tasks from several applications simultaneously. It can be composed of asymmetric Processing Units (PUs), such as the multi-core Central Processing Unit (CPU), the many-core Graphics Processing Units (GPUs), which have become one of the main co-processors contributing to high performance computing, and other PUs. This makes the desktop a powerful heterogeneous execution platform for data-intensive calculations. From the perspective of this thesis, distributing the workload over the PUs plays a key role in improving application performance and exploiting this heterogeneity. The problem is challenging because the execution cost of a task on a PU is non-deterministic and can be affected by a number of parameters not known a priori, such as the size of the problem domain and the precision of the solution. Within this scope, this doctoral research introduces a context-aware runtime and performance tuning system based on a compromise between reducing the execution time of the applications, through appropriate dynamic scheduling of high-level tasks, and the cost of computing that schedule, applied on a platform composed of a CPU and GPUs. The approach combines a model for a first scheduling, based on an off-line task performance profile benchmark, with a runtime model that keeps track of the tasks’ real execution times and efficiently schedules new instances of the high-level tasks dynamically over the CPU/GPU execution platform. To that end, a set of heuristics is proposed to schedule tasks over one CPU and one GPU, along with a generic and efficient scheduling strategy that considers several processing units. The proposed approach is applied in a case study using a CPU-GPU execution platform for computing iterative solvers for Systems of Linear Equations, using a stencil code specially designed to exploit the characteristics of modern GPUs.
The solution uses the number of unknowns as the main parameter for the assignment decision. By scheduling tasks to both the CPU and the GPU, a performance gain of 21.77% is achieved in comparison to the static assignment of all tasks to the GPU (as is done by current programming models, such as OpenCL and CUDA for Nvidia), with a scheduling error of only 0.25% compared to exhaustive search.
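
The two-stage idea described in the abstract (an off-line profile for the first assignment, then runtime measurements that refine later assignments) can be illustrated with a minimal Python sketch. This is not the thesis's implementation or its heuristics: the class names, the linear cost model (a fixed overhead plus a per-unknown cost), the smoothing weight `alpha`, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """Hypothetical cost model for one task type on one processing unit."""
    fixed_sec: float          # fixed overhead from the off-line profile (assumed)
    sec_per_unknown: float    # per-unknown cost from the off-line profile (assumed)
    correction: float = 1.0   # runtime correction factor, starts neutral
    alpha: float = 0.5        # weight given to each new measurement (assumed)

    def _profiled(self, n_unknowns: int) -> float:
        return self.fixed_sec + self.sec_per_unknown * n_unknowns

    def estimate(self, n_unknowns: int) -> float:
        """Profiled cost scaled by what has been observed at runtime."""
        return self.correction * self._profiled(n_unknowns)

    def observe(self, n_unknowns: int, measured_sec: float) -> None:
        """Blend the measured/profiled ratio into the correction factor."""
        ratio = measured_sec / self._profiled(n_unknowns)
        self.correction = (1 - self.alpha) * self.correction + self.alpha * ratio


class Scheduler:
    """Greedy earliest-estimated-finish assignment over several PUs."""

    def __init__(self, profile):
        # profile: {pu_name: (fixed_sec, sec_per_unknown)} from an off-line benchmark
        self.models = {pu: CostModel(f, r) for pu, (f, r) in profile.items()}
        self.busy_until = {pu: 0.0 for pu in profile}

    def assign(self, n_unknowns: int, now: float = 0.0) -> str:
        """Pick the PU expected to finish a task of this size first."""
        def finish(pu: str) -> float:
            start = max(self.busy_until[pu], now)
            return start + self.models[pu].estimate(n_unknowns)

        best = min(self.models, key=finish)
        self.busy_until[best] = finish(best)
        return best

    def report(self, pu: str, n_unknowns: int, measured_sec: float) -> None:
        """Feed a task's real execution time back into the model."""
        self.models[pu].observe(n_unknowns, measured_sec)


if __name__ == "__main__":
    sched = Scheduler({"cpu": (0.0, 1e-6), "gpu": (1e-3, 1e-7)})
    print(sched.assign(100))         # small system of equations
    print(sched.assign(1_000_000))   # large system of equations
```

With these made-up profile numbers, the small task is assigned to the CPU because the assumed fixed GPU overhead dominates, and the large task is assigned to the GPU. Calling `report` with a measured time far above the profiled estimate shifts later assignments away from that PU, which is the feedback loop the abstract attributes to the runtime model.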

Tags: Benchmarking, Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce 8800 GT, nVidia GeForce GTX 285, OpenCL, Task scheduling, Thesis

December 5, 2011 by hgpu
