high performance computing on graphics processing units: hgpu.org

A dynamic scheduling runtime and tuning system for heterogeneous multi and many-core desktop platforms

Alecio Pedro Delazari Binotto

Instituto de Informatica, Universidade Federal do Rio Grande do Sul

Universidade Federal do Rio Grande do Sul, 2011

@article{binotto2011dynamic,
  title={A dynamic scheduling runtime and tuning system for heterogeneous multi and many-core desktop platforms},
  author={Binotto, A.P.D.},
  year={2011},
  publisher={Universidade Federal do Rio Grande do Sul. Instituto de Inform{\'a}tica. Programa de P{\'o}s-Gradua{\c{c}}{\~a}o em Computa{\c{c}}{\~a}o.}
}

A modern personal computer can now be considered a one-node heterogeneous cluster that simultaneously processes tasks from several applications. It can be composed of asymmetric Processing Units (PUs), such as the multi-core Central Processing Unit (CPU), the many-core Graphics Processing Units (GPUs), which have become one of the main co-processors contributing to high performance computing, and other PUs. In this way, a powerful heterogeneous execution platform for data-intensive calculations is built on a desktop. From the perspective of this thesis, distributing the workload over the PUs plays a key role in improving application performance and exploiting such heterogeneity. This is challenging because the execution cost of a task on a PU is non-deterministic and can be affected by a number of parameters not known a priori, such as the size of the problem domain and the precision of the solution, among others.

Within this scope, this doctoral research introduces a context-aware runtime and performance tuning system based on a compromise between reducing the execution time of the applications, through appropriate dynamic scheduling of high-level tasks, and the cost of computing that schedule, applied to a platform composed of a CPU and GPUs. The approach combines a model for a first scheduling, based on an off-line task performance profile benchmark, with a runtime model that keeps track of the tasks' real execution time and efficiently schedules new instances of the high-level tasks dynamically over the CPU/GPU execution platform. To that end, a set of heuristics is proposed to schedule tasks over one CPU and one GPU, together with a generic and efficient scheduling strategy that considers several processing units.

The proposed approach is applied in a case study using a CPU-GPU execution platform for computing iterative solvers for Systems of Linear Equations using a stencil code specially designed to exploit the characteristics of modern GPUs. The solution uses the number of unknowns as the main parameter for the assignment decision. By scheduling tasks to both the CPU and the GPU, a performance gain of 21.77% is achieved in comparison to the static assignment of all tasks to the GPU (which is done by current programming models, such as OpenCL and CUDA for Nvidia), with a scheduling error of only 0.25% compared to exhaustive search.
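The two-stage idea in the abstract (an off-line profile for the first assignment, then runtime tracking of measured execution times) can be illustrated with a small Python sketch. This is not the thesis's algorithm: the linear cost model, the earliest-finish-time rule, the exponential moving average, and every name and number below (`ProcessingUnit`, `schedule`, `observe`, the per-unknown costs) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProcessingUnit:
    """One PU with a hypothetical linear cost model (an assumption, not from the thesis)."""
    name: str
    secs_per_unknown: float   # off-line profile: seconds per unknown
    overhead: float           # fixed per-task cost in seconds (e.g. launch, transfer)
    busy_until: float = 0.0   # time at which this PU's queue drains

    def estimate(self, unknowns: int) -> float:
        # Predicted execution time for a task with the given number of unknowns.
        return self.overhead + self.secs_per_unknown * unknowns

    def observe(self, unknowns: int, measured: float, alpha: float = 0.5) -> None:
        # Runtime feedback: blend the measured per-unknown cost into the profile
        # using an exponential moving average with weight alpha.
        if unknowns <= 0:
            return
        measured_rate = max(measured - self.overhead, 0.0) / unknowns
        self.secs_per_unknown = (1 - alpha) * self.secs_per_unknown + alpha * measured_rate


def schedule(pus, unknowns, now=0.0):
    """Assign a task to the PU with the earliest estimated finish time."""
    def finish(pu):
        return max(pu.busy_until, now) + pu.estimate(unknowns)

    best = min(pus, key=finish)
    best.busy_until = finish(best)  # reserve the PU until the task is expected to end
    return best


# Illustrative profile values only: the GPU is faster per unknown but has a fixed overhead.
cpu = ProcessingUnit("CPU", secs_per_unknown=4e-6, overhead=0.0)
gpu = ProcessingUnit("GPU", secs_per_unknown=1e-6, overhead=0.002)
pus = [cpu, gpu]

# The number of unknowns drives the assignment decision.
assignments = [schedule(pus, n).name for n in (100, 1_000_000, 1_000_000, 100)]
print(assignments)  # ['CPU', 'GPU', 'GPU', 'CPU']

# A slower-than-profiled GPU run raises its estimated cost for later tasks.
gpu.observe(1_000_000, measured=3.002)
print(gpu.estimate(1_000_000))
```

With these made-up numbers, small systems stay on the CPU because the GPU's fixed overhead dominates, while large systems go to the GPU. The `observe` call shows how a measured time could correct the off-line profile; a real runtime would also have to weigh the cost of making each scheduling decision, which the abstract names as part of the trade-off.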

Tags: Benchmarking, Computer science, CUDA, Heterogeneous systems, nVidia, nVidia GeForce 8800 GT, nVidia GeForce GTX 285, OpenCL, Task scheduling, Thesis

December 5, 2011 by hgpu
