Automatic Scheduling of Compute Kernels Across Heterogeneous Architectures

Robert F. Lyerly

Virginia Polytechnic Institute

Virginia Polytechnic Institute, 2014

High-performance computing has shifted from increasing single-core performance to extracting performance from heterogeneous multi- and many-core processors because of the power, memory, and instruction-level parallelism walls. All trends point toward increased processor heterogeneity as a means of increasing application performance, from smartphones to servers. These architectures are designed for different types of applications: traditional "big" CPUs (such as the Intel Xeon or AMD Opteron) are optimized for low latency, while other architectures (such as the NVIDIA Tesla K20X or Intel Xeon Phi) are optimized for high throughput. Their differing tradeoffs and performance profiles can yield large performance gains for well-suited applications. However, applications that are ill-suited to a given architecture may experience significant slowdown; it is therefore imperative that applications be scheduled onto the correct processor. To perform this scheduling, applications must be analyzed to determine their execution characteristics (e.g., an application that contains a lot of branching may be better suited to a traditional CPU). Traditionally, this application-to-hardware mapping was determined statically by the programmer. However, this requires intimate knowledge of the application and the underlying architecture, and it precludes load balancing by the system. We demonstrate and empirically evaluate a system for automatically scheduling compute kernels by extracting program characteristics and applying machine learning techniques. We develop a machine learning process that is system-agnostic and works in a variety of contexts (e.g., embedded, desktop/workstation, server). Finally, we perform scheduling for these compute kernels in a workload-aware and workload-adaptive manner.
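
As a rough illustration of the idea in the abstract (extract kernel characteristics, learn a kernel-to-device mapping, then take current load into account), the following Python sketch may help. It is not the thesis's implementation: the two features, the training examples, the nearest-neighbour model, and the queue-length threshold are all hypothetical choices made only to keep the example small.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelFeatures:
    """Hypothetical per-kernel characteristics (not taken from the thesis)."""
    branch_fraction: float    # share of instructions that are branches, 0..1
    parallel_fraction: float  # share of work that is data-parallel, 0..1


# Made-up labelled examples: which device ran each kernel fastest.
TRAINING = [
    (KernelFeatures(0.30, 0.20), "cpu"),
    (KernelFeatures(0.25, 0.35), "cpu"),
    (KernelFeatures(0.05, 0.90), "gpu"),
    (KernelFeatures(0.02, 0.95), "gpu"),
]


def predict_device(kernel, training=TRAINING):
    """Return the label of the nearest training example (1-nearest-neighbour)."""
    def distance(a, b):
        return math.hypot(a.branch_fraction - b.branch_fraction,
                          a.parallel_fraction - b.parallel_fraction)
    return min(training, key=lambda example: distance(kernel, example[0]))[1]


def schedule(kernel, queue_lengths, max_queue=4):
    """Workload-aware placement: use the predicted device unless its queue
    has reached max_queue, in which case fall back to the least-loaded device."""
    preferred = predict_device(kernel)
    if queue_lengths.get(preferred, 0) < max_queue:
        return preferred
    return min(queue_lengths, key=queue_lengths.get)


branchy = KernelFeatures(branch_fraction=0.28, parallel_fraction=0.25)
streaming = KernelFeatures(branch_fraction=0.03, parallel_fraction=0.92)

print(predict_device(branchy))
print(schedule(streaming, {"cpu": 1, "gpu": 2}))
print(schedule(streaming, {"cpu": 0, "gpu": 4}))
```

In this sketch, the branch-heavy kernel is mapped to the CPU, while the data-parallel kernel is mapped to the GPU unless the GPU queue is already full, in which case it is placed on the less-loaded CPU. A real system would use measured features and a trained model rather than these hand-picked values.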

Tags: Compilers, Computer science, Heterogeneous systems, Intel Xeon Phi, Machine learning, nVidia, OpenCL, OpenMP, Task scheduling, Tesla C2075

May 9, 2014 by hgpu
