Automatic Scheduling of Compute Kernels Across Heterogeneous Architectures
Virginia Polytechnic Institute, 2014
@thesis{lyerly2014automatic,
title={Automatic Scheduling of Compute Kernels Across Heterogeneous Architectures},
author={Lyerly, Robert F.},
year={2014},
institution={Virginia Polytechnic Institute}
}
High-performance computing has shifted from increasing single-core performance to extracting performance from heterogeneous multi- and many-core processors because of the power, memory, and instruction-level parallelism walls. From smartphones to servers, all trends point towards increased processor heterogeneity as a means of increasing application performance. These architectures are designed for different types of applications: traditional "big" CPUs (like the Intel Xeon or AMD Opteron) are optimized for low latency, while other architectures (such as the NVIDIA Tesla K20X or Intel Xeon Phi) are optimized for high throughput. Their different tradeoffs and performance profiles can yield large performance gains for well-matched applications. However, applications that are ill-suited to a given architecture may experience significant slowdown; it is therefore imperative that applications are scheduled onto the correct processor. To perform this scheduling, applications must be analyzed to determine their execution characteristics (e.g., an application that contains a lot of branching may be better suited to a traditional CPU). Traditionally, this application-to-hardware mapping was determined statically by the programmer. However, this requires intimate knowledge of the application and the underlying architecture, and it precludes load balancing by the system. We demonstrate and empirically evaluate a system for automatically scheduling compute kernels by extracting program characteristics and applying machine learning techniques. We develop a machine learning process that is system-agnostic and works in a variety of contexts (e.g., embedded, desktop/workstation, server). Finally, we perform scheduling of these compute kernels in a workload-aware and workload-adaptive manner.
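The idea of mapping kernels to processors from extracted characteristics can be illustrated with a minimal sketch. It is not the method used in the thesis: the three features, the training rows, the k-nearest-neighbour classifier, and the queue-length threshold are all hypothetical choices made only to show the general shape of a workload-aware, learned scheduler.

```python
import math

# Hypothetical labelled kernels (assumed for illustration only):
# (branch_ratio, memory_ratio, parallel_work) -> device that ran it fastest.
TRAINING = [
    ((0.30, 0.20, 0.10), "cpu"),
    ((0.25, 0.30, 0.20), "cpu"),
    ((0.05, 0.40, 0.90), "gpu"),
    ((0.02, 0.35, 0.95), "gpu"),
]


def predict_device(features, k=3):
    """Predict a device by majority vote among the k nearest labelled kernels."""
    nearest = sorted(TRAINING, key=lambda item: math.dist(item[0], features))[:k]
    votes = {}
    for _, device in nearest:
        votes[device] = votes.get(device, 0) + 1
    return max(votes, key=votes.get)


def schedule(features, queue_lengths, max_queue=4):
    """Use the predicted device unless its queue is full; otherwise
    fall back to the least-loaded device (a simple workload-aware rule)."""
    preferred = predict_device(features)
    if queue_lengths.get(preferred, 0) < max_queue:
        return preferred
    return min(queue_lengths, key=queue_lengths.get)
```

For example, `schedule((0.03, 0.38, 0.92), {"cpu": 0, "gpu": 5})` falls back to the CPU because the assumed GPU queue limit is exceeded, even though the classifier prefers the GPU for that feature vector.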
May 9, 2014 by hgpu