
Runtime Systems and Scheduling Support for High-End CPU-GPU Architectures

Vignesh Trichy Ravi
The Ohio State University, 2012

@phdthesis{ravi2012runtime,
   title={Runtime Systems and Scheduling Support for High-End CPU-GPU Architectures},
   author={Ravi, V. T.},
   year={2012},
   school={The Ohio State University}
}


In recent years, multi-core CPUs and many-core GPUs have emerged as mainstream, cost-effective means of scaling performance. Consequently, heterogeneous computing platforms that combine a CPU and a GPU are receiving wide attention; such architectures are now pervasive across notebooks, desktops, clusters, supercomputers, and cloud environments. While they offer huge computing potential, state-of-the-art software support lacks many of the features needed to improve the performance and utilization of such systems. We focus on three important problems: (i) although machines containing both a multi-core CPU and a GPU are widely available, there is no standard software support that enables an application to harness the aggregate compute power of both devices; (ii) although GPUs offer very high peak performance, their utilization is often low, which is an important concern in heavily shared cloud environments, and while resource sharing is a classic way to improve utilization, there is no software support that truly shares a GPU; and (iii) in shared supercomputers and cloud environments, a critical software component is the job scheduler, which aims to improve resource utilization and maximize aggregate throughput, so we formulate and revisit scheduling problems for CPU-GPU clusters.

For the first problem, we have developed a runtime system that enables an application to benefit simultaneously from the aggregate computing power of the available CPU and GPU. Starting from a high-level API, the runtime system transparently handles concurrency control and automatically and efficiently distributes work between the CPU and GPU. This work has also been extended and optimized for the structured-grid computation pattern. Our evaluation shows that significant performance benefits can be achieved while also improving user productivity.
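The dynamic work distribution described above can be sketched with a simple chunk-queue scheme: each device worker repeatedly pulls the next chunk, so the faster device automatically processes more of the input. This is a minimal illustrative sketch, not the thesis's actual runtime; the function names (`distribute`, `process_cpu`, `process_gpu`) and the chunking policy are assumptions, and Python threads stand in for real CPU and GPU execution paths.

```python
import queue
import threading

def distribute(work_items, process_cpu, process_gpu, chunk_size=4):
    """Dynamically split work between a CPU and a GPU worker.

    Each worker repeatedly pulls the next chunk from a shared queue,
    so the faster device naturally ends up processing more chunks.
    """
    chunks = queue.Queue()
    for i in range(0, len(work_items), chunk_size):
        chunks.put(work_items[i:i + chunk_size])

    results, lock = [], threading.Lock()

    def worker(process):
        while True:
            try:
                chunk = chunks.get_nowait()
            except queue.Empty:
                return  # no chunks left; this worker is done
            out = [process(x) for x in chunk]
            with lock:
                results.extend(out)

    threads = [threading.Thread(target=worker, args=(p,))
               for p in (process_cpu, process_gpu)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because chunks are claimed on demand rather than split statically, no profiling of relative device speeds is required, which is one common way such runtimes keep both devices busy.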
For the second problem, we have developed a framework with runtime support that enables one or more applications to transparently share one or more GPUs. We use consolidation as the mechanism for sharing a GPU, and address the underlying conceptual problems through affinity scores and molding. The affinity score between two or more kernels indicates the potential performance improvement from consolidating them, while molding reshapes kernels with conflicting resource requirements so that they can share a GPU efficiently. We demonstrate significant performance improvements from these GPU sharing mechanisms.

For the third problem, our scheduling formulations actively exploit the portability offered by programming models such as OpenCL to automatically map jobs to the CPU and GPU resources in a cluster. Based on this assumption, we have developed a number of scheduling schemes with two different goals: one targets system-wide metrics such as global throughput (makespan) and latency, while the other targets market-based metrics (known as value or yield) as defined or agreed between the user and the service provider. Our schemes improve utilization, and thus global throughput, by minimizing resource idle time and by efficiently handling the trade-off between queuing delay and the penalty of running on a non-optimal resource. When the goal is to improve yield, the scheduling decisions also factor in the parameters of the value functions, in addition to the aforementioned trade-offs. Our experimental results show that our schemes can significantly outperform state-of-the-art solutions in practice.
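The affinity-score idea above can be illustrated with a greedy pairing sketch: given a scoring function, kernels are paired for co-execution when consolidation is expected to help, and run alone otherwise. This is a hypothetical sketch, not the thesis's algorithm; the `affinity` function's form, the greedy pairing rule, and the zero threshold are all assumptions made for illustration.

```python
def consolidate(kernels, affinity):
    """Greedily pair kernels for GPU co-execution by affinity score.

    `affinity(a, b)` estimates the benefit of running kernels a and b
    together on one GPU (> 0 means consolidation is expected to help).
    Kernels with no beneficial partner are scheduled alone.
    """
    remaining = list(kernels)
    groups = []
    while len(remaining) > 1:
        a = remaining.pop(0)
        # pick the partner with the highest affinity to `a`
        best = max(remaining, key=lambda b: affinity(a, b))
        if affinity(a, best) > 0:
            remaining.remove(best)
            groups.append((a, best))   # consolidate the pair
        else:
            groups.append((a,))        # no good partner: run alone
    if remaining:
        groups.append((remaining[0],))
    return groups
```

In a real system the score would come from profiled resource usage (e.g., compute-bound kernels pairing well with memory-bound ones), and molding would additionally shrink a kernel's resource footprint before pairing.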

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors
