System Design Principles for Heterogeneous Resource Management and Scheduling in Accelerator-Based Systems
Georgia Institute of Technology
Georgia Institute of Technology, 2016
@phdthesis{sengupta2016system,
title={System Design Principles for Heterogeneous Resource Management and Scheduling in Accelerator-Based Systems},
author={Sengupta, Dipanjan},
year={2016},
school={Georgia Institute of Technology}
}
Accelerator-based systems are making rapid inroads as platforms of choice both for high-end cloud services and for processing irregular applications such as real-world graph analytics, owing to their high scalability and low dollars-to-FLOPS ratios. Yet GPUs are not first-class schedulable entities, which causes substantial underutilization of their hardware resources, including both their computational and data-movement engines. Software solutions built on efficient resource management principles are therefore required to address this scheduling problem on GPUs. Further, real-world graphs such as those in social networks have two important characteristics: they are large and they constantly evolve over time. This poses a challenge because GPU-resident memory is too limited to store such large graphs. Moreover, because these large-scale graphs evolve at a high rate, it is undesirable and computationally infeasible to repeatedly run static graph analytics on a sequence of versions, or snapshots, of the evolving graph. Novel incremental solutions are therefore required to process, in near real time on GPUs, large-scale evolving graphs whose memory footprints exceed the device's internal memory capacity.

First, the thesis presents Strings, a GPU scheduling infrastructure that achieves high system throughput and fairness among applications from multiple tenants on manycore GPU servers by treating GPUs as first-class schedulable entities and decomposing the scheduling problem into a novel combination of load balancing and per-device resource sharing.

Second, for graph applications whose memory footprints exceed device memory, the thesis presents GraphReduce, a highly efficient and scalable GPU-based framework that adopts a combination of edge- and vertex-centric implementations of the Gather-Apply-Scatter programming model and operates on multiple asynchronous GPU streams, fully exploiting the GPU's high degree of parallelism while supporting efficient graph data movement between host and device.

Finally, to analyze evolving graphs in near real time, the thesis presents EvoGraph, a high-performance GPU-based dynamic graph analytics framework that incrementally processes graphs on the fly using fixed-size batches of updates. To realize this vision, it introduces a novel programming model that allows a large set of incremental graph algorithms to be implemented seamlessly across the GPU's many cores. It also characterizes various graph algorithms and how related graph properties affect the complexity of incremental graph processing, and uses this characterization to decide at runtime between an incremental and a static run over a particular update batch so as to achieve the best performance.
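To make the Strings decomposition concrete, below is a minimal, hypothetical sketch (not the actual Strings infrastructure): incoming work is first load-balanced across GPUs by choosing the least-loaded device, and each chosen device is then shared among tenants through a pool of per-device streams. The `MultiGpuScheduler` and `DeviceState` names, the outstanding-work load metric, and the fixed stream-pool size are illustrative assumptions only.

```cuda
// Hypothetical sketch of load balancing across GPUs plus per-device sharing.
#include <cuda_runtime.h>
#include <utility>
#include <vector>

struct DeviceState {
    std::vector<cudaStream_t> streams;  // per-device streams shared by tenants
    int outstanding = 0;                // crude load metric: queued work items
    int next_stream = 0;
};

class MultiGpuScheduler {
public:
    explicit MultiGpuScheduler(int streams_per_device) {
        int n = 0;
        cudaGetDeviceCount(&n);
        devices_.resize(n);
        for (int d = 0; d < n; ++d) {
            cudaSetDevice(d);
            devices_[d].streams.resize(streams_per_device);
            for (auto& s : devices_[d].streams) cudaStreamCreate(&s);
        }
    }

    // Load balancing: pick the device with the least outstanding work,
    // then hand back one of its streams (per-device resource sharing).
    std::pair<int, cudaStream_t> acquire() {
        int best = 0;
        for (size_t d = 1; d < devices_.size(); ++d)
            if (devices_[d].outstanding < devices_[best].outstanding) best = (int)d;
        DeviceState& ds = devices_[best];
        ds.outstanding++;
        cudaStream_t s = ds.streams[ds.next_stream];
        ds.next_stream = (ds.next_stream + 1) % (int)ds.streams.size();
        cudaSetDevice(best);                // caller launches its kernel on (best, s)
        return {best, s};
    }

    void release(int device) { devices_[device].outstanding--; }

private:
    std::vector<DeviceState> devices_;
};
```

A tenant would call acquire(), launch its kernel on the returned stream, and call release() once the work completes, e.g. from a stream callback.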
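The GraphReduce idea of streaming out-of-memory graphs through the device can be illustrated with a minimal sketch, assuming a shard-based partitioning: each shard is copied to the GPU, processed by a Gather-Apply-Scatter style kernel, and copied back, with the work spread over asynchronous streams so copies and compute overlap. The `Shard` layout, `gas_kernel`, and four-stream pool are hypothetical, not the GraphReduce API, and the cross-shard merging of vertex values that a real framework needs is elided.

```cuda
// Hypothetical sketch: stream graph shards through the GPU on async streams.
#include <cuda_runtime.h>
#include <vector>

struct Edge { int src, dst; };

struct Shard {                              // a partition sized to fit on the device
    std::vector<Edge>  edges;
    std::vector<float> values;              // values of vertices touched by this shard
};

// Placeholder GAS-style kernel: gather the source value and scatter a
// contribution to the destination (e.g., one step of a PageRank-like sweep).
__global__ void gas_kernel(const Edge* edges, int num_edges, float* values) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < num_edges)
        atomicAdd(&values[edges[i].dst], values[edges[i].src]);
}

void process_shards(std::vector<Shard>& shards) {
    const int kStreams = 4;                 // small pool of streams to overlap copy/compute
    cudaStream_t streams[kStreams];
    for (auto& s : streams) cudaStreamCreate(&s);

    for (size_t i = 0; i < shards.size(); ++i) {
        Shard& sh = shards[i];
        cudaStream_t st = streams[i % kStreams];

        Edge*  d_edges;                     // stream-ordered allocations (CUDA 11.2+)
        float* d_values;
        cudaMallocAsync((void**)&d_edges,  sh.edges.size()  * sizeof(Edge),  st);
        cudaMallocAsync((void**)&d_values, sh.values.size() * sizeof(float), st);

        // For true copy/compute overlap the host buffers should be pinned;
        // pageable memory still works but serializes the copies.
        cudaMemcpyAsync(d_edges,  sh.edges.data(),  sh.edges.size()  * sizeof(Edge),
                        cudaMemcpyHostToDevice, st);
        cudaMemcpyAsync(d_values, sh.values.data(), sh.values.size() * sizeof(float),
                        cudaMemcpyHostToDevice, st);

        int threads = 256;
        int blocks  = (int)((sh.edges.size() + threads - 1) / threads);
        gas_kernel<<<blocks, threads, 0, st>>>(d_edges, (int)sh.edges.size(), d_values);

        cudaMemcpyAsync(sh.values.data(), d_values, sh.values.size() * sizeof(float),
                        cudaMemcpyDeviceToHost, st);
        cudaFreeAsync(d_edges,  st);
        cudaFreeAsync(d_values, st);
    }
    for (auto& s : streams) cudaStreamSynchronize(s);
    for (auto& s : streams) cudaStreamDestroy(s);
}
```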
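EvoGraph's runtime choice between an incremental and a static run over an update batch can be sketched as follows, assuming a simple affected-fraction heuristic. The heuristic, the 10% threshold, and the function names are illustrative assumptions rather than details from the thesis, and the two execution paths are stubs standing in for GPU implementations.

```cuda
// Hypothetical sketch: per-batch decision between incremental and static runs.
#include <cstdio>
#include <vector>

struct EdgeUpdate { int src, dst; bool is_insert; };

// Crude proxy for how much of the graph a batch perturbs: the ratio of update
// endpoints to total vertices. A real system would also weigh algorithm-specific
// properties, e.g. how far the updates propagate.
static double affected_fraction(const std::vector<EdgeUpdate>& batch, size_t num_vertices) {
    return static_cast<double>(2 * batch.size()) / static_cast<double>(num_vertices);
}

// Stubs standing in for GPU kernel launches implemented elsewhere.
static void run_incremental_on_gpu(const std::vector<EdgeUpdate>& batch) {
    std::printf("incremental pass over %zu updates\n", batch.size());
}
static void run_static_on_gpu() {
    std::printf("full static recomputation\n");
}

// Runtime choice between incremental and static execution for one batch.
void process_batch(const std::vector<EdgeUpdate>& batch, size_t num_vertices) {
    const double kThreshold = 0.10;          // assumed tuning knob
    if (affected_fraction(batch, num_vertices) < kThreshold)
        run_incremental_on_gpu(batch);       // cheap: only a small subgraph is affected
    else
        run_static_on_gpu();                 // batch touches too much; recompute from scratch
}
```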
September 10, 2016 by hgpu