Optimize Overall System Performance Through Workload Sequencing for GPUs Data Offloading
Northeastern University, China; Karlsruhe Institute of Technology (KIT), Germany
5th USENIX Workshop on Hot Topics in Parallelism, 2013
@article{liu2013optimize,
  title={Optimize Overall System Performance Through Workload Sequencing for GPUs Data Offloading},
  author={Liu, Wei and Chen, Jian-Jia and Deng, Qingxu and Kuo, Tei-Wei and Liu, Xue},
  year={2013}
}
With the proliferation of general-purpose computation, GPUs have become extremely important for significantly improving system performance in many computing systems, including embedded systems. Running massively parallel kernels on GPUs is challenging for the system's overall performance, especially when a large number of workloads (kernels) run together. In this paper, we establish a mechanism to schedule a large number of workloads that have to be executed on GPUs so as to minimize the makespan of all workloads and thereby improve overall system performance. We make an abstraction for each workload and propose an effective way to estimate its transfer time and execution time. Under the assumption that scheduling inside the GPU is first-come-first-served (FCFS) and work-conserving, we propose two effective algorithms, one based on largest-processing-time-first (LPT) and one based on the well-known Johnson's sequence for the two-stage flowshop scheduling problem. We implement some system calls in Linux to validate our proposed algorithms. Using an Nvidia GeForce GT 630M as our experimental platform, we demonstrate that our model can effectively estimate the timing information of each workload and that our algorithms improve performance compared to the current GPU driver implementation, which schedules data transfers arbitrarily.
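To illustrate the idea behind the second algorithm, the sketch below applies Johnson's rule to the two-stage view the abstract describes, where stage 1 is the host-to-GPU data transfer and stage 2 is the kernel execution, both served FCFS and work-conserving. This is a minimal illustration under those assumptions, not the authors' implementation; the names (Workload, transfer, execute, johnson_sequence, makespan) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    transfer: float   # estimated host-to-device transfer time (stage 1)
    execute: float    # estimated kernel execution time (stage 2)

def johnson_sequence(workloads):
    """Order workloads by Johnson's rule for a two-stage flowshop.

    Workloads whose transfer time is no longer than their execution time
    go first, sorted by increasing transfer time; the remaining workloads
    go last, sorted by decreasing execution time.
    """
    front = sorted((w for w in workloads if w.transfer <= w.execute),
                   key=lambda w: w.transfer)
    back = sorted((w for w in workloads if w.transfer > w.execute),
                  key=lambda w: w.execute, reverse=True)
    return front + back

def makespan(sequence):
    """Makespan when transfers and kernels each run FCFS, work-conserving."""
    transfer_done = 0.0   # time the copy engine becomes free
    execute_done = 0.0    # time the compute engine becomes free
    for w in sequence:
        transfer_done += w.transfer                       # next transfer starts immediately
        execute_done = max(execute_done, transfer_done) + w.execute
    return execute_done

if __name__ == "__main__":
    jobs = [Workload("k1", 3.0, 6.0),
            Workload("k2", 5.0, 2.0),
            Workload("k3", 1.0, 2.0)]
    seq = johnson_sequence(jobs)
    print([w.name for w in seq], makespan(seq))
```

In this toy setting, sequencing the workloads with Johnson's rule keeps the compute engine busy while later transfers overlap with earlier kernel executions, which is what drives the makespan reduction over an arbitrary transfer order.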
July 5, 2013 by hgpu