Optimize Overall System Performance Through Workload Sequencing for GPUs Data Offloading

Wei Liu, Jian-Jia Chen, Qingxu Deng, Tei-Wei Kuo, Xue Liu

Northeastern University, China; Karlsruhe Institute of Technology (KIT), Germany

5th USENIX Workshop on Hot Topics in Parallelism, 2013

With the proliferation of general-purpose computation, GPUs are becoming extremely important for improving the performance of many computing systems, including embedded systems. Running massively parallel kernels on GPUs is challenging for overall system performance, especially when a large number of workloads (kernels) run together. In this paper, we establish a mechanism for scheduling a large number of workloads that have to be executed on GPUs so as to minimize the makespan of all workloads and thereby improve overall system performance. We define an abstraction for each workload and propose an effective way to estimate its transfer time and execution time. Under the assumption that scheduling inside GPUs is work-conserving first-come-first-served (FCFS), we propose two effective algorithms based on largest-processing-time-first (LPT) and the well-known Johnson’s sequence for the two-stage flowshop scheduling problem. We implement several system calls in Linux to validate our proposed algorithms. Using an Nvidia GeForce GT 630M as our experimental platform, we demonstrate that our model can effectively estimate the timing information of each workload and that our algorithms improve performance compared to the current GPU driver implementation, which schedules data transfers arbitrarily.
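To make the two-stage flowshop view concrete, the sketch below applies the textbook form of Johnson's rule to workloads modeled as a transfer stage followed by an execution stage. It is only an illustration under stated assumptions, not the authors' algorithm: the abstract says their algorithms are "based on" LPT and Johnson's sequence without giving details. The names `johnson_sequence` and `makespan` and all timing values are hypothetical, and the model assumes one transfer at a time, one kernel at a time, and that a transfer may overlap with the execution of an earlier workload.

```python
from typing import List, Tuple

# Hypothetical workload abstraction: (name, transfer_time, execution_time).
Workload = Tuple[str, float, float]


def johnson_sequence(workloads: List[Workload]) -> List[Workload]:
    """Order workloads by the textbook Johnson's rule for a two-stage flowshop.

    Workloads whose transfer time is at most their execution time go first,
    in increasing order of transfer time. The remaining workloads follow,
    in decreasing order of execution time.
    """
    first = sorted((w for w in workloads if w[1] <= w[2]), key=lambda w: w[1])
    last = sorted((w for w in workloads if w[1] > w[2]),
                  key=lambda w: w[2], reverse=True)
    return first + last


def makespan(sequence: List[Workload]) -> float:
    """Completion time of the last execution when workloads run in order.

    Assumes a single transfer channel and a single execution stage, both
    processing workloads in the given order (FCFS) without idling needlessly.
    """
    transfer_done = 0.0  # time at which the transfer stage becomes free
    exec_done = 0.0      # time at which the execution stage becomes free
    for _, transfer, execute in sequence:
        transfer_done += transfer
        # Execution starts once the data has arrived and the GPU is free.
        exec_done = max(exec_done, transfer_done) + execute
    return exec_done


# Illustrative timings only (arbitrary units), not measurements from the paper.
arrival_order = [("D", 5, 1), ("A", 3, 2), ("C", 2, 2), ("B", 1, 4)]
ordered = johnson_sequence(arrival_order)
print([w[0] for w in ordered], makespan(ordered), makespan(arrival_order))
```

With these made-up timings, the arbitrary arrival order finishes at time 16, while the Johnson order (B, C, A, D) finishes at time 12, because short transfers are issued early and keep the execution stage busy.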

Tags: Computer science, CUDA, nVidia, nVidia GeForce GT 630 M, Performance

July 5, 2013 by hgpu
