Optimize Overall System Performance Through Workload Sequencing for GPUs Data Offloading

Wei Liu, Jian-Jia Chen, Qingxu Deng, Tei-Wei Kuo, Xue Liu
Northeastern University, China; Karlsruhe Institute of Technology (KIT), Germany
5th USENIX Workshop on Hot Topics in Parallelism, 2013







With the proliferation of general-purpose computation on GPUs, GPUs have become important for improving the performance of many computing systems, including embedded systems. Running massively parallel kernels on GPUs poses a challenge for overall system performance, especially when a large number of workloads (kernels) run together. In this paper, we establish a mechanism for scheduling a large number of workloads that have to be executed on GPUs so as to minimize the makespan of all workloads and thereby improve overall system performance. We build an abstraction of each workload and propose an effective way to estimate its transfer time and execution time. Under the assumption that scheduling inside the GPU is first-come-first-serve (FCFS) and work-conserving, we propose two effective algorithms, one based on largest-processing-time (LPT) first and one based on the well-known Johnson's sequence for the two-stage flowshop scheduling problem. We implement several system calls in Linux to validate the proposed algorithms. Using an NVIDIA GeForce GT 630M as our experimental platform, we demonstrate that our model can effectively estimate the timing information of each workload and that our algorithms improve performance compared with the current GPU driver implementation, which schedules data transfers arbitrarily.
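To make the flowshop view concrete, here is a minimal sketch of the classic Johnson's rule, with data transfer as stage 1 and kernel execution as stage 2. It is not the authors' implementation: the abstract only says their algorithms are based on this sequence. The workload names and timings are invented for illustration, and the makespan model assumes a single transfer channel and a single FCFS execution stage, where the next transfer may overlap with the current kernel.

```python
from typing import List, Tuple

# (name, transfer_time, execution_time); hypothetical values, arbitrary time units.
Workload = Tuple[str, float, float]


def johnson_sequence(workloads: List[Workload]) -> List[Workload]:
    """Order workloads by the classic Johnson's rule for a two-stage flowshop."""
    # Workloads whose transfer is no longer than their execution go first,
    # in increasing order of transfer time.
    first = sorted((w for w in workloads if w[1] <= w[2]), key=lambda w: w[1])
    # The rest go last, in decreasing order of execution time.
    last = sorted((w for w in workloads if w[1] > w[2]),
                  key=lambda w: w[2], reverse=True)
    return first + last


def makespan(sequence: List[Workload]) -> float:
    """Completion time of the last kernel under the assumed two-stage model."""
    transfer_done = 0.0  # time at which the transfer channel becomes free
    exec_done = 0.0      # time at which the execution stage becomes free
    for _, transfer, execute in sequence:
        transfer_done += transfer
        # A kernel starts once its data has arrived and the previous kernel is done.
        exec_done = max(exec_done, transfer_done) + execute
    return exec_done


workloads = [("A", 3, 6), ("B", 5, 2), ("C", 1, 2), ("D", 7, 5)]
order = johnson_sequence(workloads)
print([w[0] for w in order])                 # ['C', 'A', 'D', 'B']
print(makespan(workloads), makespan(order))  # 21.0 18.0
```

In this toy instance the submission order A, B, C, D finishes at time 21, while the Johnson order finishes at time 18, which equals the total transfer time plus the shortest kernel and so cannot be improved under this model.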