15488

Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL

Ashwin M. Aji, Antonio J. Pena, Pavan Balaji, Wu-chun Feng
AMD Research, Advanced Micro Devices, Inc.
Proceedings of the IEEE Cluster, 2015

@InProceedings{aji-queue-sch-cluster15,

   author={Aji, Ashwin M. and Pena, Antonio J. and Balaji, Pavan and Feng, Wu-chun},

   title={Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL},

   booktitle={IEEE Cluster},

   address={Chicago, Illinois},

   month={September},

   year={2015}

}

Download Download (PDF)   View View   Source Source   

1280

views

OpenCL is a portable interface that can be used to program cluster nodes with heterogeneous compute devices. The OpenCL specification tightly binds its workflow abstraction, or "command queue", to a specific device for the entire program. For best performance, the user has to find the ideal queue-device mapping at command queue creation time, an effort that requires a thorough understanding of the match between the characteristics of all the underlying device architectures and the kernels in the program. In this paper, we propose to add scheduling attributes to the OpenCL context and command queue objects that can be leveraged by an intelligent runtime scheduler to automatically perform ideal queue-device mapping. Our proposed extensions enable the average OpenCL programmer to focus on the algorithm design rather than scheduling and automatically gain performance without sacrificing programmability. As an example, we design and implement an OpenCL runtime for task-parallel workloads, called MultiCL, which efficiently schedules command queues across devices. Within MultiCL, we implement several key optimizations to reduce runtime overhead. Our case studies include the SNU-NPB OpenCL benchmark suite and a real-world seismology simulation. We show that, on average, users have to apply our proposed scheduler extensions to only four source lines of code in existing OpenCL applications in order to automatically benefit from our runtime optimizations. We also show that MultiCL always maps command queues to the optimal device set with negligible runtime overhead.
No votes yet.
Please wait...

* * *

* * *

Featured events

2018
November
27-30
Hida Takayama, Japan

The Third International Workshop on GPU Computing and AI (GCA), 2018

2018
September
19-21
Nagoya University, Japan

The 5th International Conference on Power and Energy Systems Engineering (CPESE), 2018

2018
September
22-24
MediaCityUK, Salford Quays, Greater Manchester, England

The 10th International Conference on Information Management and Engineering (ICIME), 2018

2018
August
21-23
No. 1037, Luoyu Road, Hongshan District, Wuhan, China

The 4th International Conference on Control Science and Systems Engineering (ICCSSE), 2018

2018
October
29-31
Nanyang Executive Centre in Nanyang Technological University, Singapore

The 2018 International Conference on Cloud Computing and Internet of Things (CCIOT’18), 2018

HGPU group © 2010-2018 hgpu.org

All rights belong to the respective authors

Contact us: