Application Synthesis and Optimization on Heterogeneous Parallel Processing Systems

Chih-Sheng Lin
Department of Computer Science and Information Engineering, National Chung Cheng University
National Chung Cheng University, 2014

Recently, hybrid systems consisting of general-purpose processors (CPUs) and accelerators such as graphics processing units (GPUs) have become a mainstream system architecture for achieving high performance and power efficiency. However, this growing trend forces programmers to adapt legacy serial programs into heterogeneous parallel programs, which raises many issues and challenges. To ease this burden, we propose the HEterogeneous Multi-Core (HEMC) framework, which addresses three problems: minimizing the execution time of a task on a heterogeneous platform, balancing the workload of a kernel across processors, and reducing the makespan of a task schedule on a heterogeneous platform. First, we propose a GPU configuration auto-tuning method that combines the simulated annealing (SA) algorithm with performance models to minimize the GPU execution time of a task. The experiments show that, compared with a brute-force search without performance models, the proposed method reduces the overhead of searching for an optimal or near-optimal configuration by a factor of 52. Second, we propose a mixed-integer nonlinear programming (MINLP) method that distributes (partitions) the computational workload of a kernel so that it is balanced across heterogeneous processors. The experimental results show that the MINLP method outperforms uniform distribution and specification-based distribution by 2.02 times in performance and 108.27 times in workload imbalance, respectively. Third, we propose a method that refines a task schedule by applying workload distribution/partitioning to fill schedule holes (idle gaps), thereby reducing the makespan. The experimental results show that the proposed method improves on the state-of-the-art scheduling algorithm without workload partitioning by up to 11%. Finally, we evaluate the HEMC framework with the proposed methods integrated: using HEMC improves the performance of an application executing on a heterogeneous platform by up to 48%.
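The first method pairs simulated annealing with a performance model, so candidate GPU configurations are scored by prediction instead of by launching the kernel. The sketch below illustrates that idea under stated assumptions: the two tuned parameters (threads per block and items per thread), the device constants, and the `model_time` cost function are hypothetical toy choices, not the thesis's performance model, and the annealing schedule (`t0`, `alpha`, `steps`) is likewise assumed. It also counts how many distinct configurations were scored, so the visited fraction of the search space can be compared with an exhaustive search.

```python
import math
import random

# Hypothetical device and problem parameters (assumptions for illustration only).
N = 1 << 22            # elements to process
SMS = 16               # streaming multiprocessors
MAX_THREADS_SM = 2048  # resident threads per SM
MAX_BLOCKS_SM = 16     # resident blocks per SM


def model_time(block, items):
    """Toy analytic performance model: predicted kernel time in arbitrary units.

    Stands in for a real GPU performance model, so candidates are scored
    without running the kernel.
    """
    threads = -(-N // items)                      # ceil(N / items)
    blocks = -(-threads // block)                 # ceil(threads / block)
    resident = min(MAX_BLOCKS_SM, MAX_THREADS_SM // block)
    occupancy = resident * block / MAX_THREADS_SM
    waves = -(-blocks // (SMS * resident))        # rounds of resident blocks
    # Each wave does `items` units of work per thread; low occupancy adds a
    # latency penalty because fewer warps are available to hide stalls.
    return waves * (items + 4.0 / occupancy)


def anneal(cost, space, steps=300, t0=100.0, alpha=0.98, seed=0):
    """Simulated annealing over a discrete grid of configurations.

    Returns (best configuration, its predicted cost, distinct configs scored).
    """
    rng = random.Random(seed)
    cache = {}

    def score(idx):
        # Memoize so each distinct configuration is evaluated only once.
        if idx not in cache:
            cache[idx] = cost(*(dim[i] for dim, i in zip(space, idx)))
        return cache[idx]

    current = tuple(rng.randrange(len(dim)) for dim in space)
    cur_cost = score(current)
    best, best_cost = current, cur_cost
    temp = t0
    for _ in range(steps):
        # Neighbor move: shift one parameter to an adjacent value (clamped).
        d = rng.randrange(len(space))
        cand = list(current)
        cand[d] = min(max(cand[d] + rng.choice((-1, 1)), 0), len(space[d]) - 1)
        cand = tuple(cand)
        c = score(cand)
        # Always accept improvements; accept worse moves with probability
        # exp(-delta / temp), which shrinks as the temperature cools.
        if c <= cur_cost or rng.random() < math.exp((cur_cost - c) / temp):
            current, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = cand, c
        temp *= alpha
    config = tuple(dim[i] for dim, i in zip(space, best))
    return config, best_cost, len(cache)


space = (list(range(32, 1025, 32)), list(range(1, 33)))  # block size, items/thread
config, predicted, scored = anneal(model_time, space)
print(f"block={config[0]} items/thread={config[1]} predicted={predicted:.1f} "
      f"scored {scored} of {len(space[0]) * len(space[1])} configs")
```

With 300 annealing steps, at most 301 of the 1,024 grid points are scored; the reported configuration is the best one seen, which is not guaranteed to be the global optimum of the model.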
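The second method balances a kernel's workload by choosing how many work items each processor receives. The thesis formulates this as an MINLP; the sketch below swaps in a much simpler technique for illustration. Each processor is assumed to follow an affine time model (a fixed overhead `a`, such as transfer and launch cost, plus `b` per item), and items are handed out greedily to whichever processor would finish earliest after taking one more. Because each per-processor time function is nondecreasing, this greedy rule selects the cheapest increments overall and so minimizes the largest finish time. The processor parameters and the names `finish_time`, `balance`, and `makespan` are hypothetical.

```python
import heapq


def finish_time(a, b, n):
    """Assumed affine time model: overhead a plus b per item; idle if n == 0."""
    return 0.0 if n == 0 else a + b * n


def balance(models, total):
    """Split `total` items across processors to minimize the slowest finish time.

    Greedy min-max allocation (not an MINLP solver): repeatedly give one item
    to the processor whose finish time after taking it would be smallest.
    """
    counts = [0] * len(models)
    heap = [(finish_time(a, b, 1), i) for i, (a, b) in enumerate(models)]
    heapq.heapify(heap)
    for _ in range(total):
        _, i = heapq.heappop(heap)
        counts[i] += 1
        a, b = models[i]
        heapq.heappush(heap, (finish_time(a, b, counts[i] + 1), i))
    return counts


def makespan(models, counts):
    """Completion time of the slowest processor for a given partition."""
    return max(finish_time(a, b, n) for (a, b), n in zip(models, counts))


# Hypothetical (overhead, time-per-item) pairs: a CPU and two GPUs.
models = [(0.0, 1.0), (5.0, 0.25), (5.0, 0.5)]
counts = balance(models, 100)
print(counts, makespan(models, counts))   # [18, 55, 27] 18.75
print(makespan(models, [34, 33, 33]))     # uniform split: 34.0
```

Under these made-up parameters the greedy split gives the two GPUs most of the work despite their fixed overhead, while a uniform split leaves the slow CPU as the bottleneck. A solver-based MINLP formulation can express constraints that this greedy sketch does not handle.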