Autotuning for Automatic Parallelization on Heterogeneous Systems

Philip Pfaffe
KIT-Fakultät für Informatik des Karlsruher Instituts für Technologie (KIT)
KIT-Bibliothek, 2020

@article{pfaffe2019autotuning,
   title={Autotuning for Automatic Parallelization on Heterogeneous Systems},
   author={Pfaffe, Philip},
   year={2019}
}


To meet the surging demand for high-speed computation in an era of stagnating per-processor performance gains, systems designers resort to aggregating many, and even heterogeneous, processors into single systems. Automatic parallelization tools relieve application developers of the tedious and error-prone task of programming these heterogeneous systems. For these tools, there are two aspects to maximizing performance: optimizing the execution on each parallel platform individually, and executing work on the available platforms cooperatively. To date, various approaches exist that target either aspect. Automatic parallelization for simultaneous cooperative computation with optimized per-platform execution, however, remains an unsolved problem. This thesis presents the APHES framework to close that gap. The framework combines automatic parallelization with a novel technique for input-sensitive online autotuning. Its first component, a parallelizing polyhedral compiler, transforms implicitly data-parallel program parts for multiple platforms. Targeted platforms then automatically cooperate to process the work. During compilation, the code is instrumented to interact with libtuning, our new autotuner and the second component of the framework. Tuning the work distribution and per-platform execution maximizes overall performance. The autotuner enables always-on autotuning through a novel hybrid tuning method that combines a new, efficient search technique with model-based prediction. Experiments show that the APHES framework can solve the cooperative heterogeneous parallelization problem and that cooperative execution outperforms versions parallelized for a single platform. On benchmarks from the PolyBench benchmark suite, the APHES-transformed programs achieve a speedup of up to 6x compared to program versions generated by state-of-the-art single-platform parallelizers. The libtuning autotuner reduces the search time by up to 30% compared to state-of-the-art autotuning while still finding competitive configurations. Additionally, model-based prediction is able to reduce the search overhead by 99%.
