Data-Driven Dynamic Autotuning: Optimizing Autotuning Overhead with Prior Tuning Data
Jaroslav Oľha
Masaryk University, 2024
@phdthesis{olha2024data,
  title={Data-Driven Dynamic Autotuning: Optimizing Autotuning Overhead with Prior Tuning Data},
  author={Oľha, Jaroslav},
  school={Masaryk University},
  year={2024}
}
Modern high-performance computing (HPC) applications often rely on heterogeneous hardware resources to achieve maximum performance. This approach offers obvious benefits, combining the processing power of multiple different processors and allowing them to be more specialized. However, since HPC applications typically need to be programmed in a hardware-aware manner to achieve maximum performance, this places a greater burden on programmers to ensure that their programs can take full advantage of a wide variety of processing units. This issue can be addressed with source code autotuning: many implementation variants are defined in advance, and the most appropriate one is selected once all execution conditions, such as the available hardware, become known. The problem of efficient computing is thus transformed into a search problem.

In some cases, the conditions that determine the efficiency of implementations only become known at runtime, or they keep changing during execution and require adaptation on the fly. In such a scenario, it is possible to overlap the autotuning process with the actual execution of the tuned code, which is commonly referred to as dynamic autotuning. The objectives of dynamic autotuning shift towards finding approximate solutions quickly rather than searching for the global optimum. This thesis addresses several issues that arise in this context; its main focus, in addition to introducing key concepts and presenting technical solutions, is the problem of autotuning overhead. Since the typical dynamic autotuning use case requires the tuning process to run concurrently with the tuned application, any autotuning overhead usually comes at the expense of the actual computation. Reducing this overhead is therefore even more critical than it is for traditional offline autotuning approaches.
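The dynamic-autotuning pattern described above — evaluating candidate configurations while the application keeps doing useful work — can be sketched roughly as follows. This is a minimal illustration under assumed names; the tuning parameters, the synthetic cost model, and `run_kernel` are all hypothetical and not the thesis's actual implementation.

```python
import itertools

def run_kernel(config, workload):
    """Stand-in for the tuned computation: its cost depends on the
    chosen configuration (hypothetical block-size/unroll parameters)."""
    block, unroll = config
    # Synthetic cost model: larger blocks and more unrolling run faster.
    return workload / (block * (1.0 + 0.1 * unroll))

def dynamic_autotune(search_space, workloads):
    """Overlap tuning with execution: each incoming workload is processed
    with either an untried candidate configuration (while any remain) or
    the best configuration found so far."""
    candidates = iter(search_space)
    best_config, best_cost = None, float("inf")
    for work in workloads:
        trial = next(candidates, None)
        config = trial if trial is not None else best_config
        cost = run_kernel(config, work)
        # Keep the fastest configuration observed so far.
        if trial is not None and cost < best_cost:
            best_config, best_cost = config, cost
    return best_config

# Nine candidate (block, unroll) configurations, twenty units of real work:
space = list(itertools.product([32, 64, 128], [1, 2, 4]))
best = dynamic_autotune(space, workloads=[1000.0] * 20)
```

Note that the search never pauses the application: every evaluation, even of a poor candidate, still processes a real workload, which is why any overhead comes directly out of the computation.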
The author focuses on two main aspects of minimizing autotuning overhead: finding well-performing configurations more quickly, and setting a tuning budget that ensures minimal application run time. A well-chosen tuning budget ensures that autotuning neither wastes computational resources by running too long for minor code improvements, nor ends prematurely, leaving potential performance gains untapped. In both cases, historical data from previous autotuning efforts plays a major role, highlighting the importance of collecting and reusing this data. The experimental results presented in this thesis clearly show that various properties of tuning spaces, such as tuning parameter importance, relative portions of well-performing configurations, or the relationships between tuning parameters and hardware performance counters, are transferable across different hardware models. These findings led to the development of a profile-based searcher, which has shown considerable ability to improve autotuning convergence, and a tuning budget estimation method, which can ensure a near-optimal number of tuning iterations – both enhancing the effectiveness of dynamic autotuning methods and minimizing their negative impact on the tuned application.
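The two ingredients above — a searcher biased by prior tuning data and a capped tuning budget — can be illustrated with a small sketch. All names, weights, and the cost model here are assumptions for illustration, not the thesis's actual profile-based searcher or budget-estimation method.

```python
import random

def profile_biased_search(configs, evaluate, prior_weights, budget):
    """Random search biased by prior tuning data: configurations whose
    parameter values performed well on previously tuned hardware
    (hypothetical weights) are sampled more often, and the search stops
    after a fixed iteration budget to bound autotuning overhead."""
    weights = [prior_weights.get(c, 1.0) for c in configs]
    best, best_cost = None, float("inf")
    for _ in range(budget):
        config = random.choices(configs, weights=weights, k=1)[0]
        cost = evaluate(config)
        if cost < best_cost:
            best, best_cost = config, cost
    return best, best_cost

# Hypothetical example: tuning a single block-size parameter whose
# optimum on the current hardware is 128. Prior data from other devices
# already favours the 64/128 region, so a small budget suffices.
random.seed(0)  # deterministic for illustration
space = [16, 32, 64, 128, 256]
best, best_cost = profile_biased_search(
    space,
    evaluate=lambda block: abs(block - 128),  # synthetic cost model
    prior_weights={64: 3.0, 128: 5.0},
    budget=10,
)
```

The design choice mirrors the abstract's finding that tuning-space properties transfer across hardware models: weights learned on one device remain useful guidance on another, and the budget keeps the concurrent tuning from eating into the application's run time.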
November 3, 2024 by hgpu