An Autotuning Framework for Intel Xeon Phi Platforms
National Technical University of Athens
National Technical University of Athens, 2016
@article{christoforidis2016autotuning,
title={An Autotuning Framework For Intel Xeon Phi Platforms},
author={Christoforidis, Lefteris},
year={2016}
}
In this thesis, we develop an auto-tuning framework for Intel Xeon Phi co-processors based on analytical methods. Its purpose is to relieve the application developer from configuring the compiler and the execution environment by efficiently finding the configuration that delivers the best outcome with respect to performance and power. In short, the Autotuner has an offline database of performance data from a set of diverse applications executed on a set of configurations. These data were collected using LIKWID[62], a lightweight performance-oriented tool suite for x86 multicore environments. The framework uses these data to find correlations between the applications and the configurations under examination. To achieve this, it uses a collaborative filtering technique[48, 35] that exploits the idea behind Singular Value Decomposition (SVD)[50]. Applications and configurations are thereby mapped to a feature space, that is, a set of attributes, each made up of some configurations, together with the scalar relation of every application and configuration to those attributes. Each new application that arrives is minimally profiled on a couple of configurations and then projected onto the constructed feature space, based on its ratings for those known configurations. Its correlation with each feature is computed, and from these its unknown ratings can be calculated. In the end, we have a fully populated vector of predicted ratings for all the configurations, from which we choose the configuration with the best predicted rating. In addition, the auto-tuning framework we developed substantiates the use of machine learning techniques in the still sparsely explored field of autotuners and contributes significantly to it.
Besides fast predictions and good performance, singular value decomposition also reduces the space needed to characterize the applications against the configurations, so that a large amount of information is stored compactly.
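The prediction step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the rank `k`, the plain least-squares projection, and the synthetic rating matrix are all assumptions made for illustration, and real ratings would come from the LIKWID measurements rather than random numbers.

```python
import numpy as np


def fit_feature_space(ratings, k):
    """Truncated SVD of the offline (applications x configurations) matrix.

    Returns a (k, n_configs) matrix relating each latent feature to every
    configuration. The name and signature are hypothetical.
    """
    _, s, vt = np.linalg.svd(ratings, full_matrices=False)
    return s[:k, None] * vt[:k, :]


def predict_ratings(config_features, known_idx, known_ratings):
    """Project a minimally profiled application and fill in all its ratings."""
    # Fit the application's feature coordinates by least squares, using only
    # the configurations it was actually profiled on.
    a = config_features[:, known_idx].T  # shape (n_known, k)
    coords, *_ = np.linalg.lstsq(a, known_ratings, rcond=None)
    # Map the coordinates back to a rating for every configuration.
    return coords @ config_features


def best_configuration(predicted):
    """Index of the configuration with the highest predicted rating."""
    return int(np.argmax(predicted))


# Synthetic, exactly rank-2 data (assumption): 6 applications, 8 configurations.
rng = np.random.default_rng(0)
true_cfg = rng.random((2, 8))
offline = rng.random((6, 2)) @ true_cfg

features = fit_feature_space(offline, k=2)

# A new application, profiled on only three of the eight configurations.
new_app = np.array([0.3, 0.9]) @ true_cfg
known_idx = [1, 4, 6]
predicted = predict_ratings(features, known_idx, new_app[known_idx])
best = best_configuration(predicted)
```

Storing only the `k` feature rows instead of the full rating matrix is also what gives the space saving mentioned above. On measured data the ratings are noisy and not exactly low rank, so the predictions would only approximate the true ratings.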
July 30, 2016 by hgpu