Aspect-Driven Mixed-Precision Tuning Targeting GPUs
University of Porto, Portugal
9th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures, 2018
@inproceedings{nobre2018aspect,
title={Aspect-Driven Mixed-Precision Tuning Targeting GPUs},
author={Nobre, Ricardo and Reis, Lu{\'i}s and Bispo, Jo{\~a}o and Carvalho, Tiago and Cardoso, Jo{\~a}o MP and Cherubin, Stefano and Agosta, Giovanni},
booktitle={Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms},
pages={26--31},
year={2018},
organization={ACM}
}
Writing mixed-precision kernels makes it possible to achieve higher throughput while keeping output precision within given limits. The recent introduction of native half-precision arithmetic in several GPUs, such as the NVIDIA P100 and the AMD Vega 10, makes precision tuning even more relevant. However, manually finding which variables can be represented in half precision instead of single or double precision is not trivial. Although half-precision arithmetic can speed up kernel execution considerably, it can also produce unusable kernel outputs when the wrong variables are declared with the half-precision data type. In this paper we present an automatic approach for precision tuning. Given an OpenCL kernel and a set of inputs declared by a user (i.e., the person responsible for programming and/or tuning the kernel), our approach derives mixed-precision versions of the kernel that improve upon the original with respect to a given metric (e.g., time-to-solution, energy-to-solution). We allow the user to declare and/or select the metric to measure, and to filter solutions based on the quality of the output. We implement a proof of concept of our approach using LARA, an aspect-oriented programming language. It is capable of generating mixed-precision kernels that achieve considerably higher performance than the original single-precision floating-point versions, while producing outputs that can be acceptable in some scenarios.
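To illustrate the kind of rewrite such a tuner explores, the sketch below contrasts a single-precision OpenCL kernel with an all-half variant. The kernel itself (a SAXPY-style operation) and the names `saxpy`/`saxpy_half` are illustrative assumptions, not benchmarks from the paper; the `cl_khr_fp16` pragma is the standard OpenCL C mechanism for enabling native half-precision arithmetic.

// Illustrative sketch only: a SAXPY-style kernel, not one of the
// paper's benchmarks; kernel names are assumptions.

// Original single-precision version.
__kernel void saxpy(__global const float *x,
                    __global const float *y,
                    __global float *out,
                    const float a)
{
    size_t i = get_global_id(0);
    out[i] = a * x[i] + y[i];
}

// Native half-precision arithmetic requires the cl_khr_fp16 extension,
// which fp16-capable GPUs such as the P100 and Vega 10 expose.
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

// One candidate a tuner might generate: every float re-declared as half.
// Faster on fp16-capable hardware, but with only ~3 decimal digits of
// precision, so an output-quality check must decide if it is acceptable.
__kernel void saxpy_half(__global const half *x,
                         __global const half *y,
                         __global half *out,
                         const half a)
{
    size_t i = get_global_id(0);
    out[i] = a * x[i] + y[i];
}

Between these two extremes lie mixed variants, e.g., storing the arrays as half but accumulating in float via `vload_half`/`vstore_half`; the user-selected metric and the output-quality filter described in the abstract are what would choose among such candidates.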
June 13, 2018 by hgpu