Aspect-Driven Mixed-Precision Tuning Targeting GPUs

Ricardo Nobre, Luis Reis, Joao Bispo, Tiago Carvalho, Joao M.P. Cardoso, Stefano Cherubin, Giovanni Agosta
University of Porto, Portugal
9th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures, 2018


   title={Aspect-Driven Mixed-Precision Tuning Targeting GPUs},

   author={Nobre, Ricardo and Reis, Lu{\'i}s and Bispo, Jo{\~a}o and Carvalho, Tiago and Cardoso, Jo{\~a}o MP and Cherubin, Stefano and Agosta, Giovanni},

   booktitle={Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms},





Mixed-precision kernels can achieve higher throughput while keeping the precision of their outputs within given limits. The recent introduction of native half-precision arithmetic in several GPUs, such as the NVIDIA P100 and AMD Vega 10, makes precision tuning even more relevant. However, it is not trivial to determine manually which variables should be represented in half precision instead of single or double precision. Although half-precision arithmetic can speed up kernel execution considerably, it can also produce unusable kernel outputs when the wrong variables are declared with the half-precision data type. In this paper we present an automatic approach for precision tuning. Given an OpenCL kernel with a set of inputs declared by a user (i.e., the person responsible for programming and/or tuning the kernel), our approach derives the mixed-precision versions of the kernel that improve upon the original with respect to a given metric (e.g., time-to-solution, energy-to-solution). We allow the user to declare and/or select a metric to measure and to filter solutions based on the quality of the output. We implement a proof-of-concept of our approach using an aspect-oriented programming language called LARA. The prototype generates mixed-precision kernels that achieve considerably higher performance than the original single-precision floating-point versions, while producing outputs that can be acceptable in some scenarios.
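The tuning loop described in the abstract (generate mixed-precision versions, measure a metric, filter on output quality) can be illustrated with a small sketch. This is not the authors' LARA implementation and it does not run OpenCL: it emulates half and single precision on the CPU with Python's `struct` module, enumerates every precision assignment exhaustively, and uses the number of single-precision variables as a stand-in for a measured metric such as time-to-solution. The kernel (`a * x + y`), the variable names, the error measure, and the `tune` function are all assumptions made for illustration.

```python
import itertools
import struct

# Hypothetical kernel variables whose precision is being tuned.
VARIABLES = ("a", "x", "y")


def round_to(value, precision):
    """Round a Python float to IEEE 754 half ('e') or single ('f') precision."""
    fmt = {"half": "<e", "single": "<f"}[precision]
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def kernel(a, xs, ys, prec):
    """Emulated a*x + y kernel; prec maps each variable name to a precision."""
    a = round_to(a, prec["a"])
    out = []
    for x, y in zip(xs, ys):
        x = round_to(x, prec["x"])
        y = round_to(y, prec["y"])
        # The result is stored back in the precision chosen for y.
        out.append(round_to(a * x + y, prec["y"]))
    return out


def max_rel_error(out, ref):
    """Largest relative deviation from the reference output."""
    return max(
        (abs(o - r) / abs(r) for o, r in zip(out, ref) if r != 0.0),
        default=0.0,
    )


def tune(a, xs, ys, max_error):
    """Return precision assignments that beat the all-single baseline on the
    proxy cost while keeping the output error within max_error."""
    baseline = dict.fromkeys(VARIABLES, "single")
    ref = kernel(a, xs, ys, baseline)
    candidates = []
    for choice in itertools.product(("half", "single"), repeat=len(VARIABLES)):
        prec = dict(zip(VARIABLES, choice))
        err = max_rel_error(kernel(a, xs, ys, prec), ref)
        # Proxy metric (assumption): fewer single-precision variables is cheaper.
        cost = choice.count("single")
        if err <= max_error and cost < len(VARIABLES):
            candidates.append((cost, err, prec))
    # Cheapest first; ties broken by smaller error.
    return sorted(candidates, key=lambda c: (c[0], c[1]))
```

Calling `tune(0.5, [1.0, 2.0, 3.0], [0.25, 0.5, 0.75], 1e-3)` returns the accepted assignments ordered by the proxy cost. In a real setting the cost would come from running each generated kernel on the target GPU, and exhaustive enumeration would likely have to give way to a guided search once the number of variables grows.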