Aspect-Driven Mixed-Precision Tuning Targeting GPUs

Ricardo Nobre, Luis Reis, Joao Bispo, Tiago Carvalho, Joao M.P. Cardoso, Stefano Cherubin, Giovanni Agosta

University of Porto, Portugal

9th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures, 2018

Source code package: clava (C/C++ Source-to-Source Tool based on Clang)

Writing mixed-precision kernels makes it possible to achieve higher throughput while keeping output precision within given limits. The recent introduction of native half-precision arithmetic in several GPUs, such as the NVIDIA P100 and AMD Vega 10, makes precision tuning even more relevant. However, it is not trivial to determine manually which variables should be represented in half precision instead of single or double precision. Although half-precision arithmetic can speed up kernel execution considerably, it can also produce unusable kernel outputs when the wrong variables are declared with the half-precision data type. In this paper we present an automatic approach for precision tuning. Given an OpenCL kernel with a set of inputs declared by a user (i.e., the person responsible for programming and/or tuning the kernel), our approach is capable of deriving the mixed-precision versions of the kernel that improve upon the original with respect to a given metric (e.g., time-to-solution, energy-to-solution). We allow the user to declare and/or select a metric to measure and to filter solutions based on the quality of the output. We implement a proof-of-concept of our approach using an aspect-oriented programming language called LARA. It is capable of generating mixed-precision kernels that deliver considerably higher performance than the original single-precision floating-point versions, while generating outputs that can be acceptable in some scenarios.
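The search described in the abstract (generate mixed-precision versions, measure a metric, filter by output quality) can be illustrated with a small Python sketch. This is not the LARA/Clava implementation from the paper: the toy `kernel`, the variable names in `VARIABLES`, and the `COST` weights are all hypothetical, the cost model is a stand-in for a measured metric such as time-to-solution, and precision is only emulated by rounding values when they are stored.

```python
import itertools
import struct


def to_half(x):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_single(x):
    """Round a Python float to IEEE 754 single precision and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


ROUND = {"half": to_half, "float": to_single}

# Assumption: made-up per-variable weights replacing a real measurement
# (e.g., kernel run time or energy) taken on a GPU.
COST = {"half": 1.0, "float": 2.0}

# Hypothetical tunable variables of the toy kernel below.
VARIABLES = ("a", "x", "acc")


def kernel(xs, a, prec):
    """Toy scaled sum: acc += a * x, rounding each variable as it is stored."""
    ra, rx, racc = (ROUND[prec[name]] for name in VARIABLES)
    a = ra(a)
    acc = racc(0.0)
    for x in xs:
        acc = racc(acc + a * rx(x))
    return acc


def explore(xs, a, max_rel_error):
    """Try every precision assignment and keep those within the error bound."""
    baseline = {name: "float" for name in VARIABLES}
    reference = kernel(xs, a, baseline)
    results = []
    for choice in itertools.product(("half", "float"), repeat=len(VARIABLES)):
        prec = dict(zip(VARIABLES, choice))
        out = kernel(xs, a, prec)
        rel_error = abs(out - reference) / abs(reference)
        if rel_error <= max_rel_error:  # quality filter on the output
            cost = sum(COST[p] for p in choice)
            results.append((cost, rel_error, prec))
    # Cheapest versions first; ties broken by smaller error.
    results.sort(key=lambda r: (r[0], r[1]))
    return results


if __name__ == "__main__":
    xs = [0.1 * i for i in range(1, 101)]
    for cost, err, prec in explore(xs, 1.5, max_rel_error=1e-2):
        print(cost, err, prec)
```

Exhaustive enumeration is only practical for a handful of variables, since the number of versions doubles with each one; it is used here purely to keep the sketch short.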

Tags: AMD Vega 10 XT, ATI, Code generation, Computer science, Mixed precision, nVidia, OpenCL, Package

June 13, 2018 by hgpu
