Autotuning OpenACC Work Distribution via Direct Search

Calvin Montgomery, Jeffrey L. Overbey, Xuechao Li
Department of Computer Science and Software Engineering, Auburn University, AL, USA
ACM Conference on the Extreme Science and Engineering Discovery Environment (XSEDE15), 2015

@article{montgomery2015autotuning,
   title={Autotuning OpenACC Work Distribution via Direct Search},
   author={Montgomery, Calvin and Overbey, Jeffrey L and Li, Xuechao},
   year={2015}
}

OpenACC provides a high-productivity API for programming GPUs and similar accelerator devices. One of the last steps in tuning OpenACC programs is selecting values for the num_gangs and vector_length clauses, which control how a parallel workload is distributed to an accelerator’s processing units. In this paper, we present OptACC, an autotuner that can assist the programmer in selecting high-quality values for these parameters, and we evaluate the effectiveness of two direct search methods in finding solutions. We assess the quality of the num_gangs and vector_length values found by our autotuner by comparing them to the values found by a bounded exhaustive search; we also compare the kernel execution times to those of the untuned kernel. On a suite of 36 OpenACC kernels, one or both of our autotuner’s direct search methods identified values within the top 5% for 29 of the kernels, within the top 10% for five kernels, and within the top 25% for the remaining two. Eleven of the kernels achieved a speedup greater than 2x over the compiler’s defaults, and the autotuner required only 7-11 runs of the target program, on average.
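
To make the idea of a direct search over these two parameters concrete, here is a minimal sketch in Python. It is not OptACC's implementation, and the abstract does not name the two search methods it evaluates; the compass-style search below is only one generic derivative-free method, chosen for illustration. The names `compass_search`, `measure`, and `synthetic_time`, the starting point, and the bounds are all assumptions. In a real setting, `measure` would build and run the target program with a directive such as `#pragma acc parallel loop num_gangs(g) vector_length(v)` and return the measured kernel time; here a synthetic cost function stands in so the sketch is self-contained.

```python
import math
from typing import Callable, Dict, Tuple

Point = Tuple[int, int]  # (num_gangs, vector_length)


def compass_search(
    measure: Callable[[int, int], float],
    start: Point = (256, 128),
    bounds: Tuple[Point, Point] = ((1, 32), (1024, 1024)),
) -> Tuple[Point, float, int]:
    """Compass-style direct search on a powers-of-two grid.

    Hypothetical helper: probes the four axis neighbours of the current
    best point, moves when one is faster, and shrinks the step otherwise.
    Returns (best point, best time, number of distinct measurements).
    """
    cache: Dict[Point, float] = {}

    def cost(p: Point) -> float:
        # Each distinct point corresponds to one run of the target program.
        if p not in cache:
            cache[p] = measure(*p)
        return cache[p]

    (lo_g, lo_v), (hi_g, hi_v) = bounds
    best, best_t = start, cost(start)
    factor = 4  # multiplicative step; shrinks 4 -> 2 -> stop
    while factor >= 2:
        g, v = best
        neighbours = [(g * factor, v), (g // factor, v),
                      (g, v * factor), (g, v // factor)]
        improved = False
        for p in neighbours:
            if not (lo_g <= p[0] <= hi_g and lo_v <= p[1] <= hi_v):
                continue  # stay inside the bounded search space
            t = cost(p)
            if t < best_t:
                best, best_t, improved = p, t, True
        if not improved:
            factor //= 2
    return best, best_t, len(cache)


def synthetic_time(num_gangs: int, vector_length: int) -> float:
    """Made-up stand-in for a kernel timing, fastest at (64, 256)."""
    return (1.0 + (math.log2(num_gangs) - 6) ** 2
            + (math.log2(vector_length) - 8) ** 2)


best, best_time, runs = compass_search(synthetic_time)
print(f"best (num_gangs, vector_length) = {best}, runs = {runs}")
```

The cache matters because every evaluation stands for a full run of the target program, so revisiting a point should cost nothing. Real timing data is noisy and not necessarily unimodal, so a practical tuner would likely repeat measurements and could stop at a local optimum.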