Generating Efficient Tensor Contractions for GPUs

Thomas Nelson, Axel Rivera, Prasanna Balaprakash, Mary Hall, Paul D. Hovland, Elizabeth Jessup, Boyana Norris
Department of Computer Science, University of Colorado, Boulder, CO 80309
Argonne National Laboratory Technical report ANL/MCS-P5361-0615, 2015

Many scientific and numerical applications, including quantum chemistry modeling and fluid dynamics simulation, require the evaluation of tensor products and tensor contractions. Tensor computations are characterized by arrays with many dimensions, inherent parallelism, moderate data reuse, and many degrees of freedom in the order in which the computation is performed. The best-performing implementation depends heavily on the tensor dimensionality and the target architecture. In this paper, we map tensor computations to GPUs, starting from a high-level tensor input language and producing efficient CUDA code as output. Our approach combines tensor-specific mathematical transformations with a GPU decision algorithm, machine learning, and autotuning of a large parameter space. The generated code shows significant performance gains over sequential and OpenMP parallel code, and a comparison with OpenACC shows the importance of autotuning and the other optimizations in our framework for achieving efficient results.
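
To make the "degrees of freedom in the order" concrete, here is a minimal sketch in plain Python of one tensor contraction, C[i][j] = sum over k and l of A[i][k][l] * B[k][l][j], evaluated under a caller-chosen loop order. It is not the paper's input language or its generated CUDA code; the function name `contract`, the `loop_order` parameter, and the sample tensors are assumptions made only for illustration.

```python
from itertools import product


def contract(A, B, loop_order=("i", "j", "k", "l")):
    """Compute C[i][j] = sum over k, l of A[i][k][l] * B[k][l][j].

    loop_order is a hypothetical knob: it names the loops from outermost
    to innermost. Any permutation gives the same result for exact
    arithmetic, but on real hardware the choice affects data reuse.
    """
    extents = {
        "i": len(A),
        "k": len(A[0]),
        "l": len(A[0][0]),
        "j": len(B[0][0]),
    }
    C = [[0] * extents["j"] for _ in range(extents["i"])]
    ranges = [range(extents[name]) for name in loop_order]
    # product() iterates with the first range outermost, the last innermost.
    for values in product(*ranges):
        idx = dict(zip(loop_order, values))
        i, j, k, l = idx["i"], idx["j"], idx["k"], idx["l"]
        C[i][j] += A[i][k][l] * B[k][l][j]
    return C


# Illustrative integer tensors (made-up data).
A = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # indexed [i][k][l]
B = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]  # indexed [k][l][j]

C_default = contract(A, B)
C_reordered = contract(A, B, loop_order=("l", "k", "j", "i"))
```

Both calls produce the same tensor. A code generator such as the one described here has to choose among loop orders like these, along with the mapping of loops to GPU threads and blocks, which is why the search space is large enough to motivate autotuning.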