A Deep Learning Model for Loop Interchange

hgpu.org » Applications » Computer science » A Deep Learning Model for Loop Interchange

A Deep Learning Model for Loop Interchange

Lina Mezdour, Khadidja Kadem, Massinissa Merouani, Amina Selma Haichour, Saman Amarasinghe, Riyadh Baghdadi

NYU Abu Dhabi, ESI, Abu Dhabi, UAE

32nd ACM SIGPLAN International Conference on Compiler Construction (CC 2023), 2023

DOI:10.1145/3578360.3580257

BibTeX

Download (PDF)

View

Source

Source codes

Package:

A Deep Learning Model for Loop Interchange

977

views

Loop interchange is an important code optimization that improves data locality and extracts parallelism. While previous research in compilers has tried to automate the selection of which loops to interchange, existing methods have an important limitation. They use less precise machine models. This is mainly because developing a model to predict whether to interchange two loops is challenging since such a prediction depends on many factors. While state-of-the-art methods try to avoid this problem by using a deep-learning based cost model, they suffer from another limitation. They scale proportionally with the number of loop levels of a given loop nest. This is mainly because they use the model to evaluate all the possible loop interchanges (or a subset of the most promising ones). In this paper, we propose a novel deep-learning model for loop interchange that addresses the previous limitations. It takes a code representation as input and predicts the best pair of loops to interchange. Compared to state-of-the-art deep-learning based cost models, it requires constant time to predict the best loop interchange. This is in contrast to state-of-the-art deep learning models that are used to evaluate all the loop pairs and then pick the best one. The proposed model is the first deep learning model that requires a constant time to predict the best loops to interchange. The model is implemented and evaluated in the Tiramisu compiler, a state-of-the-art polyhedral compiler. We evaluate the proposed model on a benchmark of Tiramisu programs and show an accuracy of 78.57% for 1-shot and 85.71% for 2-shots. Experiments show that our model outperforms the cost model currently used by the Tiramisu compiler by 8.57% in terms of 1-shot accuracy, and 5.71% with 2-shots accuracy, while at the same time reducing the total execution time needed for predicting the best pair of loops to interchange.

Tags: Compilers, Computer science, Deep learning, nVidia, nVidia DGX-A100, Package

March 12, 2023 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

microSYCL: SYCL micro-benchmarks repository

Exploring SYCL as a Portability Layer for High-Performance Computing on CPUs

See all packages

* * *

high performance computing on graphics processing units: hgpu.org