high performance computing on graphics processing units: hgpu.org


A Unified Approach to Variable Renaming for Enhanced Vectorization

Prasanth Chatarasi, Jun Shirako, Albert Cohen, Vivek Sarkar

Georgia Institute of Technology, Atlanta GA, USA

31st International Workshop on Languages and Compilers for Parallel Computing (LCPC’18), 2018


Although compiler technologies for automatic vectorization have been under development for over four decades, there are still considerable gaps in the ability of modern compilers to vectorize automatically for SIMD units. One such gap is the handling of loops with dependence cycles that involve memory-based anti (write-after-read) and output (write-after-write) dependences. Past approaches, such as variable renaming and variable expansion, break such dependence cycles by either eliminating or repositioning the problematic memory-based dependences. However, past work suffers from three key limitations: 1) lack of a unified framework that synergistically integrates multiple storage transformations, 2) lack of support for bounding the additional space required to break memory-based dependences, and 3) lack of support for integrating these storage transformations with other code transformations (e.g., statement reordering) to enable vectorization. In this paper, we address these three limitations by integrating both Source Variable Renaming (SoVR) and Sink Variable Renaming (SiVR) transformations into a unified formulation, and by formalizing the "cycle-breaking" problem as a minimum weighted set cover optimization problem. To the best of our knowledge, our work is the first to formalize an optimal solution for cycle breaking that simultaneously considers both SoVR and SiVR transformations, thereby enhancing vectorization and reducing storage expansion relative to performing the transformations independently. We implemented our approach in PPCG, a state-of-the-art optimization framework for loop transformations, and evaluated it on eleven kernels from the TSVC benchmark suite. Our experimental results show a geometric mean performance improvement of 4.61x on an Intel Xeon Phi (KNL) machine relative to the optimized performance obtained by Intel’s ICC v17.0 product compiler.
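
As a minimal illustration of how renaming breaks a cycle of memory-based dependences, the sketch below uses a hypothetical two-statement loop (not one of the TSVC kernels) in which S1 and S2 are linked only by anti dependences. Whole-array list assignments stand in for SIMD execution of one statement across all iterations. Running the statements this way without renaming gives wrong results, while letting S1 write a fresh array `a_new` (extra storage, copied back afterwards) removes the S2 -> S1 anti dependence and makes both vector steps legal. The function names are illustrative assumptions, and the sketch does not attempt to reproduce the paper's exact SoVR or SiVR formulation.

```python
def original(a, b):
    """Scalar reference loop. S1 and S2 form a cycle of anti (write-after-read)
    dependences:
      S1 -> S2: S1 reads b[i] before S2 overwrites b[i] in the same iteration.
      S2 -> S1: S2 reads a[i+1] before S1 overwrites it in the next iteration.
    """
    a, b = list(a), list(b)  # work on copies, leave the inputs untouched
    for i in range(len(a) - 1):
        a[i] = b[i] * 2.0      # S1
        b[i] = a[i + 1] + 1.0  # S2
    return a, b


def vectorized_naive(a, b):
    """Each statement runs as one whole-array (vector) step, in source order,
    without renaming. S2 then reads the *new* a[i+1], which is wrong."""
    a, b = list(a), list(b)
    m = max(len(a) - 1, 0)
    a[:m] = [b[i] * 2.0 for i in range(m)]      # S1 for all i
    b[:m] = [a[i + 1] + 1.0 for i in range(m)]  # S2 for all i
    return a, b


def vectorized_renamed(a, b):
    """S1 writes a fresh array a_new (the renamed variable), so S2 still sees
    the old values of a. The S2 -> S1 anti dependence disappears, the cycle is
    broken, and both vector steps are legal. a_new is the extra storage."""
    a, b = list(a), list(b)
    m = max(len(a) - 1, 0)
    a_new = [b[i] * 2.0 for i in range(m)]      # S1': reads the original b
    b[:m] = [a[i + 1] + 1.0 for i in range(m)]  # S2: reads the original a
    a[:m] = a_new                               # copy renamed values back
    return a, b


a0, b0 = [0.0, 1.0, 2.0, 3.0], [4.0, 3.0, 2.0, 1.0]
print(original(a0, b0))            # ([8.0, 6.0, 4.0, 3.0], [2.0, 3.0, 4.0, 1.0])
print(vectorized_naive(a0, b0))    # ([8.0, 6.0, 4.0, 3.0], [7.0, 5.0, 4.0, 1.0])
print(vectorized_renamed(a0, b0))  # ([8.0, 6.0, 4.0, 3.0], [2.0, 3.0, 4.0, 1.0])
```

Statement order still matters after renaming: S1' has to run before S2, because S2 overwrites the `b` values that S1' reads. This small interaction hints at why the abstract stresses combining storage transformations with statement reordering.
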
Further, our results demonstrate geometric mean performance improvements of 1.08x and 1.14x on the Intel Xeon Phi (KNL) and Nvidia Tesla V100 (Volta) platforms, respectively, relative to past work that performs only the SiVR transformation [5], and of 1.57x and 1.22x on the same two platforms relative to past work that uses both SiVR and SoVR transformations.
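
The set-cover view of cycle breaking described in the abstract can be sketched on a toy instance: each dependence cycle is an element that must be covered, and each candidate renaming covers the cycles it breaks, weighted by the extra storage it needs. The cycle names, candidate names (`rename_a_src`, `rename_b_sink`, and so on), and costs below are made up for illustration, and the exhaustive search is simply one exact way to solve small instances; it is not claimed to be the solver the authors use in PPCG.

```python
from itertools import combinations


def min_weight_set_cover(universe, candidates):
    """Exact minimum weighted set cover by exhaustive search.

    universe:   set of dependence cycles that must all be broken
    candidates: dict mapping a transformation name to (cycles_it_breaks, cost)
    Returns (best_cost, best_choice), or (None, None) if no cover exists.
    Runtime is exponential in len(candidates): small instances only.
    """
    names = sorted(candidates)  # deterministic tie-breaking
    best_cost, best_choice = None, None
    for k in range(len(names) + 1):
        for choice in combinations(names, k):
            covered = set()
            for name in choice:
                covered |= candidates[name][0]
            if not universe <= covered:
                continue  # some cycle is left unbroken
            cost = sum(candidates[name][1] for name in choice)
            if best_cost is None or cost < best_cost:
                best_cost, best_choice = cost, choice
    return best_cost, best_choice


# Hypothetical instance: three cycles, four candidate renamings.
# Costs stand for extra storage (e.g. arrays allocated); all values are made up.
cycles = {"c1", "c2", "c3"}
options = {
    "rename_a_src": ({"c1", "c2"}, 3),
    "rename_a_sink": ({"c1"}, 1),
    "rename_b_src": ({"c2", "c3"}, 2),
    "rename_b_sink": ({"c3"}, 1),
}
sources_only = {k: v for k, v in options.items() if k.endswith("_src")}
sinks_only = {k: v for k, v in options.items() if k.endswith("_sink")}

print(min_weight_set_cover(cycles, options))       # (3, ('rename_a_sink', 'rename_b_src'))
print(min_weight_set_cover(cycles, sources_only))  # (5, ('rename_a_src', 'rename_b_src'))
print(min_weight_set_cover(cycles, sinks_only))    # (None, None)
```

On these made-up numbers, mixing a sink-side and a source-side renaming yields the cheapest cover, source-side renamings alone cost more, and sink-side renamings alone cannot break every cycle. This mirrors the abstract's argument for considering SoVR and SiVR together. Exhaustive search grows exponentially with the number of candidates, so it only serves as a reference for small cases.
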

Tags: Benchmarking, Compilers, Computer science, CUDA, Intel Xeon Phi, nVidia, Tesla V100

May 15, 2019 by hgpu
