
Enhancing Transformer Performance and Portability through Auto-tuning Frameworks

Patricia Siwinska, Jie Lei, Adrián Castelló, Pedro Alonso-Jordá, Enrique S. Quintana-Ortí
Universitat Politècnica de València, Spain
Research Square, 2025

@article{siwinska2025enhancing,
   title={Enhancing Transformer Performance and Portability through Auto-tuning Frameworks},
   author={Siwinska, Patricia and Lei, Jie and Castell{\'o}, Adri{\'a}n and Alonso-Jord{\'a}, Pedro and Quintana-Ort{\'i}, Enrique S},
   year={2025}
}

Transformer-based models such as BERT and GPT-2 have become the foundation of many modern applications, yet their execution requires substantial computational and memory resources. To address these challenges, recent advances in compiler technology and hardware accelerators have introduced new opportunities for performance portability. In this work, we evaluate JAX and TVM as high-level frameworks that combine a NumPy-like programming model with Just-In-Time (JIT) or Ahead-of-Time (AOT) code optimization and compilation, enabling efficient execution on CPUs and GPUs and, in the case of JAX, on TPUs as well. We present systematic implementations of the core Transformer encoder and decoder blocks in JAX and TVM and compare their automatically optimized code against NumPy and CuPy baselines. Our experimental study covers heterogeneous hardware platforms (AMD CPU, NVIDIA GPUs, and Google TPUs) and multiple arithmetic precisions (FP32, INT8, and INT32). Results show that JAX and TVM deliver significant performance improvements over standard libraries, while reducing the programming effort required to adapt to different hardware. These findings demonstrate the potential of JIT- and AOT-oriented frameworks to serve as a portable and efficient solution for deploying Transformer workloads in diverse computing environments.
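To illustrate the workflow the abstract describes (this is a minimal sketch, not the authors' implementation), the snippet below writes a scaled dot-product attention kernel, one of the core operations inside a Transformer encoder/decoder block, in NumPy-like JAX code and compiles it with jax.jit so XLA can target the available CPU, GPU, or TPU backend. All function names, shapes, and values are illustrative assumptions.

# Hypothetical sketch: NumPy-style attention compiled with jax.jit (not the paper's code).
import jax
import jax.numpy as jnp

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (seq_len, d_model) arrays; computes softmax(Q K^T / sqrt(d)) V
    d = q.shape[-1]
    scores = q @ k.T / jnp.sqrt(d)
    weights = jax.nn.softmax(scores, axis=-1)
    return weights @ v

# jit traces the function once and compiles it with XLA for whichever backend
# (CPU, GPU, or TPU) is available -- the portability mechanism discussed above.
attention_jit = jax.jit(scaled_dot_product_attention)

key_q, key_k, key_v = jax.random.split(jax.random.PRNGKey(0), 3)
q = jax.random.normal(key_q, (128, 64))
k = jax.random.normal(key_k, (128, 64))
v = jax.random.normal(key_v, (128, 64))
out = attention_jit(q, k, v)   # first call compiles; subsequent calls reuse the binary
print(out.shape)               # (128, 64)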