
Leveraging AI Ecosystem for Portable and Sustainable GPU Kernels in HPC

Yanbo Zhao, Zhaonan Meng, Sai Krishna Teja Varma Manthena, Xu Liu, Ajay Panyala, Jiajia Li
North Carolina State University, Department of Computer Science, Raleigh, USA
2026-06-11

@inproceedings{10.1145/3815001.3815003,
   author={Zhao, Yanbo and Meng, Zhaonan and Manthena, Sai Krishna Teja Varma and Liu, Xu and Panyala, Ajay and Li, Jiajia},
   title={Leveraging AI Ecosystem for Portable and Sustainable GPU Kernels in HPC},
   year={2026},
   isbn={9798400727108},
   publisher={Association for Computing Machinery},
   address={New York, NY, USA},
   url={https://doi.org/10.1145/3815001.3815003},
   doi={10.1145/3815001.3815003},
   booktitle={Proceedings of the 12th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming},
   pages={17--29},
   numpages={13},
   keywords={Code Generation, Domain-Specific Languages (DSL), Heterogeneous Computing, High-Performance Computing (HPC), Performance Portability},
   location={Boulder, CO, USA},
   series={ARRAY '26}
}


High-Performance Computing (HPC) applications increasingly depend on GPUs, yet developing optimized kernels across evolving GPU architectures remains a major productivity bottleneck. Triton, a Python-based domain-specific language from the AI ecosystem, offers a tile-based programming model that presents a compelling opportunity to simplify high-performance GPU kernel development for HPC. However, its tight coupling with Python creates significant integration barriers. In this paper, we investigate the feasibility of leveraging Triton for traditional HPC development. We present a compilation framework that transforms Triton kernels into standalone shared objects with C-compatible interfaces, eliminating Python dependencies and enabling seamless integration into HPC codebases while preserving Triton's optimization and portability benefits. We validate the approach by replacing kernels in representative HPC workloads with simpler Triton implementations that deploy across NVIDIA and AMD GPUs without modification. Triton achieves near-parity performance with native implementations on tile-friendly workloads, while irregular kernels reveal current limitations of its tile-based programming model. These results suggest that bridging the AI and HPC ecosystems via Triton offers a practical path toward more productive, portable, and sustainable GPU kernel development for HPC.
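As a rough illustration of the tile-based model mentioned above, the following CPU-only C sketch mimics how a Triton vector-add kernel divides work: a one-dimensional grid of program instances, each responsible for one fixed-size tile, with a bounds mask guarding the ragged final tile. `BLOCK_SIZE`, `vadd_program`, and `vadd_tiled` are hypothetical names chosen for this example. The sketch is not the paper's framework or its generated interface, and it runs the "programs" one after another on the CPU rather than in parallel on a GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile width; stands in for a Triton constexpr BLOCK_SIZE. */
#define BLOCK_SIZE 4

/* One "program instance": processes tile number pid, masking lanes past n. */
static void vadd_program(size_t pid, const float *x, const float *y,
                         float *out, size_t n)
{
    for (size_t lane = 0; lane < BLOCK_SIZE; ++lane) {
        /* offsets = pid * BLOCK_SIZE + arange(BLOCK_SIZE) */
        size_t offset = pid * BLOCK_SIZE + lane;
        /* mask = offsets < n: out-of-range lanes neither read nor write */
        if (offset < n) {
            out[offset] = x[offset] + y[offset];
        }
    }
}

/* Hypothetical entry point with a plain C signature: runs a 1-D grid of
   ceil(n / BLOCK_SIZE) program instances, sequentially in this sketch. */
void vadd_tiled(const float *x, const float *y, float *out, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL && out != NULL));
    size_t grid = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    for (size_t pid = 0; pid < grid; ++pid) {
        vadd_program(pid, x, y, out, n);
    }
}
```

With n = 10 and a tile width of 4, the grid has three programs; the last one touches only elements 8 and 9 and leaves anything beyond n untouched. Because `vadd_tiled` exposes only C types, C++ or Fortran (through `bind(C)`) code could call it like any other C library routine, which is the kind of Python-free boundary the framework described above targets for compiled Triton kernels.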