Leveraging AI Ecosystem for Portable and Sustainable GPU Kernels in HPC

Yanbo Zhao, Zhaonan Meng, Sai Krishna Teja Varma Manthena, Xu Liu, Ajay Panyala, Jiajia Li

North Carolina State University, Department of Computer Science, Raleigh, USA

2026-06-11

@inproceedings{10.1145/3815001.3815003,
  author={Zhao, Yanbo and Meng, Zhaonan and Manthena, Sai Krishna Teja Varma and Liu, Xu and Panyala, Ajay and Li, Jiajia},
  title={Leveraging AI Ecosystem for Portable and Sustainable GPU Kernels in HPC},
  year={2026},
  isbn={9798400727108},
  publisher={Association for Computing Machinery},
  address={New York, NY, USA},
  url={https://doi.org/10.1145/3815001.3815003},
  doi={10.1145/3815001.3815003},
  booktitle={Proceedings of the 12th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming},
  pages={17--29},
  numpages={13},
  keywords={Code Generation, Domain-Specific Languages (DSL), Heterogeneous Computing, High-Performance Computing (HPC), Performance Portability},
  location={Boulder, CO, USA},
  series={ARRAY ’26}
}

High-Performance Computing (HPC) applications increasingly depend on GPUs, yet developing optimized kernels across evolving GPU architectures remains a major productivity bottleneck. With a tile-based programming model, Triton, a Python-based domain-specific language from the AI ecosystem, presents a compelling opportunity to simplify high-performance GPU kernel development for HPC. However, its tight coupling with Python creates significant integration barriers. In this paper, we investigate the feasibility of leveraging Triton for traditional HPC development. We present a compilation framework that transforms Triton kernels into standalone shared objects with C-compatible interfaces, eliminating Python dependencies and enabling seamless integration into HPC codebases while preserving optimization and portability benefits. We validate the approach by replacing kernels in representative HPC workloads with simpler Triton implementations that deploy across NVIDIA and AMD GPUs without modification. Triton achieves near-parity performance with native implementations on tile-friendly workloads, while irregular kernels reveal current limitations of its tile-based programming model. These results suggest that bridging the AI and HPC ecosystems via Triton offers a practical path toward more productive, portable, and sustainable GPU kernel development for HPC.
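To make the tile-based programming model mentioned in the abstract concrete, here is a minimal sketch in plain Python that emulates how such a kernel divides work. It is not Triton code and not the paper's compilation framework: the names `BLOCK_SIZE`, `add_tile`, and `launch` are hypothetical, chosen only for illustration, and the tiles run sequentially here where a GPU would run them in parallel.

```python
# Plain-Python emulation of a tile-based vector add (no GPU or Triton needed).
# BLOCK_SIZE, add_tile, and launch are hypothetical names for illustration.

BLOCK_SIZE = 4  # number of elements handled by one program instance (one tile)


def add_tile(x, y, out, n, pid):
    """Process tile number `pid`, as one program instance would."""
    start = pid * BLOCK_SIZE
    for i in range(start, start + BLOCK_SIZE):
        if i < n:  # mask: skip out-of-bounds lanes in the last, partial tile
            out[i] = x[i] + y[i]


def launch(x, y):
    """Run one program instance per tile; a GPU would run these in parallel."""
    n = len(x)
    out = [0.0] * n
    num_tiles = (n + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division
    for pid in range(num_tiles):
        add_tile(x, y, out, n, pid)
    return out


if __name__ == "__main__":
    print(launch([1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0]))
```

Each program instance owns one fixed-size tile and masks off the lanes that fall past the end of the data. Workloads that split cleanly into such tiles are the "tile-friendly" case; kernels with irregular, data-dependent access patterns are harder to express this way.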

Tags: AMD, AMD Radeon Instinct MI300X, Computer science, DSL, HPC, nVidia, nVidia GeForce RTX 4090, nVidia H100, PTX, Python, ROCm, Triton

June 17, 2026 by hgpu
