high performance computing on graphics processing units: hgpu.org

Enhancing Transformer Performance and Portability through Auto-tuning Frameworks

Patricia Siwinska, Jie Lei, Adrian Castello, Pedro Alonso-Jordá, Enrique S. Quintana-Orti

Tags: Auto-Tuning, Computer science, CUDA, Deep learning, Heterogeneous systems, LLM, nVidia, nVidia A100, Package, performance portability

November 2, 2025 by hgpu

Collective Communication for 100k+ GPUs

Min Si, Pavan Balaji, Yongzhou Chen, Ching-Hsiang Chu, Adi Gangidi, Saif Hasan, Subodh Iyengar, Dan Johnson, Bingzhe Liu, Jingliang Ren, Ashmitha Jeevaraj Shetty, Greg Steinbrecher, Xinfeng Xie, Yulun Wang, Bruce Wu, Jingyi Yang, Mingran Yang, Minlan Yu, Cen Zhao, Wes Bland, Denis Boyda, Suman Gumudavelli, Cristian Lumezanu, Rui Miao, Zhe Qu, Venkat Ramesh, Maxim Samoylov, Jan Seidel, Feng Tian, Qiye Tan, Shuqiang Zhang, Yimeng Zhao, Shengbao Zheng, Art Zhu, Hongyi Zeng

Tags: Computer science, CUDA, GPU cluster, LLM, nVidia, nVidia H100, Package, Performance

October 26, 2025 by hgpu

STARK: Strategic Team of Agents for Refining Kernels

Juncheng Dong, Yang Yang, Tao Liu, Yang Wang, Feng Qi, Vahid Tarokh, Kaushik Rangadurai, Shuang Yang

Tags: Benchmarking, Code generation, Computer science, LLM, nVidia, nVidia A100

October 26, 2025 by hgpu

Tutoring LLM into a Better CUDA Optimizer

Matyáš Brabec, Jiří Klepl, Michal Töpfer, Martin Kruliš

Tags: Code generation, Computer science, CUDA, LLM, nVidia, nVidia A100, nVidia H100, nVidia V100, Package

October 26, 2025 by hgpu

ConCuR: Conciseness Makes State-of-the-Art Kernel Generation

Lingcheng Kong, Jiateng Wei, Hanzhang Shen, Huan Wang

Tags: AI, Code generation, Computer science, CUDA, Deep learning, LLM, Machine learning, nVidia, Package

October 12, 2025 by hgpu

EvoEngineer: Mastering Automated CUDA Kernel Code Evolution with Large Language Models

Ping Guo, Chenyu Zhu, Siyuan Chen, Fei Liu, Xi Lin, Zhichao Lu, Qingfu Zhang

Tags: AI, Code generation, Computer science, CUDA, Deep learning, LLM, nVidia, nVidia GeForce RTX 4090, nVidia H100, PyTorch

October 12, 2025 by hgpu

VibeCodeHPC: An Agent-Based Iterative Prompting Auto-Tuner for HPC Code Generation Using LLMs

Shun-ichiro Hayashi, Koki Morita, Daichi Mukunoki, Tetsuya Hoshino, Takahiro Katagiri

Tags: AI, Code generation, Computer science, CUDA, HPC, LLM, nVidia, OpenACC, OpenMP, Package, Tesla V100

October 5, 2025 by hgpu

Opal: A Modular Framework for Optimizing Performance using Analytics and LLMs

Mohammad Zaeed, Tanzima Z. Islam, Vladimir Inđić

Tags: AI, AMD Radeon Instinct MI210, ATI, Computer science, CUDA, LLM, nVidia, nVidia H100, Performance, ROCm

October 5, 2025 by hgpu

Robust LLM Training Infrastructure at ByteDance

Borui Wan, Gaohong Liu, Zuquan Song, Jun Wang, Yun Zhang, Guangming Sheng, Shuguang Wang, Houmin Wei, Chenyuan Wang, Weiqiang Lou, Xi Yang, Mofan Zhang, Kaihua Jiang, Cheng Ren, Xiaoyun Zhi, Menghan Yu, Zhe Nan, Zhuolin Zheng, Baoquan Zhong, Qinlong Wang, Huan Yu, Jinxin Chi, Wang Zhang, Yuhan Li, Zixian Du, Sida Zhao, Yongqiang Zhang, Jingzhe Tang, Zherui Liu, Chuan Wu, Yanghua Peng, Haibin Lin, Wencong Xiao, Xin Liu, Liang Xiang

Tags: AI, Computer science, CUDA, LLM, nVidia, nVidia L20

September 28, 2025 by hgpu