Lingcheng Kong, Jiateng Wei, Hanzhang Shen, Huan Wang
Ping Guo, Chenyu Zhu, Siyuan Chen, Fei Liu, Xi Lin, Zhichao Lu, Qingfu Zhang
Yongin Kwon, Joohyoung Cha, Sehyeon Oh, Misun Yu, Jeman Park, Jemin Lee
Jianghui Wang, Vinay Joshi, Saptarshi Majumder, Xu Chao, Bin Ding, Ziqiong Liu, Pratik Prabhanjan Brahma, Dong Li, Zicheng Liu, Emad Barsoum
Mohammad Firas Sada, John J. Graham, Elham E Khoda, Mahidhar Tatineni, Dmitry Mishin, Rajesh K. Gupta, Rick Wagner, Larry Smarr, Thomas A. DeFanti, Frank Würthwein
Jinliang Shi, Shigang Li, Youxuan Xu, Xueying Wang, Rongtian Fu, Zhi Ma, Tong Wu
Jiaqi Lv, Xufeng He, Yanchen Liu, Xu Dai, Yang Hu, Shouyi Yin
Tags: AI, Benchmarking, Compilers, Computer science, CUDA, Deep learning, LLM, nVidia, nVidia A100, Package, performance portability
Masahiro Tanaka, Du Li, Umesh Chand, Ali Zafar, Haiying Shen, Olatunji Ruwase
Dimitar Mileski, Nikola Petrovski, Marjan Gusev
Size Zheng, Jin Fang, Xuegui Zheng, Qi Hou, Wenlei Bao, Ningxin Zheng, Ziheng Jiang, Dongyang Wang, Jianxi Ye, Haibin Lin, Li-Wen Chang, Xin Liu