Anjiang Wei, Allen Nie, Thiago S. F. X. Teixeira, Rohan Yadav, Wonchan Lee, Ke Wang, Alex Aiken
November 17, 2024 by hgpu
Shaobo Ma, Chao Fang, Haikuo Shao, Zhongfeng Wang
Junqing Lin, Jingwei Sun, Xiaolong Shi, Honghe Zhang, Xianzhi Yu, Xinzhi Wang, Jun Yao, Guangzhong Sun
Tags: Compilers, Computer science, CUDA, Deep learning, Linear Algebra, Matrix multiplication, Neural networks, nVidia, nVidia GeForce RTX 2080 Ti, Performance, Sparse matrix, Tesla V100
Wei Sun, Ang Li, Sander Stuijk, Henk Corporaal
Afzal Ahmad, Linfeng Du, Wei Zhang
L.A. Torres, Carlos J. Barrios H, Yves Denneulin
Tags: Computer science, CUBLAS, CUDA, Linear Algebra, Matrix multiplication, Neural networks, nVidia, nVidia A100, Package, Performance, SYCL
Endri Taka, Dimitrios Gourounas, Andreas Gerstlauer, Diana Marculescu, Aman Arora
Xinyi Li, Ang Li, Bo Fang, Katarzyna Swirydowicz, Ignacio Laguna, Ganesh Gopalakrishnan
Tags: AMD Radeon Instinct MI100, AMD Radeon Instinct MI250X, ATI, Computer science, Hardware Architecture, HPC, Matrix multiplication, nVidia, nVidia A100, nVidia H100, nVidia V100, PTX
Taesu Kim, Jongho Lee, Daehyun Ahn, Sarang Kim, Jiwoong Choi, Minkyu Kim, Hyungjun Kim
Tags: Computer science, CUDA, Deep learning, Machine learning, Matrix multiplication, Mixed precision, nVidia, nVidia A100, nVidia GeForce RTX 4090, nVidia RTX A6000, Package
February 18, 2024 by hgpu
Jiacheng Yang, Christina Giannoula, Jun Wu, Mostafa Elhoushi, James Gleeson, Gennady Pekhimenko
Tags: Cloud, Computer science, CUDA, Matrix multiplication, nVidia, nVidia GeForce RTX 2070, nVidia GeForce RTX 2080 Ti, nVidia GeForce RTX 3090, Package, Performance, PyTorch, Tesla A100
Benjamin Brock, Aydın Buluç, Katherine Yelick