Shaobo Ma, Chao Fang, Haikuo Shao, Zhongfeng Wang

Junqing Lin, Jingwei Sun, Xiaolong Shi, Honghe Zhang, Xianzhi Yu, Xinzhi Wang, Jun Yao, Guangzhong Sun

Tags: Compilers, Computer science, CUDA, Deep learning, Linear Algebra, Matrix multiplication, Neural networks, nVidia, nVidia GeForce RTX 2080 Ti, Performance, Sparse matrix, Tesla V100

Wei Sun, Ang Li, Sander Stuijk, Henk Corporaal

Afzal Ahmad, Linfeng Du, Wei Zhang

L.A. Torres, Carlos J. Barrios H, Yves Denneulin

Tags: Computer science, CUBLAS, CUDA, Linear Algebra, Matrix multiplication, Neural networks, nVidia, nVidia A100, Package, Performance, SYCL

Endri Taka, Dimitrios Gourounas, Andreas Gerstlauer, Diana Marculescu, Aman Arora

Xinyi Li, Ang Li, Bo Fang, Katarzyna Swirydowicz, Ignacio Laguna, Ganesh Gopalakrishnan

Tags: AMD Radeon Instinct MI100, AMD Radeon Instinct MI250X, ATI, Computer science, Hardware Architecture, HPC, Matrix multiplication, nVidia, nVidia A100, nVidia H100, nVidia V100, PTX

Taesu Kim, Jongho Lee, Daehyun Ahn, Sarang Kim, Jiwoong Choi, Minkyu Kim, Hyungjun Kim

Tags: Computer science, CUDA, Deep learning, Machine learning, Matrix multiplication, Mixed precision, nVidia, nVidia A100, nVidia GeForce RTX 4090, nVidia RTX A6000, Package

February 18, 2024 by

hgpuJiacheng Yang, Christina Giannoula, Jun Wu, Mostafa Elhoushi, James Gleeson, Gennady Pekhimenko

Tags: Cloud, Computer science, CUDA, Matrix multiplication, nVidia, nVidia GeForce RTX 2070, nVidia GeForce RTX 2080 Ti, nVidia GeForce RTX 3090, Package, Performance, PyTorch, Tesla A100

Benjamin Brock, Aydın Buluç, Katherine Yelick

David Warde-Farley, Vinod Nair, Yujia Li, Ivan Lobov, Felix Gimeno, Simon Osindero

November 12, 2023 by

hgpu