hgpu.org » GEMM
Cody Rivera, Jieyang Chen, Nan Xiong, Shuaiwen Leon Song, Dingwen Tao
Tags: Algorithms, Computer science, CUDA, GEMM, Linear Algebra, Matrix multiplication, nVidia, Tesla K40, Tesla M40, Tesla P100
February 16, 2020 by hgpu
Chetan Jhurani, Paul Mullowney
Tags: BLAS, CUBLAS, CUDA, Dense linear algebra, GEMM, Linear Algebra, nVidia, Parallel programming, Tesla K20
April 9, 2013 by chetan.jhurani