hgpu.org » Dense linear algebra
Chetan Jhurani, Paul Mullowney
Tags: BLAS, CUBLAS, CUDA, Dense linear algebra, GEMM, Linear Algebra, nVidia, Parallel programming, Tesla K20
April 9, 2013 by chetan.jhurani