hgpu.org » Dense linear algebra
Chetan Jhurani, Paul Mullowney
Tags: BLAS, CUBLAS, CUDA, Dense linear algebra, GEMM, Linear Algebra, nVidia, Parallel programming, Tesla K20
April 9, 2013 by chetan.jhurani