hgpu.org » Dense linear algebra
Chetan Jhurani, Paul Mullowney
Tags: BLAS, CUBLAS, CUDA, Dense linear algebra, GEMM, Linear Algebra, nVidia, Parallel programming, Tesla K20
April 9, 2013 by chetan.jhurani