Chetan Jhurani, Paul Mullowney
Tags: BLAS, CUBLAS, CUDA, Dense linear algebra, GEMM, Linear Algebra, nVidia, Parallel programming, Tesla K20
April 9, 2013 by chetan.jhurani