high performance computing on graphics processing units: hgpu.org

LO-SpMM: Low-cost Search for High-performance SpMM Kernels on GPUs

Junqing Lin, Jingwei Sun, Xiaolong Shi, Honghe Zhang, Xianzhi Yu, Xinzhi Wang, Jun Yao, Guangzhong Sun

Tags: Compilers, Computer science, CUDA, Deep learning, Linear Algebra, Matrix multiplication, Neural networks, nVidia, nVidia GeForce RTX 2080 Ti, Performance, Sparse matrix, Tesla V100

August 4, 2024 by hgpu

Optimization of Large-Scale Sparse Matrix-Vector Multiplication on Multi-GPU Systems

Jianhua Gao, Weixing Ji, Yizhuo Wang

Tags: Computer science, CUDA, Linear Algebra, nVidia, Optimization, Sparse matrix, Tesla V100

July 14, 2024 by hgpu

PSCToolkit: solving sparse linear systems with a large number of GPUs

Pasqua D'Ambra, Fabio Durastante, Salvatore Filippone

Tags: CUDA, Linear Algebra, Mathematical Software, Mathematics, Numerical Analysis, nVidia, nVidia A100, Package, Sparse

July 7, 2024 by hgpu

Fast and Practical Strassen’s Matrix Multiplication using FPGAs

Afzal Ahmad, Linfeng Du, Wei Zhang

Tags: BLAS, Computer science, FPGA, GEMM, Linear Algebra, Machine learning, Matrix multiplication, OpenCL, Package

June 9, 2024 by hgpu

Evaluation of computational and energy performance in matrix multiplication algorithms on CPU and GPU using MKL, cuBLAS and SYCL

L.A. Torres, Carlos J. Barrios H, Yves Denneulin

Tags: Computer science, CUBLAS, CUDA, Linear Algebra, Matrix multiplication, Neural networks, nVidia, nVidia A100, Package, Performance, SYCL

June 2, 2024 by hgpu

A Systematic Literature Survey of Sparse Matrix-Vector Multiplication

Jianhua Gao, Bingjie Liu, Weixing Ji, Hua Huang

Tags: Computer science, FPGA, Heterogeneous systems, Linear Algebra, Machine learning, Overview, Sparse matrix

April 14, 2024 by hgpu

Seer: Predictive Runtime Kernel Selection for Irregular Problems

Ryan Swann, Muhammad Osama, Karthik Sangaiah, Jalal Mahmud

Tags: AMD Radeon Instinct MI100, ATI, Computer science, Linear Algebra, load balancing, Performance, Sparse matrix

April 7, 2024 by hgpu

Parallel Gaussian process with kernel approximation in CUDA

Davide Carminati

Tags: Benchmarking, Computer science, CUDA, Linear Algebra, nVidia, nVidia GeForce GTX 1050, nVidia GeForce RTX 2080, Package

March 24, 2024 by hgpu

Fast Truncated SVD of Sparse and Dense Matrices on Graphics Processors

Andres E. Tomas, Enrique S. Quintana-Orti, Hartwig Anzt

Tags: Algorithms, Computer science, CUDA, Linear Algebra, nVidia, nVidia A100

March 18, 2024 by hgpu

Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU

Mohammad Zubair, Christoph Bauinger

Tags: Computer science, CUDA, Deep learning, Intel, Intel Data Center GPU Max 1550, Linear Algebra, Machine learning, Mathematical Software, Matrix multiplication, nVidia, nVidia V100, Sparse matrix, SYCL

November 5, 2023 by hgpu

Bandicoot: C++ Library for GPU Linear Algebra and Scientific Computing

Ryan R. Curtin, Marcus Edel, Conrad Sanderson

Tags: Computer science, CUDA, Linear Algebra, nVidia, nVidia GeForce RTX 2080 Ti, OpenCL, Package, Programming techniques

July 30, 2023 by hgpu

Improving the Performance, Portability, and Productivity of Hardware Accelerators

Pablo Antonio Martínez Sánchez

Tags: Computer science, CUDA, Heterogeneous systems, Linear Algebra, Matrix multiplication, Neural networks, nVidia, nVidia GeForce RTX 2080, nVidia GeForce RTX 2080 Ti, Performance, performance portability, Thesis

July 16, 2023 by hgpu
