Performant Unified GPU Kernels for Portable Singular Value Computation Across Hardware and Precision
Evelyne Ringoot, Rabab Alomairy, Valentin Churavy, Alan Edelman
Tags: AMD Radeon Instinct MI250, Apple M1 Pro, ATI, Computer science, HIP, Intel, Intel Ponte Vecchio Max 1100, Kokkos, Linear Algebra, Machine learning, nVidia, nVidia A100, nVidia GeForce RTX 4060, nVidia H100, OpenCL, SYCL
August 17, 2025 by hgpu