high performance computing on graphics processing units: hgpu.org

Mixed precision

Mixed-precision numerics in scientific applications: survey and perspectives

Aditya Kashi, Hao Lu, Wesley Brewer, David Rogers, Michael Matheson, Mallikarjun Shankar, Feiyi Wang

Tags: AI, Computer science, Mixed precision, nVidia, nVidia V100, Review

March 26, 2026 by hgpu

Mixed-precision finite element kernels and assembly: Rounding error analysis and hardware acceleration

Matteo Croci, Garth N. Wells

Source codes

Tags: AVX, Computer science, Finite element method, Floating point error, Intel, Matrix multiplication, Mixed precision, Package

October 27, 2024 by hgpu

QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

Taesu Kim, Jongho Lee, Daehyun Ahn, Sarang Kim, Jiwoong Choi, Minkyu Kim, Hyungjun Kim

Source codes

Tags: Computer science, CUDA, Deep learning, Machine learning, Matrix multiplication, Mixed precision, nVidia, nVidia A100, nVidia GeForce RTX 4090, nVidia RTX A6000, Package

February 18, 2024 by hgpu

Memory Efficient Mixed-Precision Optimizers

Basile Lewandowski, Atli Kosson

Tags: Computer science, CUDA, Machine learning, Mixed precision, nVidia, nVidia A100, nVidia V100

October 1, 2023 by hgpu

Improving Performance of Iterative Applications through Interleaved Execution of Approximated CUDA Kernels

Gabriel Freytag

Tags: Computer science, CUDA, Machine learning, Mixed precision, Neural networks, nVidia, nVidia A100, nVidia P100, Thesis

June 18, 2023 by hgpu

Efficient Quantized Sparse Matrix Operations on Tensor Cores

Shigang Li, Kazuki Osawa, Torsten Hoefler

Source codes

Tags: Computer science, CUDA, Deep learning, Linear Algebra, Mixed precision, nVidia, nVidia A100, Package, Sparse matrix, Tesla V100

October 2, 2022 by hgpu

On the accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit and novel 16-bit number formats

Moritz Lehmann, Mathias J. Krause, Giorgio Amati, Marcello Sega, Jens Harting, Stephan Gekle

Tags: AMD Radeon Instinct MI100, AMD Radeon VII, ATI, Fluid dynamics, lattice Boltzmann, Mixed precision, nVidia, OpenCL, Tesla K20, Tesla K40, Tesla K80, Tesla P100, Tesla V100

December 19, 2021 by hgpu

Mixed precision in Graphics Processing Unit

Quentin Gallouédec

Tags: Computer science, Machine learning, Mixed precision, Neural networks, nVidia, Tesla V100

October 31, 2021 by hgpu

A Study of Mixed Precision Strategies for GMRES on GPUs

Jennifer A. Loe, Christian A. Glusa, Ichitaro Yamazaki, Erik G. Boman, Sivasankaran Rajamanickam

Tags: Algorithms, Computer science, CUDA, Linear Algebra, Mixed precision, nVidia, performance portability, Tesla V100

September 19, 2021 by hgpu

tcFFT: Accelerating Half-Precision FFT through Tensor Cores

Binrui Li, Shenggan Cheng, James Lin

Tags: Computer science, FFT, Mixed precision, nVidia, nVidia DGX-2, nVidia DGX-A100, Tesla V100

May 2, 2021 by hgpu

Mixed-Precision Embedding Using a Cache

Jie (Amy) Yang, Jianyu Huang, Jongsoo Park, Ping Tak Peter Tang, Andrew Tulloch

Tags: Artificial intelligence, Computer science, CUDA, Machine learning, Mixed precision, nVidia, Tesla V100

October 25, 2020 by hgpu

Flexible Performant GEMM Kernels on GPUs

Thomas Faingnaert, Tim Besard, Bjorn De Sutter

Source codes

Tags: Computer science, CUBLAS, CUDA, Julia, Machine learning, Mathematical Software, Matrix multiplication, Mixed precision, nVidia, nVidia GeForce RTX 2080 Ti, Package, Performance

October 4, 2020 by hgpu
