high performance computing on graphics processing units: hgpu.org

nVidia L4

ML Inference Scheduling with Predictable Latency

Haidong Zhao, Nikolaos Georgantas

Tags: Computer science, Machine learning, nVidia, nVidia L4, Task scheduling

December 21, 2025 by hgpu

Forecasting time series with constraints

Nathan Doumèche, Francis Bach, Éloi Bedek, Gérard Biau, Claire Boyer, Yannig Goude

Tags: AI, Benchmarking, Computer science, Linear Algebra, Machine learning, nVidia, nVidia L4, Package

February 24, 2025 by hgpu

gpuPairHMM: High-speed Pair-HMM Forward Algorithm for DNA Variant Calling on GPUs

Bertil Schmidt, Felix Kallenborn, Alexander Wichmann, Alejandro Chacon, Christian Hundt

Tags: Bioinformatics, Biology, Computer science, CUDA, FPGA, Genomics, nVidia, nVidia A100, nVidia H100, nVidia L4, nVidia L40s, nVidia V100, Package

November 24, 2024 by hgpu

GenVectorX: A performance-portable SYCL library for Lorentz Vectors operations

Monica Dessole, Jolly Chen, Axel Naumann

Tags: CUDA, nVidia, nVidia A100, nVidia GeForce RTX 3060, nVidia L4, oneAPI, Package, Performance, Physics, SYCL

December 10, 2023 by hgpu

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs
