high performance computing on graphics processing units: hgpu.org

Demystifying the Nvidia Ampere Architecture through Microbenchmarking and Instruction-level Analysis

Hamdy Abdelkhalik, Yehia Arafa, Nandakishore Santhi, Abdel-Hameed Badawy

Tags: Benchmarking, Computer science, Hardware Architecture, Latency, nVidia, nVidia A100, Performance, PTX

August 28, 2022 by hgpu

Performance analysis of matrix-free conjugate gradient kernels using SYCL

Igor Baratta, Chris Richardson, Garth Wells

Tags: Benchmarking, Computer science, Latency, nVidia, nVidia A100, OpenCL, Package, Performance, SYCL

August 21, 2022 by hgpu

Agentic Code Optimization via Compiler-LLM Cooperation

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Device Virtual Machine (DVM)

DVM: Real-Time Kernel Generation for Dynamic AI Models

AutoKernel: Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels

AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context

LLM.Q: Quantized LLM training in pure CUDA/C++

LLMQ: Efficient Lower-Precision LLM Training for Consumer GPUs

True 4-Bit Quantized CNN Training on CPU

True 4-Bit Quantized Convolutional Neural Network Training on CPU: Achieving Full-Precision Parity

cuFuzz: A GPU-oriented coverage-guided fuzzer for userland CUDA application

Hunting CUDA Bugs at Scale with cuFuzz

KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization


HGPU group © 2010-2026 hgpu.org

All rights belong to the respective authors
