nVidia GeForce Titan
Real-Time Dedispersion for Fast Radio Transient Surveys, using Auto Tuning on Many-Core Accelerators
Alessio Sclocco, Joeri van Leeuwen, Henri E. Bal, Rob V. van Nieuwpoort
Tags: Astrophysics, ATI, ATI Radeon HD 7970, Instrumentation and Methods for Astrophysics, Intel Xeon Phi, nVidia, nVidia GeForce GTX 680, nVidia GeForce Titan, OpenCL, OpenMP, Package, Tesla K20
January 12, 2016 by hgpu

* * *
Guy L. Steele Jr. (Oracle Labs), Jean-Baptiste Tristan
Tags: Computer science, CUDA, Latent Dirichlet allocation, Machine learning, nVidia, nVidia GeForce Titan
May 16, 2015 by hgpu