High performance computing on graphics processing units: hgpu.org


TensorFlow: A system for large-scale machine learning

Martin Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, Xiaoqiang Zhang


Tags: Artificial intelligence, Computer science, CUDA, Deep learning, Heterogeneous systems, Machine learning, Neural networks, nVidia, nVidia GeForce GTX Titan X, Package, Tesla K40

May 30, 2016 by hgpu

Agentic Code Optimization via Compiler-LLM Cooperation

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

Device Virtual Machine (DVM)

DVM: Real-Time Kernel Generation for Dynamic AI Models

AutoKernel: Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels

AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search

SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits

Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context

LLM.Q: Quantized LLM training in pure CUDA/C++

LLMQ: Efficient Lower-Precision LLM Training for Consumer GPUs

True 4-Bit Quantized CNN Training on CPU

True 4-Bit Quantized Convolutional Neural Network Training on CPU: Achieving Full-Precision Parity

cuFuzz: A GPU-oriented coverage-guided fuzzer for userland CUDA applications

Hunting CUDA Bugs at Scale with cuFuzz

KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization

* * *

HGPU group © 2010-2026 hgpu.org

All rights belong to the respective authors