hgpu.org » Performance
Victor W. Lee,Changkyu Kim,Jatin Chhugani,Michael Deisher,Daehyun Kim,Anthony D. Nguyen,Nadathur Satish,Mikhail Smelyanskiy,Srinivas Chennupaty,Per Hammarlund,Ronak Singhal,Pradeep Dubey
October 27, 2010 by hgpu
Richard Vuduc, Aparna Chandramowlishwaran, Jee Choi, Murat Guney, Aashay Shringarpure
October 27, 2010 by hgpu
Recent source codes
* * *
Most viewed papers (last 30 days)
- Diagnosing FP4 inference: a layer-wise and block-wise sensitivity analysis of NVFP4 and MXFP4
- Architecture-Aware LLM Inference Optimization on AMD Instinct GPUs: A Comprehensive Benchmark and Deployment Study
- EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery
- LLMQ: Efficient Lower-Precision LLM Training for Consumer GPUs
- KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization
- AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search
- An Efficient Heterogeneous Co-Design for Fine-Tuning on a Single GPU
- MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?
- DRTriton: Large-Scale Synthetic Data Reinforcement Learning for Triton Kernel Generation
- KernelFoundry: Hardware-aware evolutionary GPU kernel optimization
* * *



