hgpu.org » Triton
Jianghui Wang, Vinay Joshi, Saptarshi Majumder, Xu Chao, Bin Ding, Ziqiong Liu, Pratik Prabhanjan Brahma, Dong Li, Zicheng Liu, Emad Barsoum
Tags: AMD Radeon Instinct MI300X, ATI, Benchmarking, Code generation, Computer science, Deep learning, Package, Python, ROCm, Triton
August 3, 2025 by hgpu




