hgpu.org » AMD Radeon Instinct MI350X
Gang Liao, Hongsen Qin, Ying Wang, Alicia Golden, Michael Kuchnik, Yavuz Yetim, Jia Jiunn Ang, Chunli Fu, Yihan He, Samuel Hsia, Zewei Jiang, Dianshi Li, Uladzimir Pashkevich, Varna Puvvada, Feng Shi, Matt Steiner, Ruichao Xiao, Nathan Yan, Xiayu Yu, Zhou Fang, Abdul Zainul-Abedin, Ketan Singh, Hongtao Yu, Wenyuan Chi, Barney Huang, Sean Zhang, Noah Weller, Zach Marine, Wyatt Cook, Carole-Jean Wu, Gaoxiang Liu
Tags: AI, AMD Radeon Instinct MI300X, AMD Radeon Instinct MI350X, ATI, Computer science, CUDA, Deep learning, Heterogeneous systems, LLM, nVidia, nVidia A100, nVidia H100, PTX, ROCm, Triton
January 4, 2026 by hgpu
Ryan Swann, Muhammad Osama, Xiaohu Guo, Bryant Nelson, Lixun Zhang, Alex Brown, Yen Ong, Ali Yazdani, Sean Siddens, Ganesh Dasika, Alex Underwood
Tags: AMD, AMD Radeon Instinct MI300X, AMD Radeon Instinct MI350X, ATI, BLAS, Computer science, HPC, Package, Performance, ROCm, Triton
December 7, 2025 by hgpu
* * *
Most viewed papers (last 30 days)
- Architecture-Aware LLM Inference Optimization on AMD Instinct GPUs: A Comprehensive Benchmark and Deployment Study
- AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search
- LLMQ: Efficient Lower-Precision LLM Training for Consumer GPUs
- CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe
- DRTriton: Large-Scale Synthetic Data Reinforcement Learning for Triton Kernel Generation
- MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?
- Mixed-precision numerics in scientific applications: survey and perspectives
- Triton-Sanitizer: A Fast and Device-Agnostic Memory Sanitizer for Triton with Rich Diagnostic Context
- SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits
- MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU
* * *