high performance computing on graphics processing units: hgpu.org

hgpu.org » AMD Radeon Instinct MI300X

Geak: Introducing Triton Kernel AI Agent & Evaluation Benchmarks

Jianghui Wang, Vinay Joshi, Saptarshi Majumder, Xu Chao, Bin Ding, Ziqiong Liu, Pratik Prabhanjan Brahma, Dong Li, Zicheng Liu, Emad Barsoum

Tags: AMD Radeon Instinct MI300X, ATI, Benchmarking, Code generation, Computer science, Deep learning, Package, Python, ROCm, Triton

August 3, 2025 by hgpu

Performance Portable Gradient Computations Using Source Transformation

Kim Liegeois, Brian Kelley, Eric Phipps, Sivasankaran Rajamanickam, Vassil Vassilev

Tags: AMD Radeon Instinct MI300X, ATI, Computer science, CUDA, HIP, Kokkos, Machine learning, Mathematical Software, nVidia, nVidia H100

August 3, 2025 by hgpu

Omniwise: Predicting GPU Kernels Performance with LLMs

Zixian Wang, Cole Ramos, Muhammad A. Awad, Keith Lowery

Tags: AMD, AMD Radeon Instinct MI250, AMD Radeon Instinct MI300X, Artificial intelligence, Benchmarking, Computer science, LLM, Neural networks, Performance, ROCm

June 29, 2025 by hgpu

Engineering Supercomputing Platforms for Biomolecular Applications

Robert Welch, Charles Laughton, Oliver Henrich, Tom Burnley, Daniel Cole, Alan Real, Sarah Harris, James Gebbie-Rayet

Tags: AMD Radeon Instinct MI250X, AMD Radeon Instinct MI300X, ATI, Benchmarking, Biology, Biomolecules, Computational biology, CUDA, HPC, Molecular dynamics, nVidia, nVidia A100, nVidia GH200, nVidia H100, Package, Physics, ROCm, Tesla V100

June 22, 2025 by hgpu

FLASH: Fast All-to-All Communication in GPU Clusters

Yiran Lei, Dongjoo Lee, Liangyu Zhao, Daniar Kurniawan, Chanmyeong Kim, Heetaek Jeong, Changsu Kim, Hyeonseong Choi, Liangcheng Yu, Arvind Krishnamurthy, Justine Sherry, Eriko Nurvitadhi

Tags: AMD Radeon Instinct MI300X, ATI, Computer science, GPU cluster, Heterogeneous systems, MPI, nVidia, nVidia A100, nVidia B200, nVidia H100

May 25, 2025 by hgpu

MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications

Aashaka Shah, Abhinav Jangda, Binyang Li, Caio Rocha, Changho Hwang, Jithin Jose, Madan Musuvathi, Olli Saarikivi, Peng Cheng, Qinghua Zhou, Roshan Dathathri, Saeed Maleki, Ziyue Yang

Tags: AI, AMD Radeon Instinct MI300X, ATI, Computer science, CUDA, Heterogeneous systems, HIP, nVidia, nVidia A100, nVidia H100, Package

April 27, 2025 by hgpu

LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators

Krishna Teja Chitty-Venkata, Siddhisanket Raskar, Bharat Kale, Farah Ferdaus, Aditya Tanikanti, Ken Raffenetti, Valerie Taylor, Murali Emani, Venkatram Vishwanath

Tags: AI, AMD Radeon Instinct MI250, AMD Radeon Instinct MI300X, Artificial intelligence, ATI, Benchmarking, Computer science, CUDA, LLM, Machine learning, nVidia, nVidia A100, nVidia GH200, nVidia H100, OpenCL, Performance

November 10, 2024 by hgpu

* * *
