high performance computing on graphics processing units: hgpu.org

AMD Radeon Instinct MI300X

Omniwise: Predicting GPU Kernels Performance with LLMs

Zixian Wang, Cole Ramos, Muhammad A. Awad, Keith Lowery

Tags: AMD, AMD Radeon Instinct MI250, AMD Radeon Instinct MI300X, Artificial intelligence, Benchmarking, Computer science, LLM, Neural networks, Performance, ROCm

June 29, 2025 by hgpu

Engineering Supercomputing Platforms for Biomolecular Applications

Robert Welch, Charles Laughton, Oliver Henrich, Tom Burnley, Daniel Cole, Alan Real, Sarah Harris, James Gebbie-Rayet

Tags: AMD Radeon Instinct MI250X, AMD Radeon Instinct MI300X, ATI, Benchmarking, Biology, Biomolecules, Computational biology, CUDA, HPC, Molecular dynamics, nVidia, nVidia A100, nVidia GH200, nVidia H100, Package, Physics, ROCm, Tesla V100

June 22, 2025 by hgpu

FLASH: Fast All-to-All Communication in GPU Clusters

Yiran Lei, Dongjoo Lee, Liangyu Zhao, Daniar Kurniawan, Chanmyeong Kim, Heetaek Jeong, Changsu Kim, Hyeonseong Choi, Liangcheng Yu, Arvind Krishnamurthy, Justine Sherry, Eriko Nurvitadhi

Tags: AMD Radeon Instinct MI300X, ATI, Computer science, GPU cluster, Heterogeneous systems, MPI, nVidia, nVidia A100, nVidia B200, nVidia H100

May 25, 2025 by hgpu

MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications

Aashaka Shah, Abhinav Jangda, Binyang Li, Caio Rocha, Changho Hwang, Jithin Jose, Madan Musuvathi, Olli Saarikivi, Peng Cheng, Qinghua Zhou, Roshan Dathathri, Saeed Maleki, Ziyue Yang

Tags: AI, AMD Radeon Instinct MI300X, ATI, Computer science, CUDA, Heterogeneous systems, HIP, nVidia, nVidia A100, nVidia H100, Package

April 27, 2025 by hgpu

LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators

Krishna Teja Chitty-Venkata, Siddhisanket Raskar, Bharat Kale, Farah Ferdaus, Aditya Tanikanti, Ken Raffenetti, Valerie Taylor, Murali Emani, Venkatram Vishwanath

Tags: AI, AMD Radeon Instinct MI250, AMD Radeon Instinct MI300X, Artificial intelligence, ATI, Benchmarking, Computer science, CUDA, LLM, Machine learning, nVidia, nVidia A100, nVidia GH200, nVidia H100, OpenCL, Performance

November 10, 2024 by hgpu

* * *

Recent source codes

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

KISim: Kubernetes Intelligent Scheduling Simulator

Efficient GPU Implementation of Multi-Precision Integer Division

exa-AMD: Exascale Accelerated Materials Discovery

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools