high performance computing on graphics processing units: hgpu.org

hgpu.org » nVidia H100

LiteGD: Lightweight and dynamic GPU Dispatching for Large-scale Heterogeneous Clusters

Kunming Zhang, Hanlong Liao, Guoming Tang

Tags: Computer science, GPU cluster, Heterogeneous systems, Machine learning, nVidia, nVidia A800, nVidia GeForce RTX 4090, nVidia H100, nVidia RTX A6000, nVidia V100

June 22, 2025 by hgpu

Engineering Supercomputing Platforms for Biomolecular Applications

Robert Welch, Charles Laughton, Oliver Henrich, Tom Burnley, Daniel Cole, Alan Real, Sarah Harris, James Gebbie-Rayet

Tags: AMD Radeon Instinct MI250X, AMD Radeon Instinct MI300X, ATI, Benchmarking, Biology, Biomolecules, Computational biology, CUDA, HPC, Molecular dynamics, nVidia, nVidia A100, nVidia GH200, nVidia H100, Package, Physics, ROCm, Tesla V100

June 22, 2025 by hgpu

chemtrain-deploy: A parallel and scalable framework for machine learning potentials in million-atom MD simulations

Paul Fuchs, Weilong Chen, Stephan Thaler, Julija Zavadlav

Tags: Chemistry, Computational Physics, Computer science, CUDA, Machine learning, Molecular dynamics, Neural networks, nVidia, nVidia A100, nVidia GH200, nVidia H100, Package, Physics

June 15, 2025 by hgpu

MemAscend: System Memory Optimization for SSD-Offloaded LLM Fine-Tuning

Yong-Cheng Liaw, Shuo-Han Chen

Tags: Artificial intelligence, Benchmarking, Computer science, LLM, Memory, nVidia, nVidia H100, nVidia RTX A5000

June 8, 2025 by hgpu

Performance of Confidential Computing GPUs

Antonio Martínez Ibarra, Julian James Stephen, Aurora González Vidal, K. R. Jayaram, Antonio Fernando Skarmeta Gómez

Tags: Computer science, CUDA, LLM, nVidia, nVidia H100, Performance, Security

May 25, 2025 by hgpu

FLASH: Fast All-to-All Communication in GPU Clusters

Yiran Lei, Dongjoo Lee, Liangyu Zhao, Daniar Kurniawan, Chanmyeong Kim, Heetaek Jeong, Changsu Kim, Hyeonseong Choi, Liangcheng Yu, Arvind Krishnamurthy, Justine Sherry, Eriko Nurvitadhi

Tags: AMD Radeon Instinct MI300X, ATI, Computer science, GPU cluster, Heterogeneous systems, MPI, nVidia, nVidia A100, nVidia B200, nVidia H100

May 25, 2025 by hgpu

MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications

Aashaka Shah, Abhinav Jangda, Binyang Li, Caio Rocha, Changho Hwang, Jithin Jose, Madan Musuvathi, Olli Saarikivi, Peng Cheng, Qinghua Zhou, Roshan Dathathri, Saeed Maleki, Ziyue Yang

Tags: AI, AMD Radeon Instinct MI300X, ATI, Computer science, CUDA, Heterogeneous systems, HIP, nVidia, nVidia A100, nVidia H100, Package

April 27, 2025 by hgpu

LithOS: An Operating System for Efficient Machine Learning on GPUs

Patrick H. Coppock, Brian Zhang, Eliot H. Solomon, Vasilis Kypriotis, Leon Yang, Bikash Sharma, Dan Schatzberg, Todd C. Mowry, Dimitrios Skarlatos

Tags: Computer science, CUDA, Machine learning, nVidia, nVidia A100, nVidia H100, Operating systems

April 27, 2025 by hgpu

DeepCompile: A Compiler-Driven Approach to Optimizing Distributed Deep Learning Training

Masahiro Tanaka, Du Li, Umesh Chand, Ali Zafar, Haiying Shen, Olatunji Ruwase

Tags: Computer science, CUDA, Deep learning, Distributed computing, nVidia, nVidia H100, Package, Prefetch

April 27, 2025 by hgpu

Efficient allocation of image recognition and LLM tasks on multi-GPU system

Marcin Lawenda, Krzesimir Samborski, Kyrylo Khloponin, Łukasz Szustak

Tags: Computer science, CUDA, Data parallelism, Image recognition, LLM, Machine learning, nVidia, nVidia H100, Performance

March 30, 2025 by hgpu

CRIUgpu: Transparent Checkpointing of GPU-Accelerated Workloads

Radostin Stoyanov, Viktória Spišaková, Jesus Ramos, Steven Gurfinkel, Andrei Vagin, Adrian Reber, Wesley Armour, Rodrigo Bruno

Tags: AMD Radeon Instinct MI210, ATI, Computer science, CUDA, Deep learning, nVidia, nVidia A100, nVidia H100, nVidia RTX A6000, Package, ROCm

March 3, 2025 by hgpu

The AI CUDA Engineer: Agentic CUDA Kernel Discovery, Optimization and Composition

Robert Tjarko Lange, Aaditya Prasad, Qi Sun, Maxence Faldor, Yujin Tang, David Ha

Tags: AI, Computer science, CUDA, LLM, nVidia, nVidia H100, Package, Performance

February 24, 2025 by hgpu

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
