high performance computing on graphics processing units: hgpu.org

hgpu.org » nVidia H100

Portability of Fortran’s ‘do concurrent’ on GPUs

Ronald M. Caplan, Miko M. Stulajter, Jon A. Linker, Jeff Larkin, Henry A. Gabb, Shiquan Su, Ivan Rodriguez, Zachary Tschirhart, Nicholas Malaya

Tags: Computer science, Fortran, Intel, Intel Data Center GPU Max 1550, Intel Ponte Vecchio Max 1100, nVidia, nVidia A100, nVidia GH200, nVidia H100, OpenACC, OpenMP, Package

August 18, 2024 by hgpu

Evaluating Operators in Deep Neural Networks for Improving Performance Portability of SYCL

Zheming Jin

Tags: Benchmarking, Computer science, CUDA, HIP, Machine learning, Neural networks, nVidia, nVidia GeForce RTX 2080, nVidia GeForce RTX 3090, nVidia H100, oneAPI, Performance, performance portability, SYCL, Tesla A100, Tesla V100

August 14, 2024 by hgpu

Data-driven Forecasting of Deep Learning Performance on GPUs

Seonho Lee, Amar Phanishayee, Divya Mahajan

Tags: Computer science, CUDA, Deep learning, nVidia, nVidia A100, nVidia H100, nVidia P100, nVidia V100, Performance, PyTorch, Tesla T4

August 4, 2024 by hgpu

Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper

Gabin Schieffer, Jacob Wahlgren, Jie Ren, Jennifer Faj, Ivy Peng

Tags: Computer science, CUDA, HPC, Memory, nVidia, nVidia H100, Performance, Quantum computing

July 14, 2024 by hgpu

Chat AI: A Seamless Slurm-Native Solution for HPC-Based Services

Ali Doosthosseini, Jonathan Decker, Hendrik Nolte, Julian M. Kunkel

Tags: AI, Cloud, Computer science, HPC, LLM, nVidia, nVidia H100, Package, PC cluster

July 7, 2024 by hgpu

Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers

Avinash Maurya, Jie Ye, M. Mustafa Rafique, Franck Cappello, Bogdan Nicolae

Tags: Computer science, CUDA, LLM, Memory, nVidia, nVidia A100, nVidia H100, Performance

June 23, 2024 by hgpu

How much can we gain from Tensor Kernel Fusion on GPUs?

Wei Sun, Ang Li, Sander Stuijk, Henk Corporaal

Tags: Computer science, CUDA, Deep learning, Matrix multiplication, Neural networks, nVidia, nVidia A100, nVidia H100

June 16, 2024 by hgpu

Automatic BLAS Offloading on Unified Memory Architecture: A Study on NVIDIA Grace-Hopper

Junjie Li, Yinzhi Wang, Xiao Liang, Hang Liu

Tags: BLAS, Chemistry, CUDA, nVidia, nVidia GH200, nVidia H100, Performance, Physics

May 5, 2024 by hgpu

Porting HPC Applications to AMD Instinct MI300A Using Unified Memory and OpenMP

Suyash Tandon, Leopold Grinberg, Gheorghe-Teodor Bercea, Carlo Bertolli, Mark Olesen, Simone Bnà, Nicholas Malaya

Tags: AMD Radeon Instinct MI210, AMD Radeon Instinct MI300A, ATI, cfd, Computer science, Fluid dynamics, HPC, nVidia, nVidia A100, nVidia H100, OpenMP

May 5, 2024 by hgpu

FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators

Xinyi Li, Ang Li, Bo Fang, Katarzyna Swirydowicz, Ignacio Laguna, Ganesh Gopalakrishnan

Tags: AMD Radeon Instinct MI100, AMD Radeon Instinct MI250X, ATI, Computer science, Hardware Architecture, HPC, Matrix multiplication, nVidia, nVidia A100, nVidia H100, nVidia V100, PTX

March 10, 2024 by hgpu

A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures

Fabrizio Ferrandi, Serena Curzel, Leandro Fiorin, Daniele Ielmini, Cristina Silvano, Francesco Conti, Alessio Burrello, Francesco Barchi, Luca Benini, Luciano Lavagno, Teodoro Urso, Enrico Calore, Sebastiano Fabio Schifano, Cristian Zambelli, Maurizio Palesi, Giuseppe Ascia, Enrico Russo, Nicola Petra, Davide De Caro, Gennaro Di Meo, Valeria Cardellini, Salvatore Filippone, Francesco Lo Presti, Francesco Silvestri, Paolo Palazzari, Stefania Perri

Tags: AI, Artificial intelligence, Computer science, CUDA, Deep learning, Design space exploration, Hardware Architecture, Heterogeneous systems, Machine learning, Neural networks, nVidia, nVidia H100, OpenCL, survey

December 3, 2023 by hgpu

FIKIT: Priority-Based Real-time GPU Multi-tasking Scheduling with Kernel Identification

Wenqing Wu

Tags: Computer science, CUDA, nVidia, nVidia A100, nVidia GeForce RTX 3090, nVidia GeForce RTX 4080 Ti, nVidia H100, Task scheduling

November 27, 2023 by hgpu

CUDAnalyst (CUDA + Analyst)

Towards Feedback-to-Plan Decisions for Self-Evolving LLM Agents in CUDA Kernel Generation

CodegenBench

CodegenBench: Can LLMs Write Efficient Code Across Architectures?

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

Analyzing the Impact of Kernel Fusion on GPU Tensor Operation Performance: A Systematic Performance Study

IntelliKit: Agent-first tooling for AMD hardware

Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs

DITRON: Distributed Compiler based on Triton for Parallel Systems

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

CuTile Benchmark Suite: Performance and Productivity Tradeoffs for GPU Kernel Programming on Blackwell Architecture

Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs

Agentic Code Optimization via Compiler-LLM Cooperation

Device Virtual Machine (DVM)

DVM: Real-Time Kernel Generation for Dynamic AI Models

MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

* * *

HGPU group © 2010-2026 hgpu.org

All rights belong to the respective authors
