high performance computing on graphics processing units: hgpu.org

nVidia A100

Modeling Parallel Programs using Large Language Models

Daniel Nichols, Aniruddha Marathe, Harshitha Menon, Todd Gamblin, Abhinav Bhatele

Tags: Code generation, Computer science, HPC, MPI, nVidia, nVidia A100, OpenMP

July 9, 2023 by hgpu

cuSLINK: Single-linkage Agglomerative Clustering on the GPU

Corey J. Nolet, Divye Gala, Alex Fender, Mahesh Doijade, Joe Eaton, Edward Raff, John Zedlewski, Brad Rees, Tim Oates

Tags: Algorithms, Cluster analysis, Clustering, Computer science, CUDA, Hierarchical clustering, Machine learning, Nearest neighbour, nVidia, nVidia A100, nVidia DGX-1, Package

July 2, 2023 by hgpu

SYCL compute kernels for ExaHyPE

Chung Ming Loi, Tobias Weinzierl

Tags: Benchmarking, Computer science, Data parallelism, nVidia, nVidia A100, Package, Programming techniques, SYCL

July 2, 2023 by hgpu

DGEMM on Integer Matrix Multiplication Unit

Hiroyuki Ootomo, Katsuhisa Ozaki, Rio Yokota

Tags: Computer science, CUBLAS, CUDA, Deep learning, Linear Algebra, Machine learning, Matrix multiplication, nVidia, nVidia A100, nVidia Jetson AGX Orin, nVidia RTX 6000 Ada, nVidia Titan RTX, Package

June 25, 2023 by hgpu

GPU First – Execution of Legacy CPU Codes on GPUs

Shilei Tian, Tom Scogland, Barbara Chapman, Johannes Doerfert

Tags: Benchmarking, Compilers, Computer science, CUDA, Heterogeneous systems, nVidia, nVidia A100, OpenMP

June 25, 2023 by hgpu

Improving Performance of Iterative Applications through Interleaved Execution of Approximated CUDA Kernels

Gabriel Freytag

Tags: Computer science, CUDA, Machine learning, Mixed precision, Neural networks, nVidia, nVidia A100, nVidia P100, Thesis

June 18, 2023 by hgpu

SIMULATeQCD: A simple multi-GPU lattice code for QCD calculations

Lukas Mazur, Dennis Bollweg, David A. Clarke, Luis Altenkort, Olaf Kaczmarek, Rasmus Larsen, Hai-Tao Shu, Jishnu Goswami, Philipp Scior, Hauke Sandmeyer, Marius Neumann, Henrik Dick, Sajid Ali, Jangho Kim, Christian Schmidt, Peter Petreczky, Swagato Mukherjee

Tags: Algorithms, AMD Radeon Instinct MI250X, ATI, CUDA, High Energy Physics - Lattice, HIP, MPI, nVidia, nVidia A100, Package, Physics, QCD

June 11, 2023 by hgpu

Hybrid CPU/GPU/APU accelerated query, insert, update and erase operations in hash tables with string keys

Tobias Groth, Sven Groppe, Thilo Pionteck, Franz Valdiek, Martin Koppehel

Tags: APU, Computer science, CUDA, Hashing, nVidia A100, oneAPI, SYCL

June 4, 2023 by hgpu

Experiences Migrating CUDA to SYCL: A Molecular Docking Case Study

Leonardo Solis-Vasquez, Edward Mascarenhas, Andreas Koch

Tags: Chemistry, Computer science, CUDA, molecular docking, nVidia, nVidia A100, oneAPI, Package, SYCL

May 28, 2023 by hgpu

Communication-minimizing Asynchronous Tensor Parallelism

Siddharth Singh, Zack Sating, Abhinav Bhatele

Tags: Computer science, CUDA, GPU cluster, Neural networks, nVidia, nVidia A100

May 28, 2023 by hgpu

Optimization and Portability of a Fusion OpenACC-based FORTRAN HPC Code from NVIDIA to AMD GPUs

Igor Sfiligoi, Emily A. Belli, Jeff Candy, Reuben D. Budiardja

Tags: AMD Radeon Instinct MI250X, ATI, Computer science, Fortran, MPI, nVidia, nVidia A100, OpenACC, Performance

May 21, 2023 by hgpu

TorchBench: Benchmarking PyTorch with High API Surface Coverage

Yueming Hao, Xu Zhao, Bin Bao, David Berard, Will Constable, Adnan Aziz, Xu Liu

Tags: AMD Radeon Instinct MI210, ATI, Benchmarking, Computer science, Deep learning, nVidia, nVidia A100, Package, PyTorch

May 14, 2023 by hgpu

HGPU group © 2010-2026 hgpu.org

All rights belong to the respective authors