high performance computing on graphics processing units: hgpu.org

MemAscend: System Memory Optimization for SSD-Offloaded LLM Fine-Tuning

Yong-Cheng Liaw, Shuo-Han Chen

Tags: Artificial intelligence, Benchmarking, Computer science, LLM, Memory, nVidia, nVidia H100, nVidia RTX A5000

June 8, 2025 by hgpu

GPUMC: A Stateless Model Checker for GPU Weak Memory Concurrency

Soham Chakraborty, S. Krishna, Andreas Pavlogiannis, Omkar Tuppe

Tags: Benchmarking, Computer science, CUDA, Memory, nVidia, OpenCL, Programming Languages

June 8, 2025 by hgpu

GPUVM: GPU-driven Unified Virtual Memory

Nurlan Nazaraliyev, Elaheh Sadredini, Nael Abu-Ghazaleh

Tags: Computer science, CUDA, Memory, nVidia, Operating systems, Performance, Tesla V100

November 17, 2024 by hgpu

NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference

Xuanlin Jiang, Yang Zhou, Shiyi Cao, Ion Stoica, Minlan Yu

Tags: Artificial intelligence, Computer science, CUDA, LLM, Memory, nVidia, nVidia A10, nVidia H100, Tesla T4

November 10, 2024 by hgpu

Superpipeline: A Universal Approach for Reducing GPU Memory Usage in Large Models

Reza Abbasi, Sernam Lim

Tags: AI, Artificial intelligence, Computer science, CUDA, LLM, Machine learning, Memory, nVidia, nVidia GeForce RTX 3090, nVidia Quadro RTX 8000, Package

October 20, 2024 by hgpu

Understanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric

Gabin Schieffer, Ruimin Shi, Stefano Markidis, Andreas Herten, Jennifer Faj, Ivy Peng

Tags: AMD, AMD Radeon Instinct MI250X, ATI, Computer science, HIP, Machine learning, Memory, Performance

October 6, 2024 by hgpu

Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper

Gabin Schieffer, Jacob Wahlgren, Jie Ren, Jennifer Faj, Ivy Peng

Tags: Computer science, CUDA, HPC, Memory, nVidia, nVidia H100, Performance, Quantum computing

July 14, 2024 by hgpu

Towards Unified Analysis of GPU Consistency

Haining Tong, Natalia Gavrilenko, Hernán Ponce de León, Keijo Heljanko

Tags: Computer science, Memory, nVidia, OpenCL, Package, PTX, Vulkan

July 7, 2024 by hgpu

Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers

Avinash Maurya, Jie Ye, M. Mustafa Rafique, Franck Cappello, Bogdan Nicolae

Tags: Computer science, CUDA, LLM, Memory, nVidia, nVidia A100, nVidia H100, Performance

June 23, 2024 by hgpu

Gallatin: A General-Purpose GPU Memory Manager

Hunter McCoy, Prashant Pandey

Tags: Computer science, CUDA, HPC, Memory, nVidia, nVidia A40, Package

February 4, 2024 by hgpu

Unified Shared Memory: Friend or Foe?

Juan Fumero Alfonso, Florin-Gabriel Blanaru, Athanasios Stratikopoulos, Steve Dohrmann, Sandhya Viswanathan, Christos-Efthymios Kotselidis

Tags: Code generation, Computer science, CUDA, FPGA, Heterogeneous systems, Java, Memory, nVidia, nVidia GeForce RTX 3070, OpenCL

September 17, 2023 by hgpu

GGArray: A Dynamically Growable GPU Array

Enzo Meneses, Cristóbal A. Navarro, Héctor Ferrada

Tags: Algorithms, Computer science, CUDA, Memory, nVidia, nVidia A100, nVidia Titan RTX

September 4, 2022 by hgpu
