Gallatin: A General-Purpose GPU Memory Manager

Hunter McCoy, Prashant Pandey

University of Utah, USA

ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2024

Package: Gallatin, a general-purpose memory manager for CUDA that lets threads quickly malloc and free memory of arbitrary size inside kernels

Dynamic memory management is critical for efficiently porting modern data processing pipelines to GPUs. However, building a general-purpose dynamic memory manager on GPUs is challenging due to the massive parallelism and weak memory coherence. Existing state-of-the-art GPU memory managers, Ouroboros and Reg-Eff, employ traditional data structures such as arrays and linked lists to manage memory objects. They build specialized pipelines to achieve performance for a fixed set of allocation sizes and fall back to the CUDA allocator for large allocations. In the process, they lose general-purpose usability and fail to support critical applications such as streaming graph processing. In this paper, we introduce Gallatin, a general-purpose and high-performance GPU memory manager. Gallatin uses the van Emde Boas (vEB) tree data structure to manage memory objects efficiently and supports allocations of any size. Furthermore, we develop a highly concurrent GPU implementation of the vEB tree that can be broadly used in other GPU applications. This implementation supports constant-time insertions, deletions, and successor operations for a given memory size. In our evaluation, we compare Gallatin with state-of-the-art specialized allocator variants. Gallatin is up to 374x faster on single-sized allocations and up to 264x faster on mixed-size allocations than the next-best allocator. In scalability benchmarks, Gallatin is up to 254x faster than the next-best allocator as the number of threads increases. For the graph benchmarks, Gallatin is 1.5x faster than the state of the art for bulk insertions, slightly faster for bulk deletions, and 3x faster than the next-best allocator for all graph expansion tests.
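To make the insert, delete, and successor operations mentioned in the abstract concrete, here is a minimal sequential sketch of a fixed-depth, vEB-style bitmap tree in host-side C++. It is not Gallatin's implementation: the name TwoLevelBitTree, the 4096-key universe, and the two-level layout are assumptions chosen for illustration, and the concurrent GPU version described in the paper would need atomic operations that this sketch omits. Because the depth is fixed once the universe size is fixed, each operation touches a bounded number of 64-bit words.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-level bitmap tree over keys in [0, 4096).
// Illustrative only; sequential, host-side, not Gallatin's code.
struct TwoLevelBitTree {
    static constexpr int kWordBits = 64;
    static constexpr int kUniverse = kWordBits * kWordBits;

    uint64_t summary = 0;             // bit i set <=> leaves[i] is non-zero
    uint64_t leaves[kWordBits] = {};  // bit j of leaves[i] <=> key i*64+j present

    // Index of the lowest set bit; requires w != 0.
    static int lowestBit(uint64_t w) {
        int i = 0;
        while ((w & 1u) == 0) { w >>= 1; ++i; }
        return i;
    }

    void insert(int key) {
        assert(key >= 0 && key < kUniverse);
        int hi = key / kWordBits, lo = key % kWordBits;
        leaves[hi] |= (uint64_t{1} << lo);
        summary |= (uint64_t{1} << hi);
    }

    void remove(int key) {
        assert(key >= 0 && key < kUniverse);
        int hi = key / kWordBits, lo = key % kWordBits;
        leaves[hi] &= ~(uint64_t{1} << lo);
        // Clear the summary bit once the leaf word becomes empty.
        if (leaves[hi] == 0) summary &= ~(uint64_t{1} << hi);
    }

    // Smallest present key >= key, or -1 if there is none.
    int successor(int key) const {
        if (key < 0) key = 0;
        if (key >= kUniverse) return -1;
        int hi = key / kWordBits, lo = key % kWordBits;
        // First look inside the leaf word that contains key.
        uint64_t word = leaves[hi] & (~uint64_t{0} << lo);
        if (word != 0) return hi * kWordBits + lowestBit(word);
        // Otherwise use the summary to jump to the next non-empty leaf.
        if (hi + 1 >= kWordBits) return -1;
        uint64_t rest = summary & (~uint64_t{0} << (hi + 1));
        if (rest == 0) return -1;
        int next = lowestBit(rest);
        return next * kWordBits + lowestBit(leaves[next]);
    }
};
```

One way to read this in allocator terms, again as an assumption rather than a description of Gallatin: each key stands for a free block index, so an allocation calls successor to find a free block at or after some position and then remove to claim it, and a free calls insert to return the block.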

Tags: Computer science, CUDA, HPC, Memory, nVidia, nVidia A40, Package

February 4, 2024 by hgpu
