Gallatin: A General-Purpose GPU Memory Manager
University of Utah, USA
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2024
@article{mccoy2024gallatin,
title={Gallatin: A General-Purpose GPU Memory Manager},
author={McCoy, Hunter and Pandey, Prashant},
year={2024}
}
Dynamic memory management is critical for efficiently porting modern data processing pipelines to GPUs. However, building a general-purpose dynamic memory manager on GPUs is challenging due to the massive parallelism and weak memory coherence. Existing state-of-the-art GPU memory managers, Ouroboros and Reg-Eff, employ traditional data structures such as arrays and linked lists to manage memory objects. They build specialized pipelines to achieve performance for a fixed set of allocation sizes and fall back to the CUDA allocator for allocating large sizes. In the process, they lose general-purpose usability and fail to support critical applications such as streaming graph processing. In this paper, we introduce Gallatin, a general-purpose and high-performance GPU memory manager. Gallatin uses the van Emde Boas (vEB) tree data structure to manage memory objects efficiently and supports allocations of any size. Furthermore, we develop a highly concurrent GPU implementation of the vEB tree that can be broadly used in other GPU applications. It supports constant-time insertions, deletions, and successor operations for a given memory size. In our evaluation, we compare Gallatin with state-of-the-art specialized allocator variants. Gallatin is up to 374x faster on single-sized allocations and up to 264x faster on mixed-size allocations than the next-best allocator. In scalability benchmarks, Gallatin is up to 254x faster than the next-best allocator as the number of threads increases. For the graph benchmarks, Gallatin is 1.5x faster than the state-of-the-art for bulk insertions, slightly faster for bulk deletions, and 3x faster than the next-best allocator for all graph expansion tests.
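To make the insertion, deletion, and successor operations mentioned in the abstract concrete, here is a minimal single-threaded C++ sketch of a two-level bitmap index, a much simplified relative of the vEB tree. It is not Gallatin's implementation: the name `TwoLevelBitmap`, the fixed capacity of 4096 slots, and the method names are assumptions made for illustration, and a concurrent GPU version would presumably need atomic bit operations in place of the plain reads and writes used here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-level bitmap over 64 * 64 = 4096 slots (illustrative only).
// Because the depth is fixed for a fixed capacity, each operation touches a
// bounded number of 64-bit words.
struct TwoLevelBitmap {
    static constexpr int kWord = 64;
    static constexpr int kSlots = kWord * kWord;

    uint64_t summary = 0;       // bit i is set iff leaf[i] is non-zero
    uint64_t leaf[kWord] = {};  // bit j of leaf[i] is set iff slot i*64+j is present

    // Index of the lowest set bit; the caller guarantees w != 0.
    static int lowest_set_bit(uint64_t w) {
        int i = 0;
        while ((w & 1) == 0) { w >>= 1; ++i; }
        return i;
    }

    // Mask with every bit at position >= b set (b in 0..64).
    static uint64_t bits_from(int b) {
        return b >= kWord ? 0 : (~uint64_t{0} << b);
    }

    void insert(int x) {
        assert(x >= 0 && x < kSlots);
        leaf[x / kWord] |= uint64_t{1} << (x % kWord);
        summary |= uint64_t{1} << (x / kWord);
    }

    void erase(int x) {
        assert(x >= 0 && x < kSlots);
        int i = x / kWord;
        leaf[i] &= ~(uint64_t{1} << (x % kWord));
        if (leaf[i] == 0) summary &= ~(uint64_t{1} << i);  // leaf became empty
    }

    // Smallest present slot >= x, or -1 if there is none.
    int successor(int x) const {
        if (x >= kSlots) return -1;
        if (x < 0) x = 0;
        int i = x / kWord;
        uint64_t w = leaf[i] & bits_from(x % kWord);  // candidates in the same leaf
        if (w != 0) return i * kWord + lowest_set_bit(w);
        uint64_t s = summary & bits_from(i + 1);      // later non-empty leaves
        if (s == 0) return -1;
        int k = lowest_set_bit(s);
        return k * kWord + lowest_set_bit(leaf[k]);
    }
};
```

In an allocator setting, a structure like this could track which memory segments are free, with `successor` locating the next free segment at or after a given index. How Gallatin actually lays out and synchronizes its tree is not described in the abstract.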
February 4, 2024 by hgpu