high performance computing on graphics processing units: hgpu.org

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Zhonggen Li, Xiangyu Ke, Yifan Zhu, Yunjun Gao, Feifei Li

Zhejiang University, Hangzhou, China

arXiv:2505.09258 [cs.DC] (15 May 2025)

DOI:10.48550/arXiv.2505.09258

@misc{li2025efficientgraphembeddingscale,
  title={Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration},
  author={Zhonggen Li and Xiangyu Ke and Yifan Zhu and Yunjun Gao and Feifei Li},
  year={2025},
  eprint={2505.09258},
  archivePrefix={arXiv},
  primaryClass={cs.DC},
  url={https://arxiv.org/abs/2505.09258}
}

Graph embeddings provide continuous vector representations of the nodes in a graph and are widely used in community detection, recommendation, and various scientific fields. However, existing graph embedding systems either face scalability challenges due to the high cost of RAM and multiple GPUs, or rely on disk storage at the expense of I/O efficiency. In this paper, we propose Legend, a lightweight heterogeneous system for graph embedding that systematically redefines data management across CPU, GPU, and NVMe SSD resources. Legend is built on efficient data placement and retrieval strategies tailored to the unique strengths of each hardware component. Key innovations include a prefetch-friendly embedding loading strategy, which enables GPUs to prefetch data directly from SSDs with minimal I/O overhead, and a high-throughput GPU-SSD direct access driver optimized for graph embedding tasks. Furthermore, we propose a customized parallel execution strategy that maximizes GPU utilization and ensures efficient handling of billion-scale datasets. Extensive experiments demonstrate that Legend achieves up to a 4.8x speedup over state-of-the-art systems. Moreover, on a single GPU, Legend exhibits performance comparable to that of the state-of-the-art system using 4 GPUs on the billion-scale dataset.
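The general idea behind prefetching, overlapping slow storage reads with computation, can be illustrated with a small CPU-only sketch. This is not Legend's implementation: the abstract does not describe its loading order or driver interface, so the names `load_partition`, `prefetching_loader`, `train_step`, and `run`, the partition layout, and the buffer depth are all hypothetical stand-ins chosen for illustration. A background thread plays the role of the storage reader and a bounded queue plays the role of the prefetch buffer.

```python
import queue
import threading


def load_partition(pid, rows=4, dim=3):
    """Hypothetical stand-in for a slow SSD read of one embedding partition."""
    return [[float(pid)] * dim for _ in range(rows)]


def prefetching_loader(partition_ids, depth=2):
    """Yield (pid, block) pairs while a background thread loads upcoming ones."""
    buf = queue.Queue(maxsize=depth)  # bounded buffer caps memory use
    sentinel = object()               # marks the end of the stream

    def producer():
        for pid in partition_ids:
            buf.put((pid, load_partition(pid)))  # blocks when the buffer is full
        buf.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is sentinel:
            return
        yield item


def train_step(block):
    """Hypothetical stand-in for the compute on one partition: sums all entries."""
    return sum(sum(row) for row in block)


def run(partition_ids):
    """Consume partitions in order; loading overlaps with train_step calls."""
    return [(pid, train_step(block))
            for pid, block in prefetching_loader(partition_ids)]
```

Calling `run([0, 1, 2])` processes the partitions in order while the producer thread stays up to `depth` partitions ahead. In a real system the reader would issue asynchronous I/O and the consumer would launch GPU kernels, but the bounded producer-consumer structure is the same.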

Tags: Computer science, CUDA, Heterogeneous systems, nVidia, nVidia A100, Package, Performance, Prefetch

May 18, 2025 by hgpu
