{"id":30158,"date":"2025-08-31T15:36:22","date_gmt":"2025-08-31T12:36:22","guid":{"rendered":"https:\/\/hgpu.org\/?p=30158"},"modified":"2025-08-31T15:36:22","modified_gmt":"2025-08-31T12:36:22","slug":"scaling-gpu-accelerated-databases-beyond-gpu-memory-size","status":"publish","type":"post","link":"https:\/\/hgpu.org\/?p=30158","title":{"rendered":"Scaling GPU-Accelerated Databases beyond GPU Memory Size"},"content":{"rendered":"<p>There has been considerable interest in leveraging GPUs\u2019 computational power and high memory bandwidth for analytical database workloads. However, their limited memory capacity remains a fundamental limitation for databases whose sizes far exceed the GPU memory size. This challenge is exacerbated by the slow PCIe data transfer speed, which creates a bottleneck in overall system performance. In this work, we introduce a hybrid CPU-GPU query processing strategy that leverages the distinct strengths of CPU and GPU to alleviate the data transfer bottleneck. Our approach performs highly efficient data filtering on the CPU, which substantially reduces the volume of data transferred to the GPU via PCIe, and offloads compute-intensive operators such as joins to the GPU for further processing. Our evaluation on the TPC-H benchmark at scale factors up to 1000 (1TB), using a single A100 GPU with 80GB memory, demonstrates that our approach can effectively handle datasets significantly larger than the GPU memory size. Moreover, it substantially outperforms a state-of-the-art CPU-only database system in both performance and cost-effectiveness.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There has been considerable interest in leveraging GPUs\u2019 computational power and high memory bandwidth for analytical database workloads. However, their limited memory capacity remains a fundamental limitation for databases whose sizes far exceed the GPU memory size. 
This challenge is exacerbated by the slow PCIe data transfer speed, which creates a bottleneck in overall system [&hellip;]<\/p>\n","protected":false},"author":351,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[11,3],"tags":[451,1782,667,884,20,2066,2132],"class_list":["post-30158","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-paper","tag-benchmarking","tag-computer-science","tag-databases","tag-memory","tag-nvidia","tag-nvidia-a100","tag-nvidia-h100"],"views":1602,"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/30158","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/users\/351"}],"replies":[{"embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=30158"}],"version-history":[{"count":0,"href":"https:\/\/hgpu.org\/index.php?rest_route=\/wp\/v2\/posts\/30158\/revisions"}],"wp:attachment":[{"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=30158"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=30158"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/hgpu.org\/index.php?rest_route=%2Fwp%2Fv2%2Fta
gs&post=30158"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}