An Investigation of Atomic Synchronization for Sort-Based Group-By Aggregation on GPUs

hgpu.org » Applications » Computer science » An Investigation of Atomic Synchronization for Sort-Based Group-By Aggregation on GPUs

An Investigation of Atomic Synchronization for Sort-Based Group-By Aggregation on GPUs

Bala Gurumurthy, David Broneske, Martin Schaler, Thilo Pionteck, Gunter Saake

University of Magdeburg

Joint International Workshop on Big Data Management on Emerging Hardware and Data Management on Virtualized Active Systems (HardBD), 2021

BibTeX

Download (PDF)

View

Source

1499

views

Using heterogeneous processing devices, like GPUs, to accelerate relational database operations is a well-known strategy. In this context, the group by operation is highly interesting for two reasons. Firstly, it incurs large processing costs. Secondly, its results (i.e., aggregates) are usually small reducing data movement costs whose compensation is a major challenge for heterogeneous computing. Generally for group by computation on GPUs, one relies either on sorting or hashing. Today, empirical results suggest that hash-based approaches are superior. However by concept, hashing induces an unpredictable memory access pattern being in conflict with the architecture of GPUs. This motivates studying why current sort-based approaches are generally inferior. Our results indicate that current sorting solutions cannot exploit the full parallel power of modern GPUs. Experimentally, we show that the issue arises from the need to synchronize parallel threads that access the shared memory location containing the aggregates via atomics. Our quantification of the optimal performance motivates us to investigate how to minimize the overhead of atomics. This results in different variants using atomics, where the best variants almost mitigate the atomics overhead entirely. The results of a largescale evaluation reveal that our approach achieves a 3x speed-up over existing sort-based approaches and up to 2x speed-up over hash-based approaches.

Tags: Computer science, Databases, Hashing, Heterogeneous systems, nVidia, nVidia GeForce GTX 1050 Ti, nVidia GeForce RTX 2080 Ti, OpenCL, Sorting

April 5, 2021 by hgpu

No votes yet.

Please wait...

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

Engineering Supercomputing Platforms for Biomolecular Applications

high performance computing on graphics processing units: hgpu.org