Optimizing GPU-accelerated Group-By and Aggregation

Tomas Karnagel, Rene Mueller, Guy M. Lohman
Technische Universität Dresden, Dresden, Germany
Sixth International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS), 2015








The massive parallelism and faster random memory access of Graphics Processing Units (GPUs) promise to further accelerate complex analytics operations such as joins and grouping, but they also pose additional challenges for performance optimization. The GPU offers more implementation alternatives to consider, such as exploiting the different types of device memory and dividing work among processor clusters and threads, as well as additional performance parameters, such as the size of the kernel grid and the trade-off between the number of threads and the share of resources each thread receives. In this paper, we study in depth how to offload the grouping and aggregation operator to a GPU; this operator is often the dominant operation in analytics queries after joins. We focus primarily on the design implications of a hash-based implementation, although we also compare it against a sort-based approach. Our study provides (1) a detailed performance analysis of grouping and aggregation on the GPU as the number of groups in the result varies, (2) an analysis of the truncation effects of hash functions commonly used in hash-based grouping, and (3) a simple parametric model for a wide range of workloads, with a heuristic optimizer that automatically picks the best implementation and performance parameters at execution time.
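To make the two strategies named in the abstract concrete, here is a minimal CPU-side sketch in Python. It is not the authors' GPU implementation. The function names, the linear-probing table, the multiplicative hash constant, and the `truncate` parameter are all assumptions chosen for illustration. It contrasts hash-based with sort-based grouping and shows one way that truncating a hash value to the table size can affect collisions.

```python
from itertools import groupby
from operator import itemgetter

HASH_CONST = 2654435761  # a commonly used 32-bit multiplicative-hash constant (assumption)

def hash_group_sum(keys, values, table_bits=4, truncate="high"):
    """GROUP BY key, SUM(value) using an open-addressing table (linear probing).

    truncate="high" keeps the top `table_bits` bits of the 32-bit hash;
    truncate="low" keeps the bottom `table_bits` bits.
    Returns (result_dict, number_of_extra_probes).
    """
    size = 1 << table_bits
    mask = size - 1
    slot_keys = [None] * size
    slot_sums = [0] * size
    probes = 0
    for k, v in zip(keys, values):
        full = (k * HASH_CONST) & 0xFFFFFFFF
        h = full >> (32 - table_bits) if truncate == "high" else full & mask
        for _ in range(size):
            if slot_keys[h] is None or slot_keys[h] == k:
                break
            h = (h + 1) & mask  # collision: try the next slot
            probes += 1
        else:
            raise ValueError("hash table full: more groups than slots")
        slot_keys[h] = k
        slot_sums[h] += v
    result = {k: s for k, s in zip(slot_keys, slot_sums) if k is not None}
    return result, probes

def sort_group_sum(keys, values):
    """GROUP BY key, SUM(value) by sorting rows on the key, then scanning runs."""
    rows = sorted(zip(keys, values), key=itemgetter(0))
    return {k: sum(v for _, v in run) for k, run in groupby(rows, key=itemgetter(0))}

if __name__ == "__main__":
    # Keys that are all multiples of 16: their low 4 hash bits are always zero.
    keys = [16 * i for i in range(8)] * 2
    values = list(range(len(keys)))
    for mode in ("high", "low"):
        result, probes = hash_group_sum(keys, values, table_bits=4, truncate=mode)
        print(mode, "extra probes:", probes, "matches sort-based:",
              result == sort_group_sum(keys, values))
```

With these particular keys, keeping only the low bits sends every key to slot 0 of the table, so each lookup degenerates into a linear scan, whereas keeping the high bits spreads the keys across slots. Both variants still produce the same sums as the sort-based version; only the amount of probing differs.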
