Optimizing GPU-accelerated Group-By and Aggregation
Technische Universität Dresden, Dresden, Germany
Sixth International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS), 2015
@article{karnagel2015optimizing,
   title={Optimizing GPU-accelerated Group-By and Aggregation},
   author={Karnagel, Tomas and Mueller, Rene and Lohman, Guy M.},
   year={2015}
}
The massive parallelism and faster random memory access of Graphics Processing Units (GPUs) promise to further accelerate complex analytics operations such as joins and grouping, but they also pose additional challenges for optimizing performance. There are more implementation alternatives to consider on the GPU, such as exploiting the different types of memory on the device and dividing work among processor clusters and threads, as well as additional performance parameters, such as the size of the kernel grid and the trade-off between the number of threads and the resulting share of resources each thread will get. In this paper, we study in depth offloading to a GPU the grouping and aggregation operator, often the dominant operation in analytics queries after joins. We primarily focus on the design implications of a hash-based implementation, although we also compare it against a sort-based approach. Our study provides (1) a detailed performance analysis of grouping and aggregation on the GPU as the number of groups in the result varies, (2) an analysis of the truncation effects of hash functions commonly used in hash-based grouping, and (3) a simple parametric model for a wide range of workloads with a heuristic optimizer to automatically pick the best implementation and performance parameters at execution time.
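To make the hash-based variant concrete, the following minimal CUDA sketch shows the general technique: each thread hashes its input key into a global hash table, claims a slot with atomicCAS, and accumulates a SUM aggregate with atomicAdd. This is not the authors' implementation; the table size, the multiplicative hash function, the empty-slot sentinel, and the choice of SUM are illustrative assumptions.

// Minimal illustrative sketch of GPU hash-based group-by with SUM aggregation.
// Not the paper's implementation: table size, hash function, and sentinel value
// are assumptions chosen for this example. The table must be large enough to
// hold all distinct groups, otherwise the probing loop cannot terminate.
#include <cstdio>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

#define TABLE_SIZE 1024u         // power of two, so (hash & (TABLE_SIZE - 1)) works
#define EMPTY_KEY  0xFFFFFFFFu   // sentinel marking an unused slot

__device__ uint32_t hash_key(uint32_t k) {
    // Multiplicative hash, truncated to the table size.
    return (k * 2654435761u) & (TABLE_SIZE - 1u);
}

__global__ void groupby_sum(const uint32_t* keys, const uint32_t* vals, int n,
                            uint32_t* table_keys, unsigned long long* table_sums) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    uint32_t key  = keys[i];
    uint32_t slot = hash_key(key);
    // Linear probing: claim an empty slot with atomicCAS, then aggregate with atomicAdd.
    while (true) {
        uint32_t prev = atomicCAS(&table_keys[slot], EMPTY_KEY, key);
        if (prev == EMPTY_KEY || prev == key) {
            atomicAdd(&table_sums[slot], (unsigned long long)vals[i]);
            break;
        }
        slot = (slot + 1u) & (TABLE_SIZE - 1u);   // collision: try the next slot
    }
}

int main() {
    const int n = 1 << 20;                        // 1M rows, 8 distinct groups
    std::vector<uint32_t> h_keys(n), h_vals(n);
    for (int i = 0; i < n; ++i) { h_keys[i] = i % 8; h_vals[i] = 1; }

    uint32_t *d_keys, *d_vals, *d_tkeys;
    unsigned long long *d_tsums;
    cudaMalloc((void**)&d_keys,  n * sizeof(uint32_t));
    cudaMalloc((void**)&d_vals,  n * sizeof(uint32_t));
    cudaMalloc((void**)&d_tkeys, TABLE_SIZE * sizeof(uint32_t));
    cudaMalloc((void**)&d_tsums, TABLE_SIZE * sizeof(unsigned long long));
    cudaMemcpy(d_keys, h_keys.data(), n * sizeof(uint32_t), cudaMemcpyHostToDevice);
    cudaMemcpy(d_vals, h_vals.data(), n * sizeof(uint32_t), cudaMemcpyHostToDevice);
    cudaMemset(d_tkeys, 0xFF, TABLE_SIZE * sizeof(uint32_t));               // all slots empty
    cudaMemset(d_tsums, 0,    TABLE_SIZE * sizeof(unsigned long long));

    groupby_sum<<<(n + 255) / 256, 256>>>(d_keys, d_vals, n, d_tkeys, d_tsums);
    cudaDeviceSynchronize();

    std::vector<uint32_t> h_tkeys(TABLE_SIZE);
    std::vector<unsigned long long> h_tsums(TABLE_SIZE);
    cudaMemcpy(h_tkeys.data(), d_tkeys, TABLE_SIZE * sizeof(uint32_t), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_tsums.data(), d_tsums, TABLE_SIZE * sizeof(unsigned long long), cudaMemcpyDeviceToHost);
    for (uint32_t s = 0; s < TABLE_SIZE; ++s)
        if (h_tkeys[s] != EMPTY_KEY)
            printf("group %u -> sum %llu\n", h_tkeys[s], h_tsums[s]);

    cudaFree(d_keys); cudaFree(d_vals); cudaFree(d_tkeys); cudaFree(d_tsums);
    return 0;
}

With 8 groups and one million rows of value 1, each printed sum should be 131072. Contention on the atomic updates grows as the number of distinct groups shrinks, which illustrates why performance varies with the group count, one of the effects the paper analyzes.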
October 6, 2015 by hgpu