Optimizing GPU-accelerated Group-By and Aggregation

Tomas Karnagel, Rene Mueller, Guy M. Lohman
Technische Universität Dresden, Dresden, Germany
Sixth International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS), 2015

@inproceedings{karnagel2015groupby,
   title={Optimizing GPU-accelerated Group-By and Aggregation},
   author={Karnagel, Tomas and Mueller, Rene and Lohman, Guy M.},
   booktitle={Sixth International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS)},
   year={2015}
}




The massive parallelism and faster random memory access of Graphics Processing Units (GPUs) promise to further accelerate complex analytics operations such as joins and grouping, but also pose additional challenges to optimizing their performance. There are more implementation alternatives to consider on the GPU, such as exploiting the different types of memory on the device and the division of work among processor clusters and threads, and additional performance parameters, such as the size of the kernel grid and the trade-off between the number of threads and the resulting share of resources each thread will get. In this paper, we study in depth offloading the grouping and aggregation operator to a GPU, often the dominant operation in analytics queries after joins. We primarily focus on the design implications of a hash-based implementation, although we also compare it against a sort-based approach. Our study provides (1) a detailed performance analysis of grouping and aggregation on the GPU as the number of groups in the result varies, (2) an analysis of the truncation effects of hash functions commonly used in hash-based grouping, and (3) a simple parametric model for a wide range of workloads with a heuristic optimizer to automatically pick the best implementation and performance parameters at execution time.
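To make the hash-based approach concrete, the following is a minimal CPU sketch (in Python, for illustration only; the paper's implementation is a GPU kernel) of grouping with a SUM aggregate using open addressing with linear probing. The function name, constants, and table layout here are illustrative assumptions, not the paper's code; on a GPU, each thread would run this probe loop for its own input rows, with key insertion made race-free via atomic compare-and-swap and the aggregate updated with an atomic add.

```python
EMPTY = None  # sentinel for an unoccupied slot

def hash_group_sum(keys, values, table_size):
    """Group `values` by `keys` and sum them, via a linear-probing hash table.

    Illustrative sketch only: a GPU kernel would use atomicCAS on the key
    slot and atomicAdd on the running sum instead of plain assignments.
    """
    slot_key = [EMPTY] * table_size
    slot_sum = [0] * table_size
    for k, v in zip(keys, values):
        # Multiplicative hashing (Knuth's constant); how the product is
        # truncated to a table index is exactly the kind of truncation
        # effect the paper analyzes.
        h = (k * 2654435761) % table_size
        # Linear probing: advance until we find this key or an empty slot.
        while slot_key[h] is not EMPTY and slot_key[h] != k:
            h = (h + 1) % table_size
        slot_key[h] = k
        slot_sum[h] += v
    # Collect occupied slots into a result dictionary.
    return {k: s for k, s in zip(slot_key, slot_sum) if k is not EMPTY}

result = hash_group_sum([1, 2, 1, 3, 2], [10, 20, 30, 40, 50], 8)
print(result)  # three groups: key 1 -> 40, key 2 -> 70, key 3 -> 40
```

Note that performance hinges on the parameters the paper's optimizer tunes: the table size relative to the number of distinct groups governs the probe-chain length, and on the GPU the same trade-off interacts with thread count and the memory type holding the table.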

HGPU group © 2010-2016 hgpu.org

All rights belong to the respective authors

Contact us: