high performance computing on graphics processing units: hgpu.org

Ameliorating Memory Contention of OLAP operators on GPU Processors

Evangelia A. Sitaridi, Kenneth A. Ross

Dept. of Computer Science, Columbia University

Eighth International Workshop on Data Management on New Hardware (DaMoN ’12), 2012

DOI:10.1145/2236584.2236590

Implementations of database operators on GPU processors have shown dramatic performance improvement compared to multicore-CPU implementations. GPU threads can cooperate using shared memory, which is organized in interleaved banks and is fast only when threads read and modify addresses belonging to distinct memory banks. Therefore, data processing operators implemented on a GPU, in addition to contention caused by popular values, have to deal with a new performance-limiting factor: thread serialization when accessing values belonging to the same bank. Here, we define the problem of bank and value conflict optimization for data processing operators using the CUDA platform. To analyze the impact of these two factors on operator performance, we use two database operations: foreign-key join and grouped aggregation. We suggest and evaluate techniques for optimizing the data arrangement offline by creating clones of values to reduce overall memory contention. Results indicate that columns used for writes, such as grouping columns, need to be optimized to fully exploit the maximum bandwidth of shared memory.
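
To make the two contention sources in the abstract concrete, here is a minimal Python sketch of a simplified conflict model. It is an illustration, not the paper's method: it assumes 32 banks of 4-byte words with word `i` mapped to bank `i % 32` (the layout commonly documented for Fermi-class GPUs such as the Tesla C2070), it assumes reads of the same word are broadcast while atomic updates to the same word are serialized, and the function names and the round-robin clone assignment are hypothetical.

```python
from collections import defaultdict

NUM_BANKS = 32  # assumption: 32 banks, one 4-byte word per bank slot


def bank_of(word_index):
    """Bank that a shared-memory word falls into under interleaved mapping."""
    return word_index % NUM_BANKS


def read_serialization(word_indices):
    """Passes needed for one warp-wide read (simplified model).

    Threads reading the same word are served by a broadcast, so only
    distinct words within a bank count (bank conflicts).
    """
    per_bank = defaultdict(set)
    for w in word_indices:
        per_bank[bank_of(w)].add(w)
    return max(len(words) for words in per_bank.values())


def atomic_serialization(word_indices):
    """Passes needed for one warp-wide atomic update (simplified model).

    Every thread hitting a bank is serialized, whether it targets the
    same word (value conflict) or a different word (bank conflict).
    """
    per_bank = defaultdict(int)
    for w in word_indices:
        per_bank[bank_of(w)] += 1
    return max(per_bank.values())


def cloned_targets(num_threads, base_word, num_clones):
    """Hypothetical cloning scheme: spread updates to one popular value
    over num_clones adjacent words, assigning thread t to clone t % num_clones."""
    return [base_word + (t % num_clones) for t in range(num_threads)]


if __name__ == "__main__":
    warp = range(32)
    print(read_serialization([t for t in warp]))          # stride 1  -> 1
    print(read_serialization([32 * t for t in warp]))     # stride 32 -> 32
    print(atomic_serialization([0 for _ in warp]))        # one hot value -> 32
    print(atomic_serialization(cloned_targets(32, 0, 4))) # 4 clones -> 8
```

Under this model, a warp whose threads all update one popular group serializes 32 ways, while four clones placed in distinct banks cut that to 8, at the cost of merging the partial results of the clones afterwards. Real hardware behavior depends on the GPU generation and on how atomics are implemented, so the numbers are only indicative.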

Tags: Computer science, CUDA, Databases, nVidia, Tesla C2070

June 8, 2012 by hgpu
