Ameliorating Memory Contention of OLAP operators on GPU Processors
Dept. of Computer Science, Columbia University
Eighth International Workshop on Data Management on New Hardware (DaMoN ’12), 2012
@inproceedings{sitaridi2012ameliorating,
title={Ameliorating Memory Contention of OLAP operators on GPU Processors},
author={Sitaridi, E.A. and Ross, K.A.},
booktitle={Proceedings of the Eighth International Workshop on Data Management on New Hardware},
pages={39--47},
year={2012},
organization={ACM}
}
Implementations of database operators on GPU processors have shown dramatic performance improvements over multicore-CPU implementations. GPU threads can cooperate using shared memory, which is organized in interleaved banks and is fast only when threads read and modify addresses belonging to distinct memory banks. Therefore, data processing operators implemented on a GPU, in addition to contention caused by popular values, have to deal with a new performance-limiting factor: thread serialization when accessing values belonging to the same bank. Here, we define the problem of bank and value conflict optimization for data processing operators using the CUDA platform. To analyze the impact of these two factors on operator performance, we use two database operations: foreign-key join and grouped aggregation. We suggest and evaluate techniques for optimizing the data arrangement offline by creating clones of values to reduce overall memory contention. Results indicate that columns used for writes, such as grouping columns, need to be optimized to fully exploit the maximum bandwidth of shared memory.
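To make the two contention sources in the abstract concrete, the following is a minimal host-side sketch in Python, not the paper's implementation. It assumes 32 shared-memory banks interleaved at 4-byte word granularity and a warp of 32 threads, and it uses a simplified cost model in which every write that lands in the same bank is serialized, whether the threads target the same word (value conflict) or different words (bank conflict). The function names and the clone layout (`group * num_clones + thread % num_clones`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

NUM_BANKS = 32   # assumption: 32 shared-memory banks
WORD_BYTES = 4   # assumption: banks are interleaved per 4-byte word


def bank_of(byte_address):
    """Return the bank that holds the word at byte_address."""
    return (byte_address // WORD_BYTES) % NUM_BANKS


def serialization_degree(byte_addresses):
    """Worst-case number of serialized passes for one warp's writes.

    Simplified model: all writes mapping to the same bank are serialized,
    so the busiest bank determines the cost of the whole warp.
    """
    per_bank = defaultdict(int)
    for addr in byte_addresses:
        per_bank[bank_of(addr)] += 1
    return max(per_bank.values(), default=0)


def aggregate_addresses(group_ids, num_clones):
    """Byte address each thread updates during grouped aggregation.

    Hypothetical layout: every group owns num_clones consecutive slots,
    and thread t updates clone (t % num_clones) of its group. The clones
    would be combined in a final pass that is not modeled here.
    """
    return [
        (group * num_clones + thread % num_clones) * WORD_BYTES
        for thread, group in enumerate(group_ids)
    ]


if __name__ == "__main__":
    popular = [0] * 32  # every thread in the warp updates the same group
    for clones in (1, 8, 32):
        degree = serialization_degree(aggregate_addresses(popular, clones))
        print(f"clones={clones:2d} -> serialized passes={degree}")
```

Under this model, a warp in which all 32 threads update one popular group needs 32 serialized passes with a single copy of the value, 4 passes with 8 clones, and 1 pass with 32 clones. The trade-off is that each clone consumes additional shared memory, which is why the number of clones per value has to be chosen rather than maximized.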
June 8, 2012 by hgpu