Efficient Weighted Histogramming on GPUs with CUDA
Microsoft Research Asia, Beijing, China
Microsoft Research Asia, Tech report, 2012
@techreport{xu2012efficient,
title={Efficient Weighted Histogramming on GPUs with CUDA},
author={Xu, M. and Xu, N. and Zhao, C. and Hsu, F.H.},
institution={Microsoft Research Asia},
year={2012}
}
The histogram is a fundamental statistical tool that has been used extensively in various domains. In data mining and machine learning applications, weighted histogram calculation often serves as a key component in the processing of massive data sets. However, the atomic operations introduced to resolve collisions in GPU-based parallel histogramming with a large number of bins bring the overhead of instruction serialization, which limits both performance and performance predictability. In this work, we present a new method for histogramming on GPUs that reduces collision intensity by rearranging the input and provides predictable performance over data sets with different statistics. By using shared memory effectively, our method shows improved performance over state-of-the-art implementations. We then propose a hybrid method that dynamically chooses the best implementation among the traditional methods and the new method, according to the number of bins and the sparseness of the values. An overall speedup of 13x over the CPU implementation is observed on a data set from a commercial search engine.
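The collision problem described in the abstract can be illustrated with a small CPU-side sketch in Python. It is not the authors' CUDA implementation: the abstract does not say how the input is rearranged, so sorting by bin index is an assumption made here for illustration, and the function names and the group size of 32 (standing in for a group of GPU threads) are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def weighted_histogram(bins, weights, num_bins):
    """Baseline: each element adds its weight to its bin.

    On a GPU each `+=` would be an atomic add, and elements that
    target the same bin at the same time are serialized.
    """
    hist = [0.0] * num_bins
    for b, w in zip(bins, weights):
        hist[b] += w
    return hist


def weighted_histogram_sorted(bins, weights, num_bins):
    """Rearranged variant (assumed strategy, not taken from the paper).

    Sort the (bin, weight) pairs by bin, sum each run of equal bins,
    and write each bin exactly once, so no two writes collide.
    """
    hist = [0.0] * num_bins
    pairs = sorted(zip(bins, weights), key=itemgetter(0))
    for b, run in groupby(pairs, key=itemgetter(0)):
        hist[b] = sum(w for _, w in run)
    return hist


def max_collisions(bins, group_size=32):
    """Worst-case number of elements hitting one bin within any group
    of `group_size` consecutive inputs (a rough proxy for how many
    atomic adds would serialize)."""
    worst = 0
    for start in range(0, len(bins), group_size):
        counts = {}
        for b in bins[start:start + group_size]:
            counts[b] = counts.get(b, 0) + 1
        worst = max(worst, max(counts.values()))
    return worst
```

Both histogram functions return the same result for the same input; the difference is that the baseline's cost on a GPU depends on how often neighbouring elements share a bin, which `max_collisions` estimates, while the rearranged variant pays a fixed sorting cost regardless of the data's statistics.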
January 8, 2013 by hgpu