Fast and approximate stream mining of quantiles and frequencies using graphics processors
University of North Carolina at Chapel Hill
In SIGMOD ’05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data (2005), pp. 611-622
@conference{govindaraju2005fast,
title={Fast and approximate stream mining of quantiles and frequencies using graphics processors},
author={Govindaraju, N.K. and Raghuvanshi, N. and Manocha, D.},
booktitle={Proceedings of the 2005 ACM SIGMOD international conference on Management of data},
pages={611–622},
isbn={1595930604},
year={2005},
organization={ACM}
}
We present algorithms for fast quantile and frequency estimation in large data streams using graphics processors (GPUs). We exploit the high computation power and memory bandwidth of graphics processors and present a new sorting algorithm that performs rasterization operations on the GPUs. We use sorting as the main computational component for histogram approximation and construction of e-approximate quantile and frequency summaries. Our algorithms for numerical statistics computation on data streams are deterministic, applicable to fixed or variable-sized sliding windows and use a limited memory footprint. We use GPU as a co-processor and minimize the data transmission between the CPU and GPU by taking into account the low bus bandwidth. We implemented our algorithms on a PC with a NVIDIA GeForce FX 6800 Ultra GPU and a 3.4 GHz Pentium IV CPU and applied them to large data streams consisting of more than 100 million values. We also compared the performance of our GPU-based algorithms with optimized implementations of prior CPU-based algorithms. Overall, our results demonstrate that the graphics processors available on a commodity computer system are efficient stream-processor and useful co-processors for mining data streams.
November 30, 2010 by hgpu