Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism
Intelligent Computing Lab, Division of Informatics, Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, P.R. China
14th International Conference on Intelligent Data Engineering and Automated Learning, 2013
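@inproceedings{dong2013accelerating,
title={Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism},
author={Dong, Jianqiang and Wang, Fei and Yuan, Bo},
booktitle={14th International Conference on Intelligent Data Engineering and Automated Learning},
year={2013}
}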
In the era of big data, the ability to mine and analyze large-scale datasets is imperative. As data become more abundant than ever before, data-driven methods are playing a critical role in areas such as decision support and business intelligence. In this paper, we demonstrate how state-of-the-art GPUs and the Dynamic Parallelism feature of the latest CUDA platform can bring significant benefits to BIRCH, one of the best-known clustering techniques for streaming data. Experimental results show that, on a number of benchmark problems, the GPU-accelerated BIRCH can be made up to 154 times faster than the CPU version, with good scalability and high accuracy. Our work suggests that massively parallel GPU computing is a promising and effective solution to the challenges of big data.
August 1, 2013 by hgpu