Progressive Clustering of Big Data with GPU Acceleration and Visualization
Visual Analytics and Imaging Lab, Computer Science Department, Stony Brook University, Stony Brook, NY, USA
New York Scientific Data Summit (NYSDS), 2017
@inproceedings{wang2017progressive,
title={Progressive Clustering of Big Data with GPU Acceleration and Visualization},
author={Wang, Jun and Papenhausen, Eric and Wang, Bing and Ha, Sungsoo and Zelenyuk, Alla and Mueller, Klaus},
booktitle={New York Scientific Data Summit (NYSDS)},
year={2017}
}
Clustering has become an unavoidable step in big data analysis. It can arrange data into a compact format, making operations on big data manageable. However, clustering big data requires not only handling large volume and high dimensionality but also processing streaming data, capabilities that most current algorithms support poorly. Furthermore, big data processing is seldom interactive, which conflicts with users' need for immediate answers. The best one can do is to process the data incrementally, so that partial and, ideally, accurate results become available relatively quickly and are then progressively refined over time. We propose a clustering framework that uses Multi-Dimensional Scaling for layout and GPU acceleration to accomplish these goals. Our domain application is the clustering of mass spectral data of individual aerosol particles, comprising 8 million data points of 450 dimensions each.
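The idea of incremental processing with progressively refined partial results can be illustrated with a small sketch. This is not the authors' framework: it uses neither GPU acceleration nor Multi-Dimensional Scaling, and it substitutes plain sequential (online) k-means as a stand-in technique. The names progressive_kmeans and make_stream, the synthetic two-blob data, and the choice of k are all assumptions made for illustration only.

```python
import random


def progressive_kmeans(chunks, k):
    """Yield a snapshot of the centroids after each chunk of a data stream.

    Sequential k-means: every point moves its nearest centroid toward
    itself by 1/count, so each centroid is the running mean of the
    points assigned to it so far.
    """
    centroids = []  # current cluster centers
    counts = []     # number of points absorbed by each center
    for chunk in chunks:
        for point in chunk:
            if len(centroids) < k:
                # Assumption: seed the centers with the first k points seen.
                centroids.append(list(point))
                counts.append(1)
                continue
            # Index of the nearest centroid (squared Euclidean distance).
            j = min(
                range(k),
                key=lambda i: sum((c - p) ** 2 for c, p in zip(centroids[i], point)),
            )
            counts[j] += 1
            rate = 1.0 / counts[j]
            centroids[j] = [c + rate * (p - c) for c, p in zip(centroids[j], point)]
        # A partial result is available here, before the stream ends.
        yield [list(c) for c in centroids]


def make_stream(n_chunks=5, chunk_size=200, seed=0):
    """Hypothetical test stream: points alternate between two 2-D Gaussian blobs."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 10.0)]
    for _ in range(n_chunks):
        yield [
            tuple(rng.gauss(m, 1.0) for m in centers[i % 2])
            for i in range(chunk_size)
        ]


if __name__ == "__main__":
    for step, snapshot in enumerate(progressive_kmeans(make_stream(), k=2), 1):
        rounded = [[round(x, 2) for x in c] for c in snapshot]
        print(f"after chunk {step}: {rounded}")
```

Because the function is a generator, a caller can display or inspect each snapshot as soon as a chunk has been consumed, rather than waiting for the whole stream. Seeding from the first k points is the simplest possible choice and is sensitive to the order of the data; a real system would likely need a more careful initialization.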
September 9, 2017 by hgpu