Parallel algorithms for problems of cluster analysis with very large amount of data
Al-Farabi Kazakh National University, Almaty, Kazakhstan
arXiv:1402.3789 [cs.DC] (16 Feb 2014)
@article{2014arXiv1402.3789L,
author={{Litvinenko}, N.},
title={{Parallel algorithms for problems of cluster analysis with very large amount of data}},
journal={ArXiv e-prints},
archivePrefix={arXiv},
eprint={1402.3789},
primaryClass={cs.DC},
keywords={Computer Science - Distributed, Parallel, and Cluster Computing, 91C20, 68W10, 62-07, D.1.3, G.1.0, G.4, H.3.3, I.5.3},
year={2014},
month=feb,
adsurl={http://adsabs.harvard.edu/abs/2014arXiv1402.3789L},
adsnote={Provided by the SAO/NASA Astrophysics Data System}
}
In this paper we use GPUs to solve large-scale problems involving very large amounts of data that are not well suited to solution with SIMD technology. For the given problem we consider a three-level parallelization. CPU multithreading is used at the top level, and graphics processors perform the massive computations. To solve cluster-analysis problems on GPUs, we develop a nearest neighbor method (NNM). This algorithm handles up to 2 million records with up to 25 features each. Because the sequential and parallel algorithms are fundamentally different, their computation times are difficult to compare directly. Nevertheless, some comparisons are made, and they show a gain in computing time of about a factor of 10. We plan to raise this factor to 50-100 after fine-tuning the algorithms.
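The abstract does not spell out how the NNM works or how work is divided between CPU threads and the GPU. The Python sketch below illustrates the general idea under our own assumptions and is not the author's algorithm. It uses nearest-neighbour (single-linkage) clustering with a fixed Euclidean distance cut-off. The `threshold` parameter and the helper names `nn_cluster`, `_close_pairs` and `_find` are hypothetical. CPU threads each take a block of rows (the top level), and the brute-force distance loop inside each block stands in for the massive computation that the paper assigns to the GPU.

```python
# Illustrative sketch only, NOT the algorithm from the paper.
# Assumptions: Euclidean distance, a fixed cut-off `threshold`, brute-force
# pair search, and hypothetical names (nn_cluster, _close_pairs, _find).
# In the paper's scheme the inner distance loop is the part that would run
# on the GPU; here it is plain Python.
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Record = Sequence[float]


def _close_pairs(data: Sequence[Record], rows: range,
                 thr2: float) -> List[Tuple[int, int]]:
    """Pairs (i, j) with i in `rows`, j > i and squared distance <= thr2."""
    pairs = []
    for i in rows:
        xi = data[i]
        for j in range(i + 1, len(data)):
            d2 = sum((a - b) ** 2 for a, b in zip(xi, data[j]))
            if d2 <= thr2:
                pairs.append((i, j))
    return pairs


def _find(parent: List[int], x: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def nn_cluster(data: Sequence[Record], threshold: float,
               n_threads: int = 4) -> List[int]:
    """Single-linkage clusters: records within `threshold` are chained.

    Returns one integer label per record, numbered 0, 1, 2, ... in order of
    first appearance.
    """
    n = len(data)
    thr2 = threshold * threshold
    # Top level: split the rows into contiguous chunks, one task per thread.
    step = max(1, -(-n // n_threads))  # ceiling division
    chunks = [range(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(lambda rows: _close_pairs(data, rows, thr2),
                                chunks))
    # Merge step: every close pair joins its two records into one cluster.
    parent = list(range(n))
    for pairs in results:
        for i, j in pairs:
            ri, rj = _find(parent, i), _find(parent, j)
            if ri != rj:
                parent[rj] = ri
    # Relabel union-find roots as consecutive cluster ids.
    ids: Dict[int, int] = {}
    return [ids.setdefault(_find(parent, i), len(ids)) for i in range(n)]


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (10.0, 10.0), (10.4, 10.0)]
    print(nn_cluster(points, threshold=0.6, n_threads=2))  # [0, 0, 0, 1, 1]
```

Because of CPython's global interpreter lock, these threads do not actually compute distances in parallel. The pool only shows how rows might be partitioned among CPU threads. The brute-force search is O(n^2) and is intended for small illustrative inputs, not for the 2 million records mentioned in the abstract.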
February 19, 2014 by hgpu