Ensemble K-Means on Modern Many Core Hardware
University of Florida
University of Florida, 2011
@phdthesis{ravunnikutty2011ensemble,
  title={Ensemble K-Means on Modern Many Core Hardware},
  author={Ravunnikutty, G.},
  year={2011},
  school={University of Florida}
}
Clustering partitions a set of objects into subsets, called clusters, so that objects in the same cluster are similar according to some metric. It is widely used in fields such as machine learning, data mining, pattern recognition, and bioinformatics. The K-means algorithm, the most popular clustering algorithm, uses distance as the similarity metric. It works by choosing a set of K random objects (a model) from the data set and computing the distance from each object to them. Ensemble K-means chooses multiple models, with the goal of refining the result and converging faster. The focus of this thesis is to investigate different approaches to parallelizing ensemble-based K-means algorithms on modern many-core hardware, here meaning GPUs and CPUs programmed through the OpenCL software stack. The thesis also presents novel implementations that exploit the computing power of this hardware by minimizing the amount of data access.
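To make the idea concrete, the following is a minimal sequential sketch of ensemble K-means in plain Python. It is not the thesis's OpenCL implementation. The function names (`sq_dist`, `ensemble_kmeans`), the parameters, and the choice to keep the model with the lowest sum of squared errors are all assumptions made for illustration. Updating every model during a single pass over the data is one possible reading of "minimizing the amount of data access", not a description of the author's method.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ensemble_kmeans(points, k, n_models, n_iters=20, seed=0):
    """Run n_models K-means models over the same data (hypothetical helper).

    Each iteration makes one pass over the points and updates every model
    from that pass, so a point is read once per iteration rather than once
    per model. Returns (best_centroids, best_sse).
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Each model starts from K objects drawn at random from the data set.
    models = [[list(p) for p in rng.sample(points, k)] for _ in range(n_models)]

    for _ in range(n_iters):
        sums = [[[0.0] * dim for _ in range(k)] for _ in range(n_models)]
        counts = [[0] * k for _ in range(n_models)]
        for p in points:  # single pass over the data, shared by all models
            for m, centroids in enumerate(models):
                # Index of the centroid nearest to p in model m.
                j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                counts[m][j] += 1
                for d in range(dim):
                    sums[m][j][d] += p[d]
        for m in range(n_models):
            for j in range(k):
                if counts[m][j]:  # an empty cluster keeps its old centroid
                    models[m][j] = [s / counts[m][j] for s in sums[m][j]]

    def sse(centroids):
        # Sum of squared distances from each point to its nearest centroid.
        return sum(min(sq_dist(p, c) for c in centroids) for p in points)

    # Assumption: the ensemble is reduced by keeping the lowest-error model.
    best = min(models, key=sse)
    return best, sse(best)
```

Calling `ensemble_kmeans(points, k=2, n_models=4)` on a list of 2-D tuples returns the centroids of the best model and its error. On a GPU, the loop over points is the natural candidate for parallel work items, but that mapping is outside the scope of this sketch.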
July 28, 2012 by hgpu