Parallel Hierarchical Clustering on the GPU

hgpu.org » Programming » Algorithms » Parallel Hierarchical Clustering on the GPU

Parallel Hierarchical Clustering on the GPU

Ursula Reiterer

Leopold-Franzens-University Innsbruck

Leopold-Franzens-University Innsbruck, 2013

@article{reiterer2014hierarchical,

title={Hierarchical Clustering on the GPU},

author={Reiterer, Ursula},

year={2014}

}

Download (PDF)

View

Source

1903

views

Clustering is a basic task in exploratory data analysis. It is used to partition elements of a set into disjoint groups, so-called clusters, such that elements within a group are similar to each other, but dissimilar to elements of other groups. Several clustering algorithms exist, which can be applied depending on the type of dataset and the particular purpose. However, clustering large datasets can be computationally expensive and requires techniques to increase the performance of these algorithms. To achieve this, recent research often focuses on the massively parallelism provided by Graphics Processing Units (GPUs). In this thesis, we present three clustering algorithms: k-means, k-medoids and hierarchical clustering. Using OpenCL as programming model, we implemented a completely parallel k-means, which reduces the data movement between CPU and GPU and optimizes the parallel assignment step using the GPU’s local memory. Similarly, we designed a parallel k-medoids, which executes all basic clustering steps on the GPU and uses a tiling approach for the computation of the new medoids. Finally, we focused on divisive hierarchical clustering. We first implemented a standard approach, which processes each cluster separately in parallel and continues the clustering sequentially on the CPU as soon as the clusters become too small to make efficient use of the GPU. Then, we introduced a new approach, which increases the parallelism by applying k-means concurrently on all data segments of the same hierarchy depth, using a segmented reduction for the computation of the new means. We further optimized this segmented reduction by injecting PTX assembly instructions in the kernels. Our results show that our segmented approach is 2-3 times faster than the classical parallel approach and 4-5 times faster than the sequential version.

Tags: Algorithms, Clustering, Computer science, nVidia, nVidia GeForce GT 650 M, OpenCL, PTX, Thesis

September 20, 2014 by hgpu

No votes yet.

Please wait...

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

gpu_tracker: Python package for tracking and profiling GPU utilization in both desktop and high-performance computing environments

* * *

high performance computing on graphics processing units: hgpu.org