high performance computing on graphics processing units: hgpu.org

Parallel Hierarchical Clustering on the GPU

Ursula Reiterer

Leopold-Franzens-University Innsbruck

Leopold-Franzens-University Innsbruck, 2013

@article{reiterer2014hierarchical,
  title={Hierarchical Clustering on the GPU},
  author={Reiterer, Ursula},
  year={2014}
}

Clustering is a basic task in exploratory data analysis. It is used to partition the elements of a set into disjoint groups, so-called clusters, such that elements within a group are similar to each other but dissimilar to elements of other groups. Several clustering algorithms exist, which can be applied depending on the type of dataset and the particular purpose. However, clustering large datasets can be computationally expensive and requires techniques to increase the performance of these algorithms. To achieve this, recent research often focuses on the massive parallelism provided by Graphics Processing Units (GPUs). In this thesis, we present three clustering algorithms: k-means, k-medoids and hierarchical clustering. Using OpenCL as the programming model, we implemented a completely parallel k-means, which reduces the data movement between CPU and GPU and optimizes the parallel assignment step using the GPU’s local memory. Similarly, we designed a parallel k-medoids, which executes all basic clustering steps on the GPU and uses a tiling approach for the computation of the new medoids. Finally, we focused on divisive hierarchical clustering. We first implemented a standard approach, which processes each cluster separately in parallel and continues the clustering sequentially on the CPU as soon as the clusters become too small to make efficient use of the GPU. Then, we introduced a new approach, which increases the parallelism by applying k-means concurrently to all data segments of the same hierarchy depth, using a segmented reduction for the computation of the new means. We further optimized this segmented reduction by injecting PTX assembly instructions into the kernels. Our results show that our segmented approach is 2-3 times faster than the classical parallel approach and 4-5 times faster than the sequential version.
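
The segmented idea in the abstract (one k-means step applied to all segments of the same hierarchy depth, with a segmented reduction producing the new means) can be illustrated with a small sequential sketch. This is not the thesis's OpenCL implementation; it is plain Python written only to show the data flow, and the function name `segmented_kmeans_step`, its parameters, and the rule that an empty centroid keeps its old position are assumptions made for this example.

```python
from math import dist


def segmented_kmeans_step(points, segment_ids, means):
    """One k-means iteration applied to every segment in a single pass.

    Hypothetical helper for illustration only.
    points      -- list of coordinate tuples
    segment_ids -- segment_ids[i] is the segment that points[i] belongs to
    means       -- means[s] is the list of current centroids of segment s
    """
    dim = len(points[0])

    # Assignment step: a point is compared only with its own segment's centroids.
    labels = []
    for p, s in zip(points, segment_ids):
        distances = [dist(p, c) for c in means[s]]
        labels.append(distances.index(min(distances)))

    # Segmented reduction: one sum and one count per (segment, centroid) pair.
    sums = [[[0.0] * dim for _ in seg] for seg in means]
    counts = [[0] * len(seg) for seg in means]
    for p, s, c in zip(points, segment_ids, labels):
        counts[s][c] += 1
        for d in range(dim):
            sums[s][c][d] += p[d]

    # Update step (assumption: an empty centroid keeps its previous position).
    new_means = []
    for s, seg in enumerate(means):
        new_seg = []
        for c, old in enumerate(seg):
            if counts[s][c] == 0:
                new_seg.append(tuple(old))
            else:
                new_seg.append(tuple(v / counts[s][c] for v in sums[s][c]))
        new_means.append(new_seg)
    return labels, new_means


# Two segments of the same hierarchy depth, each split into k = 2 clusters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0), (5.0, 5.0), (7.0, 5.0)]
segment_ids = [0, 0, 0, 0, 1, 1]
means = [[(1.0, 1.0), (9.0, 1.0)], [(5.0, 5.0), (7.0, 5.0)]]
labels, new_means = segmented_kmeans_step(points, segment_ids, means)
```

On a GPU, the two loops over points would presumably map to one work-item per point, and the accumulation into `sums` and `counts` is the part a segmented reduction would replace; the sketch keeps them as ordinary loops so the per-segment bookkeeping stays visible.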

Tags: Algorithms, Clustering, Computer science, nVidia, nVidia GeForce GT 650 M, OpenCL, PTX, Thesis

September 20, 2014 by hgpu
