CLUEstering: a high-performance density-based clustering library for scientific computing
University of Bologna
University of Bologna, 2024
@article{balducci2024cluestering,
title={CLUEstering: a high-performance density-based clustering library for scientific computing},
author={Balducci, Simone},
year={2024}
}
Clustering is a computational technique that classifies objects based on their similarity. It is widely used in many branches of science, for instance in image segmentation, medical imaging, the study of complex systems, machine learning, and high-energy physics. As the amount of data collected in every field of research grows, and grows faster than hardware evolves, techniques like clustering must find new ways to handle this data as efficiently as possible. In recent decades, parallel processors such as GPUs and FPGAs have risen in popularity thanks to their ability to perform complex calculations very efficiently by executing a large number of operations in parallel. The purpose of this thesis is to develop a general-purpose clustering library based on the CLUE algorithm [1], a highly parallel density-based clustering algorithm used for the local reconstruction of hits in the high-granularity calorimeters of the CMS detector at CERN. CLUEstering is developed using the Alpaka library, a C++ performance portability library that makes it possible to write code that runs on many types of modern processors with near-native efficiency and without any code duplication. The library provides a Python interface to the C++ backend, making it easier to use and appealing to a wider range of users. Finally, the library was tested on selected datasets to assess the quality of its reconstruction and to benchmark its performance. To show its generality, it was also applied to two modern problems from two separate areas of science: vertex reconstruction in high-energy physics and star detection from PSF images in astronomy.
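To make the idea of CLUE-style density-based clustering concrete, the following is a minimal, brute-force sketch in pure Python. It is not the CLUEstering API and not the thesis implementation: the function name `clue_like_clustering`, its parameter names (`dc`, `rho_c`, `delta_c`, `delta_o`), the flat density kernel, and the O(n²) neighbour search are all assumptions made for illustration. It only shows the general scheme of computing a local density per point, linking each point to its nearest neighbour of higher density, promoting isolated dense points to seeds, and propagating cluster labels from seeds to their followers.

```python
import math


def clue_like_clustering(points, weights, dc, rho_c, delta_c, delta_o):
    """Simplified, brute-force sketch of a CLUE-style clustering (illustrative only).

    points  : list of coordinate tuples
    weights : one weight per point
    dc      : radius used to compute the local density
    rho_c   : minimum density for a point to become a seed
    delta_c : minimum separation from a denser point for a seed
    delta_o : separation beyond which a low-density point is an outlier
    Returns a list of cluster labels, with -1 marking unassigned points.
    """
    n = len(points)

    def dist(a, b):
        return math.dist(points[a], points[b])

    # 1. Local density: sum of the weights of all points within dc (flat kernel).
    rho = [sum(weights[j] for j in range(n) if dist(i, j) <= dc) for i in range(n)]

    # 2. Nearest higher: closest point with higher density.
    #    Equal densities are ordered by index so the relation has no cycles.
    nearest_higher = [-1] * n
    delta = [math.inf] * n
    for i in range(n):
        for j in range(n):
            if j == i:
                continue
            is_higher = rho[j] > rho[i] or (rho[j] == rho[i] and j < i)
            if is_higher and dist(i, j) < delta[i]:
                delta[i] = dist(i, j)
                nearest_higher[i] = j

    # 3. Classify each point as seed, outlier, or follower of its nearest higher.
    labels = [-1] * n
    followers = [[] for _ in range(n)]
    seeds = []
    for i in range(n):
        is_seed = rho[i] >= rho_c and delta[i] > delta_c
        is_outlier = rho[i] < rho_c and delta[i] > delta_o
        if is_seed:
            labels[i] = len(seeds)
            seeds.append(i)
        elif not is_outlier:
            followers[nearest_higher[i]].append(i)

    # 4. Propagate cluster labels from the seeds down the follower chains.
    stack = list(seeds)
    while stack:
        i = stack.pop()
        for f in followers[i]:
            labels[f] = labels[i]
            stack.append(f)
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (20, 20)]
    print(clue_like_clustering(pts, [1.0] * len(pts), 0.5, 2.0, 1.0, 1.0))
```

In this sketch the density, nearest-higher, and classification steps are each independent per point, which is the property that makes this family of algorithms well suited to parallel processors. A practical implementation would replace the quadratic loops with a spatial index so that each point only inspects nearby candidates.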
December 1, 2024 by hgpu