Parallel Computations for Hierarchical Agglomerative Clustering using CUDA

S.A. Arul Shalom, Manoranjan Dash

School of Computer Engineering, Nanyang Technological University

PCC Vol. 3 Iss. 1, PP. 1-11, 2013

Graphics Processing Units (GPUs) in today’s desktops can be regarded as high-performance parallel processors. Traditionally, parallel computing means using multiple computing resources simultaneously to solve a computational problem. Such computations are possible with multi-core CPUs, with computers that have multiple CPUs, or with a network of computers working in parallel. Today’s GPUs can use multiple internal computing resources, such as ‘core-processors’ or ‘multi-processors’, simultaneously and complete a computation in a fraction of the time a CPU would need. We explore the parallel architecture of the GPU for cost-effective desktop parallel computing of a core data mining problem, clustering; the approach could then be applied to parallelize other data mining computations. The launch of NVIDIA’s Compute Unified Device Architecture (CUDA) technology has been a catalyst for the phenomenal growth in the application of GPUs to parallelize various scientific and data mining computations. With CUDA, the skills and techniques needed to invoke the internal parallel processors of a GPU are within reach of scientific researchers who may not be expert graphics programmers. We apply CUDA-based programming to parallelize the traditional Hierarchical Agglomerative Clustering (HAC) algorithm and demonstrate speed gains over the CPU. Speed gains from 15 times up to about 90 times have been realized under various clustering conditions. The effects of CUDA blocks and the challenges involved in invoking graphics hardware for such data mining algorithms are discussed. Notably, a block size of 8 was found to be optimal for a GPU with 128 internal processors. We further discuss the research issues that arise when parallelizing HAC on a GPU with CUDA and propose the use of the GPU as an efficient desktop processor. The results suggest that the future of extensively using desktop computers for GPU-based parallel computing is promising.
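For readers unfamiliar with HAC, the following is a minimal sequential sketch in plain Python. It is not the authors' CUDA implementation. The abstract does not name a linkage criterion, so single linkage is assumed here, and the function names `pairwise_distances` and `hac_single_linkage` are hypothetical. The comments mark the steps whose iterations are independent of one another, which is what makes the algorithm a natural candidate for GPU parallelization.

```python
import math


def pairwise_distances(points):
    """Full Euclidean distance matrix. Every entry is independent of the
    others, so on a GPU each one could be computed by its own thread."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def hac_single_linkage(points, num_clusters):
    """Repeatedly merge the two closest clusters until num_clusters remain.

    Single linkage is an assumption made for this sketch only.
    Returns (clusters, merges), where merges records (kept_id, removed_id, distance).
    """
    dist = pairwise_distances(points)
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> member indices
    merges = []
    while len(clusters) > num_clusters:
        ids = sorted(clusters)
        # Minimum search over all active pairs (a parallel reduction on a GPU).
        d, a, b = min(
            (dist[a][b], a, b) for k, a in enumerate(ids) for b in ids[k + 1:]
        )
        # Single-linkage update: the distance from the merged cluster to any
        # other cluster is the smaller of the two old distances. Each update
        # touches a different cluster c, so these are independent as well.
        for c in ids:
            if c != a and c != b:
                dist[a][c] = dist[c][a] = min(dist[a][c], dist[b][c])
        clusters[a].extend(clusters.pop(b))
        merges.append((a, b, d))
    return sorted(sorted(members) for members in clusters.values()), merges


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
    groups, history = hac_single_linkage(pts, 3)
    print(groups)
```

The distance matrix and the minimum search dominate the running time of this sequential version, so they are the steps a GPU port would most plausibly target.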

Tags: Algorithms, Clustering, Computer science, CUDA, Data mining, nVidia, nVidia GeForce 8800 GTS

July 26, 2014 by hgpu
