Parallel Computations for Hierarchical Agglomerative Clustering using CUDA

S.A. Arul Shalom, Manoranjan Dash
School of Computer Engineering, Nanyang Technological University
PCC Vol. 3 Iss. 1, PP. 1-11, 2013

@article{shalom2013parallel,
   title={Parallel Computations for Hierarchical Agglomerative Clustering using CUDA},
   author={Shalom, SA Arul and Dash, Manoranjan},
   year={2013}
}

The Graphics Processing Units (GPUs) in today’s desktops can be regarded as high-performance parallel processors. Traditionally, parallel computing means using multiple computing resources to work on a computational problem simultaneously. Such computations are possible with multi-core CPUs, with computers that have multiple CPUs, or with a network of computers working in parallel. Today’s GPUs can simultaneously use multiple internal computing resources, such as ‘core-processors’ or ‘multi-processors’, to finish a computation in a fraction of the time a CPU would need. We explore the parallel architecture of the GPU for cost-effective desktop parallel computing of a core data mining problem, clustering; the approach could then be applied to parallelize other data mining computations. The launch of NVIDIA’s Compute Unified Device Architecture (CUDA) technology has been a catalyst for the phenomenal growth in the application of GPUs to parallelize various scientific and data mining computations. With CUDA, the skills and techniques needed to invoke the internal parallel processors of a GPU are within reach of scientific researchers who may not be expert graphics programmers. We apply CUDA-based programming to parallelize the traditional Hierarchical Agglomerative Clustering (HAC) algorithm and demonstrate speed gains over the CPU. Speed gains from 15 times up to about 90 times have been realized for various clustering conditions. The effects of CUDA blocks and the challenges involved in invoking graphics hardware for such data mining algorithms are discussed. Notably, a block size of 8 is optimal for a GPU with 128 internal processors. We further discuss the research issues that arise when parallelizing HAC on a GPU with CUDA and propose the use of the GPU as an efficient desktop processor. The results show that the future of extensively using desktop computers for GPU-based parallel computing is promising.
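
For readers unfamiliar with HAC, the following is a minimal sequential sketch of the algorithm the abstract refers to, written in Python for brevity. It is not the authors' CUDA implementation: the single-linkage merge rule, the Euclidean metric, and the function names `pairwise_distances` and `hac_single_linkage` are all assumptions made for illustration, since the abstract does not specify them. The sketch is meant only to show where the independent work lies: every entry of the pairwise distance matrix can be computed without reference to any other entry, which is the kind of step that lends itself to one-thread-per-entry parallelization on a GPU.

```python
import math


def pairwise_distances(points):
    """Build the full Euclidean distance matrix.

    Each (i, j) entry depends only on points[i] and points[j], so the
    entries are independent of one another (the data-parallel step).
    """
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d
    return dist


def hac_single_linkage(points, num_clusters):
    """Repeatedly merge the two closest clusters until num_clusters remain.

    Single linkage is an assumption here: the distance between two
    clusters is the smallest distance between any pair of their members.
    Returns the final clusters (lists of point indices) and the merge log.
    """
    dist = pairwise_distances(points)
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > num_clusters:
        best = None  # (distance, index of cluster a, index of cluster b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges
```

Calling `hac_single_linkage(points, 2)` on a list of coordinate tuples returns the two remaining clusters as lists of point indices, together with a log of which clusters were merged at what distance. The search for the closest pair inside the loop is a minimum reduction, another pattern that is commonly parallelized, though how the paper handles it is not stated in the abstract.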
