high performance computing on graphics processing units: hgpu.org


Parallel algorithms for problems of cluster analysis with very large amount of data

Natalya Litvinenko

Al-Farabi Kazakh National University, Almaty, Kazakhstan

arXiv:1402.3789 [cs.DC], (16 Feb 2014)

@article{2014arXiv1402.3789L,
  author={{Litvinenko}, N.},
  title={Parallel algorithms for problems of cluster analysis with very large amount of data},
  journal={ArXiv e-prints},
  archivePrefix={arXiv},
  eprint={1402.3789},
  primaryClass={cs.DC},
  keywords={Computer Science – Distributed, Parallel, and Cluster Computing, 91C20, 68W10, 62-07, D.1.3, G.1.0, G.4, H.3.3, I.5.3},
  year={2014},
  month={feb},
  adsurl={http://adsabs.harvard.edu/abs/2014arXiv1402.3789L},
  adsnote={Provided by the SAO/NASA Astrophysics Data System}
}

In this paper we use GPUs to solve large-scale problems involving large amounts of data that are not well suited to SIMD technology. For the given problem we consider a three-level parallelization. CPU multithreading is used at the top level, and graphics processors perform the massive computations. To solve cluster analysis problems on GPUs, a nearest neighbor method (NNM) is developed. The algorithm can handle up to 2 million records with up to 25 features. Because the sequential and parallel algorithms are fundamentally different, their computation times are difficult to compare; nevertheless, some comparisons are made. The speedup in computing time is about 10 times. We plan to increase this factor to 50-100 after fine-tuning the algorithms.
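The abstract does not describe the NNM algorithm itself. As a rough illustration only, the sketch below assumes that "nearest neighbor method" refers to single-linkage agglomerative clustering, a common meaning of the term in cluster analysis, and runs it sequentially on the CPU. The function name `single_linkage`, the sample records, and the cluster count are hypothetical and do not come from the paper; this is not the author's GPU implementation.

```python
import math
from itertools import combinations


def single_linkage(points, n_clusters):
    """Cluster points by repeatedly merging the two closest clusters.

    Assumption: this sequential single-linkage scheme stands in for the
    paper's NNM, whose GPU details are not given in the abstract.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest: parent[i] == i marks a root

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise Euclidean distances, shortest first. This O(n^2) step is
    # the data-parallel part that a GPU version would spread over threads.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    remaining = n
    for _, i, j in edges:
        if remaining <= n_clusters:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the nearest pair lies in two different clusters
            parent[root_j] = root_i
            remaining -= 1

    # Relabel roots as consecutive cluster ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Hypothetical toy data: five records with two features each.
records = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
labels = single_linkage(records, 3)
print(labels)
```

Materializing and sorting every pairwise distance needs memory quadratic in the number of records, which is out of reach at 2 million records. This is one reason a parallel algorithm for such data sizes may have to differ structurally from a sequential one like this sketch.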

Tags: Algorithms, Cluster analysis, Computer science, CUDA, Nearest neighbour, nVidia, nVidia GeForce GTX 660

February 19, 2014 by hgpu
