high performance computing on graphics processing units: hgpu.org


Parallelization of Hierarchical Text Clustering on Multi-core CUDA Architecture

Atul Bagga, Durga Toshniwal

Department of Electronics and Computer Engineering, Indian Institute of Technology Roorkee, Roorkee-247667, India

International Journal of Computer Science and Electrical Engineering (IJCSEE), Vol. 1, 2012

@article{bagga2012parallelization,
  title={Parallelization of Hierarchical Text Clustering on Multi-core CUDA Architecture},
  author={Bagga, A. and Toshniwal, D.},
  year={2012}
}


Text clustering is the problem of dividing text documents into groups such that documents in the same group are similar to one another and different from documents in other groups. Because texts tend to form hierarchies, text clustering is best performed with a hierarchical clustering method. An important concern when clustering large text databases is the high dimensionality of the representation space: storing the hierarchy trees takes a lot of space, and a lot of time is spent on similarity calculations while clustering the documents. In this paper we propose to parallelize a method that uses a tree-based summarization technique to keep cluster summaries in a tree held in memory throughout processing. The results show that our method achieves good accuracy along with a good speedup in calculating clusters.
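The abstract does not say which similarity measure or summary layout the paper uses, so the following is only a minimal sketch under stated assumptions: documents and cluster summaries are dense term-weight vectors, and similarity is cosine similarity. The names `cosine_similarity` and `nearest_summary` are hypothetical and chosen for illustration. The point is that each document-to-summary comparison is independent of the others, which is the kind of work that could in principle be assigned to separate GPU threads.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length term-weight vectors.

    Assumption: cosine similarity stands in for whatever measure
    the paper actually uses; the abstract does not specify it.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_summary(doc, summaries):
    """Return (index, similarity) of the summary most similar to doc.

    Each entry of `sims` is computed independently of the others,
    so this loop is the data-parallel part of the work.
    """
    sims = [cosine_similarity(doc, s) for s in summaries]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]
```

With thousands of vocabulary terms per vector, the cost of each comparison grows with the dimensionality, which is why the abstract singles out similarity calculations as the time-consuming step.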

Tags: Clustering, Computer science, CUDA, Databases, Hierarchical clustering, nVidia, nVidia GeForce 8800 GTX

September 21, 2012 by hgpu
