high performance computing on graphics processing units: hgpu.org

hgpu.org » Programming » Algorithms » Parallel data mining algorithms for multi-dimensional points on GPUs

Parallel data mining algorithms for multi-dimensional points on GPUs

Takazumi Matsumoto

The Hong Kong Polytechnic University

The Hong Kong Polytechnic University, 2015

@phdthesis{matsumoto2015parallel,

title={Parallel data mining algorithms for multi-dimensional points on GPUs},

author={Matsumoto, Takazumi},

year={2015},

school={The Hong Kong Polytechnic University}

}

Download (PDF)

View

Source

2330

views

Data mining tasks such as clustering, outlier detection and similarity search typically employ a series of algorithms to operate on a large set of data, making them amenable to parallelization. Thus parallelization of data mining operations such as distance computation has been extensively studied in the literature. In recent years, the use of Graphics Processing Units (GPUs) for data mining has been steadily increasing. As state-of-the-art processors now include both CPU and GPU, it is important to consider which approaches benefit from GPU processing and which do not, and apply a heterogeneous processing approach to improve the efficiency when applicable. Similarity search, also referred to as k-nearest neighbor search, is a particular application of distance computation where only to top k results (e.g., top 10) are required. It is used extensively in multimedia search, where only a small subset of possible results are used, and numerous approaches using both exact and approximate k-nearest neighbor methods have been proposed over the years. Our contribution is a new exact distance algorithm that not only outperforms competing exact GPU methods, but can compete with leading approximate GPU methods. It can also leverage heterogeneous (CPU-GPU) processing for improved efficiency. Outlier detection, also known as anomaly detection, involves detecting data points that deviate from expected patterns. It is an important part of applications such as fraud detection and system fault detection. In many real-world applications such as wireless sensor readings and weather forecasts, data are uncertain in nature. Rather than discarding uncertainty information or storing many sample readings, it is possible to instead approximate the probability distribution function (pdf) to compactly store and efficiently calculate uncertain distance when required. Existing GPU approaches are limited to working with certain data, while our contribution is a parallel method for outlier detection on uncertain data. Clustering is another common operation in data mining. Our work focuses on a particular application, label generation for web search results that automatically generates labels for related topics returned by a user’s web search. This is accomplished using a fuzzy transduction-based relational model. We also contribute a GPU implementation of our proposed solution.

Tags: Algorithms, Clustering, Computer science, Data mining, Heterogeneous systems, Nearest neighbour, nVidia, nVidia GeForce GTX 780, OpenCL, Thesis

April 16, 2016 by hgpu

No votes yet.

Please wait...

Your response

You must be logged in to post a comment.

* * *

high performance computing on graphics processing units: hgpu.org

Parallel data mining algorithms for multi-dimensional points on GPUs

Your response

Recent source codes

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

CuFuzz: An API-Knowledge-Graph Coverage-Driven Fuzzing Framework for CUDA Libraries

AutoPass: Evidence-Guided LLM Agents for Compiler Performance Tuning

Probe-and-Refine Tuning of Repository Guidance for AI Coding Agents

CUDAnalyst (CUDA + Analyst)

CodegenBench

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

CUDA Kernel Fusion Benchmarks

IntelliKit: Agent-first tooling for AMD hardware

DITRON: Distributed Compiler based on Triton for Parallel Systems

Most viewed papers (last 30 days)

Parallel data mining algorithms for multi-dimensional points on GPUs

Share this:

Your response

Recent source codes

Most viewed papers (last 30 days)