Design and evaluation of a parallel k-nearest neighbor algorithm on CUDA-enabled GPU

Shenshen Liang, Ying Liu, Cheng Wang, Liheng Jian
Graduate University of the Chinese Academy of Sciences, Beijing, China
IEEE 2nd Symposium on Web Society (SWS), 2010

@conference{liang2010design,
   title={Design and evaluation of a parallel k-nearest neighbor algorithm on CUDA-enabled GPU},
   author={Liang, S. and Liu, Y. and Wang, C. and Jian, L.},
   booktitle={Web Society (SWS), 2010 IEEE 2nd Symposium on},
   pages={53--60},
   year={2010},
   organization={IEEE}
}

Recent developments in Graphics Processing Units (GPUs) have enabled inexpensive high-performance computing for general-purpose applications. Due to its tremendous computing capability, the GPU has emerged as a co-processor to the CPU for achieving high overall throughput. The CUDA programming model provides programmers with C-like APIs to better exploit the parallel power of the GPU. K-nearest neighbor (KNN) is a widely used classification technique with significant applications in various domains, especially in text classification. The computationally intensive nature of KNN requires a high-performance implementation. In this paper, we propose CUKNN, a CUDA-based parallel implementation of KNN. It launches two CUDA kernels: a distance calculation kernel and a selecting kernel. In the distance calculation kernel, a large number of concurrent CUDA threads are issued, where each thread performs the calculation between the query object and one reference object; in the selecting kernel, threads in a block find the local-k nearest neighbors of the query object concurrently, and then a single thread is invoked to find the global-k nearest neighbors out of the queues of local-k neighbors. Various CUDA optimization techniques are applied to maximize the utilization of the GPU. We evaluate our implementation using synthetic datasets and a real physical simulation dataset. The experimental results demonstrate that CUKNN significantly outperforms the serial KNN on an HP xw8600 workstation, achieving up to a 46.71X speedup on the synthetic datasets and 42.49X on the physical simulation dataset, including I/O cost. It also shows good scalability when varying the number of dimensions of the reference dataset, the number of objects in the reference dataset, and the number of objects in the query dataset.
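
To make the two-kernel structure concrete, below is a minimal CUDA sketch of the approach as described in the abstract. It is an illustrative approximation, not the authors' CUKNN code: the kernel names, the per-thread local-k queues kept in shared memory, and the single-thread merge step are assumptions, and the single-query, single-block selection layout is a simplification of the block-level scheme the paper describes.

// Minimal CUDA sketch of the two-kernel KNN structure described in the abstract.
// Names and the insertion-based selection are illustrative assumptions.
#include <cuda_runtime.h>
#include <float.h>

// Kernel 1: one thread per reference object computes the squared Euclidean
// distance between that object and the single query object.
__global__ void distanceKernel(const float *ref, const float *query,
                               float *dist, int numRef, int dim)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numRef) return;

    float sum = 0.0f;
    for (int d = 0; d < dim; ++d) {
        float diff = ref[i * dim + d] - query[d];
        sum += diff * diff;
    }
    dist[i] = sum;
}

// Kernel 2: each thread scans a strided slice of the distance array and keeps
// its own local-k candidates (insertion into a small sorted queue in shared
// memory); thread 0 then merges all local-k queues into the global-k result.
__global__ void selectKernel(const float *dist, float *globalK,
                             int *globalIdx, int numRef, int k)
{
    extern __shared__ float shmem[];               // blockDim.x * k entries each
    float *localDist = shmem;
    int   *localIdx  = (int *)&shmem[blockDim.x * k];

    int t = threadIdx.x;
    // Initialize this thread's local-k queue.
    for (int j = 0; j < k; ++j) {
        localDist[t * k + j] = FLT_MAX;
        localIdx[t * k + j]  = -1;
    }
    // Strided scan: insert each distance into the thread's sorted local queue.
    for (int i = t; i < numRef; i += blockDim.x) {
        float d = dist[i];
        if (d < localDist[t * k + k - 1]) {
            int j = k - 1;
            while (j > 0 && localDist[t * k + j - 1] > d) {
                localDist[t * k + j] = localDist[t * k + j - 1];
                localIdx[t * k + j]  = localIdx[t * k + j - 1];
                --j;
            }
            localDist[t * k + j] = d;
            localIdx[t * k + j]  = i;
        }
    }
    __syncthreads();

    // A single thread merges the local-k queues into the global-k neighbors.
    if (t == 0) {
        for (int j = 0; j < k; ++j) { globalK[j] = FLT_MAX; globalIdx[j] = -1; }
        for (int q = 0; q < blockDim.x * k; ++q) {
            float d = localDist[q];
            if (d < globalK[k - 1]) {
                int j = k - 1;
                while (j > 0 && globalK[j - 1] > d) {
                    globalK[j]   = globalK[j - 1];
                    globalIdx[j] = globalIdx[j - 1];
                    --j;
                }
                globalK[j]   = d;
                globalIdx[j] = localIdx[q];
            }
        }
    }
}

In this sketch, distanceKernel would be launched with roughly numRef / blockDim.x blocks, and selectKernel with a single block and blockDim.x * k * (sizeof(float) + sizeof(int)) bytes of dynamic shared memory. The various CUDA optimization techniques the paper applies to maximize GPU utilization are omitted here.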
