CUKNN: A parallel implementation of K-nearest neighbor on CUDA-enabled GPU

Shenshen Liang, Cheng Wang, Ying Liu, Liheng Jian
Graduate University of Chinese Academy of Sciences, Beijing, China 100190
IEEE Youth Conference on Information, Computing and Telecommunication, 2009. YC-ICT ’09


@inproceedings{liang2009cuknn,
   title={CUKNN: A parallel implementation of K-nearest neighbor on CUDA-enabled GPU},
   author={Liang, Shenshen and Wang, Cheng and Liu, Ying and Jian, Liheng},
   booktitle={Information, Computing and Telecommunication, 2009. YC-ICT '09. IEEE Youth Conference on},
   year={2009}
}








Recent developments in Graphics Processing Units (GPUs) have enabled inexpensive high-performance computing for general-purpose applications. Due to its tremendous computing capability, the GPU has emerged as a co-processor to the CPU for achieving high overall throughput. The CUDA programming model provides programmers with C-like APIs to better exploit the parallel power of the GPU. K-nearest neighbor (KNN) is a widely used classification technique with significant applications in various domains. The computation-intensive nature of KNN requires a high-performance implementation. In this paper, we present CUKNN, a CUDA-based parallel implementation of KNN using the CUDA multi-thread model. Various CUDA optimization techniques are applied to maximize the utilization of the GPU. CUKNN significantly outperforms the CPU implementation, achieving up to a 15.2X speedup. It also shows good scalability when varying the dimensionality and the number of records of the training dataset.
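For context, the brute-force KNN classifier that such a GPU implementation accelerates can be sketched serially. The following C++ sketch (not the authors' CUKNN code; the function and variable names are illustrative) computes the distance from a query to every training record, selects the k closest, and takes a majority vote. On a GPU, the distance loop is the part that would be spread across CUDA threads.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Brute-force K-nearest-neighbor classification (serial baseline).
// Computes the squared Euclidean distance from the query to every
// training record, keeps the k smallest, and returns the majority
// label among those k neighbors.
int knn_classify(const std::vector<std::vector<float>>& train,
                 const std::vector<int>& labels,
                 const std::vector<float>& query,
                 std::size_t k) {
    std::vector<std::pair<float, int>> dist;  // (distance, label) pairs
    dist.reserve(train.size());
    for (std::size_t i = 0; i < train.size(); ++i) {
        float d = 0.0f;
        for (std::size_t j = 0; j < query.size(); ++j) {
            float diff = train[i][j] - query[j];
            d += diff * diff;  // squared Euclidean distance
        }
        dist.emplace_back(d, labels[i]);
    }
    // Only the k smallest distances are needed, not a full sort.
    std::partial_sort(dist.begin(), dist.begin() + k, dist.end());
    // Majority vote among the k nearest labels.
    int best_label = -1, best_count = 0;
    for (std::size_t i = 0; i < k; ++i) {
        int count = 0;
        for (std::size_t j = 0; j < k; ++j)
            if (dist[j].second == dist[i].second) ++count;
        if (count > best_count) {
            best_count = count;
            best_label = dist[i].second;
        }
    }
    return best_label;
}
```

Each distance computation is independent of the others, which is what makes the algorithm a natural fit for the CUDA multi-thread model: one thread (or one thread block) per training record.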
