https://hgpu.org/?p=5151
CUKNN: A parallel implementation of K-nearest neighbor on CUDA-enabled GPU