high performance computing on graphics processing units: hgpu.org

Fast Locality Sensitive Hashing for Beam Search on GPU

Xing Shi, Shizhen Xu, Kevin Knight

Department of Computer Science, University of Southern California

arXiv:1806.00588 [cs.CL], (2 Jun 2018)

We present a GPU-based Locality Sensitive Hashing (LSH) algorithm to speed up beam search for sequence models. We utilize the winner-take-all (WTA) hash, which is based on the relative ranking order of hidden dimensions and is thus resilient to perturbations in numerical values. Our algorithm is designed by fully considering the underlying architecture of CUDA-enabled GPUs (Algorithm/Architecture Co-design): 1) A parallel Cuckoo hash table is applied for LSH code lookup (guaranteed O(1) lookup time); 2) Candidate lists are shared across beams to maximize parallelism; 3) The most frequent words are merged into candidate lists to improve performance. Experiments on 4 large-scale neural machine translation models demonstrate that our algorithm can achieve up to a 4x speedup on the softmax module and a 2x overall speedup on GPU without hurting BLEU.
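
To illustrate why a rank-based hash is resilient to numerical perturbations, here is a minimal CPU sketch of a winner-take-all hash in Python. It is not the authors' GPU implementation: the function names `make_permutations` and `wta_hash`, the `window` parameter, and the toy vector are assumptions chosen for illustration, and the abstract does not state the parameters used in the paper. Each hash digit is the position of the largest value among the first `window` entries of a fixed random permutation of the vector, so the code depends only on the ordering of the dimensions.

```python
import random


def make_permutations(dim, num_hashes, seed=0):
    """Return num_hashes fixed random permutations of range(dim).

    Hypothetical helper: the same permutations must be reused for every
    vector so that their codes are comparable.
    """
    rng = random.Random(seed)
    perms = []
    for _ in range(num_hashes):
        perm = list(range(dim))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def wta_hash(vec, perms, window):
    """Compute a winner-take-all code for vec.

    For each permutation, look at the first `window` permuted dimensions
    and record which position holds the largest value. Only the relative
    order of the values matters, not their magnitudes.
    """
    code = []
    for perm in perms:
        head = perm[:window]
        winner = max(range(window), key=lambda i: vec[head[i]])
        code.append(winner)
    return tuple(code)


# Toy hidden vector (illustrative values only).
vec = [0.1, 0.9, 0.3, 0.7, 0.5, 0.2]
perms = make_permutations(dim=len(vec), num_hashes=4, seed=0)

# An order-preserving rescaling leaves the code unchanged.
scaled = [2 * x + 1 for x in vec]
print(wta_hash(vec, perms, window=3) == wta_hash(scaled, perms, window=3))  # True
```

In an LSH setting, vectors with the same code would be placed in the same bucket, and the bucket contents would serve as a candidate list for the softmax. The bucket lookup structure (a Cuckoo hash table in the abstract) is not shown here.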

Tags: Algorithms, Artificial intelligence, Computer science, CUDA, Data Structures and Algorithms, Hashing, NLP, nVidia, Tesla K20

June 9, 2018 by hgpu
