high performance computing on graphics processing units: hgpu.org

Efficient Hash Tables on the GPU

Dan Anthony Feliciano Alcantara

University of California, Davis

University of California, Davis, 134 pages, 2011

@phdthesis{alcantara2012efficient,
  title={Efficient Hash Tables on the GPU},
  author={Alcantara, D.A.F.},
  year={2012},
  school={UNIVERSITY OF CALIFORNIA, DAVIS}
}

Advances in GPU architecture have made efficient implementations of hash tables possible, allowing fast parallel constructions and retrievals despite the uncoalesced memory accesses naturally incurred by hashing algorithms. The key is to mitigate the penalty of these accesses by minimizing the number that occur and utilizing the cache (when one is available). Most work done on parallel hashing is ill-equipped for this objective and relies on the theoretical PRAM model, which abstracts away the difficulties of programming on actual hardware. We examine hashing schemes from a practical perspective using NVIDIA’s CUDA architecture. Our main contribution is a set of parallel implementations for open addressing, chaining, and cuckoo hashing. We analyze each method and identify when applications should use one over another.

Because each makes different performance trade-offs, we compare them using three metrics: memory usage, construction time, and retrieval efficiency. Retrieval efficiency considers both the average time and deviation from it, since answering some queries can be several orders of magnitude more difficult than others. Our quadratic probing implementation shows this as the hash table becomes more compact: on a GTX 470, using datasets containing 10M random key-value pairs, it has respective rates of [369M, 723M, 539M] pairs per second (pps) for insertion, retrieving every input item, and retrieving 10M keys absent from the table when using 2N space. For 1.05N space, these rates significantly drop to [162M, 208M, 46M] pps, reflecting the difficulty of terminating both insertions and queries.

Applications requiring more robust retrieval could benefit from our chaining implementation, which eschews linked lists and uses radix sort for an efficient parallel construction. When using 2N space, the rates are [344M, 436M, 624M] pps, while for 1.05N the rates are [449M, 211M, 126M] pps. For compact tables, its construction rate is almost 3x faster than quadratic probing with a smaller drop in retrieval efficiency for failed queries. However, the number of probes required to answer queries grows drastically for compact tables, leading to poorer retrieval rates.

Cuckoo hashing is better suited for these cases, trading a more complicated construction for guaranteed constant time retrievals. It has rates of [366M, 670M, 501M] pps using 2N space and [133M, 386M, 258M] pps for 1.05N. Our method is also adaptable and can be specialized for situations where multiple values are stored per key.
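The trade-off described for cuckoo hashing, a more involved construction in exchange for retrievals that probe a fixed number of slots, can be illustrated with a small sequential sketch. This is not the thesis's CUDA implementation: the class name `CuckooTable`, the hash function form, the constant `PRIME`, and the parameters `num_hashes`, `max_evictions`, and `max_rebuilds` are all assumptions made for illustration, and keys are assumed to be non-negative integers below `PRIME`.

```python
import random

PRIME = 4294967291  # a prime just below 2**32 (constant chosen for this sketch)


class CuckooTable:
    """Sequential illustration of cuckoo hashing; all names are hypothetical."""

    def __init__(self, capacity, num_hashes=2, max_evictions=100,
                 max_rebuilds=20, seed=0):
        self.capacity = capacity
        self.num_hashes = num_hashes
        self.max_evictions = max_evictions
        self.max_rebuilds = max_rebuilds
        self.rng = random.Random(seed)
        self.slots = [None] * capacity
        self._new_hash_functions()

    def _new_hash_functions(self):
        # Random functions of the form ((a * key + b) mod PRIME) mod capacity.
        self.consts = [(self.rng.randrange(1, PRIME), self.rng.randrange(PRIME))
                       for _ in range(self.num_hashes)]

    def _slot(self, key, i):
        a, b = self.consts[i]
        return ((a * key + b) % PRIME) % self.capacity

    def _place(self, item):
        """Insert with evictions. Returns None on success, else the homeless item."""
        i = 0
        for _ in range(self.max_evictions):
            pos = self._slot(item[0], i)
            # Swap the incoming item with whatever currently occupies the slot.
            item, self.slots[pos] = self.slots[pos], item
            if item is None:
                return None
            # The evicted item moves to the function after one that maps it here.
            used = next(j for j in range(self.num_hashes)
                        if self._slot(item[0], j) == pos)
            i = (used + 1) % self.num_hashes
        return item

    def _rebuild(self, items):
        # The eviction chain was too long: retry everything with new functions.
        for _ in range(self.max_rebuilds):
            self.slots = [None] * self.capacity
            self._new_hash_functions()
            if all(self._place(it) is None for it in items):
                return
        raise RuntimeError("table too full for this sketch")

    def insert(self, key, value):
        # Overwrite in place if the key is already stored.
        for i in range(self.num_hashes):
            pos = self._slot(key, i)
            if self.slots[pos] is not None and self.slots[pos][0] == key:
                self.slots[pos] = (key, value)
                return
        homeless = self._place((key, value))
        if homeless is not None:
            items = [s for s in self.slots if s is not None] + [homeless]
            self._rebuild(items)

    def get(self, key, default=None):
        # A lookup inspects at most num_hashes slots, whether or not the key exists.
        for i in range(self.num_hashes):
            entry = self.slots[self._slot(key, i)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default
```

In this sketch, `get` touches at most `num_hashes` slots for both present and absent keys, which is the constant-time retrieval property the abstract refers to. The cost moves to `insert`, which may trigger a chain of evictions or a full rebuild. With only two hash functions the sketch needs a table well below half full to build reliably. It makes no claim about how the thesis reaches 1.05N space.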

Tags: Algorithms, Computer science, CUDA, Hashing, nVidia, nVidia GeForce GTX 280, nVidia GeForce GTX 470, Thesis

April 14, 2012 by hgpu
