Efficient Hash Tables on the GPU
University of California, Davis
University of California, Davis, 134 pages, 2011
@phdthesis{alcantara2012efficient,
title={Efficient Hash Tables on the GPU},
author={Alcantara, D.A.F.},
year={2012},
school={UNIVERSITY OF CALIFORNIA, DAVIS}
}
Advances in GPU architecture have made efficient implementations of hash tables possible, allowing fast parallel constructions and retrievals despite the uncoalesced memory accesses naturally incurred by hashing algorithms. The key is to mitigate the penalty of these accesses by minimizing the number that occur and utilizing the cache (when one is available). Most work done on parallel hashing is ill-equipped for this objective and relies on the theoretical PRAM model, which abstracts away the difficulties of programming on actual hardware. We examine hashing schemes from a practical perspective using NVIDIA’s CUDA architecture. Our main contribution is a set of parallel implementations for open addressing, chaining, and cuckoo hashing. We analyze each method and identify when applications should use one over another. Because each makes different performance trade-offs, we compare them using three metrics: memory usage, construction time, and retrieval efficiency. Retrieval efficiency considers both the average time and deviation from it, since answering some queries can be several orders of magnitude more difficult than others. Our quadratic probing implementation shows this as the hash table becomes more compact: on a GTX 470, using datasets containing 10M random key-value pairs, it has respective rates of [369M, 723M, 539M] pairs per second (pps) for insertion, retrieving every input item, and retrieving 10M keys absent from the table when using 2N space. For 1.05N space, these rates significantly drop to [162M, 208M, 46M] pps, reflecting the difficulty of terminating both insertions and queries. Applications requiring more robust retrieval could benefit from our chaining implementation, which eschews linked lists and uses radix sort for an efficient parallel construction. When using 2N space, the rates are [344M, 436M, 624M] pps, while for 1.05N the rates are [449M, 211M, 126M] pps. For compact tables, its construction rate is almost 3x faster than quadratic probing with a smaller drop in retrieval efficiency for failed queries. However, the number of probes required to answer queries grows drastically for compact tables, leading to poorer retrieval rates. Cuckoo hashing is better suited for these cases, trading a more complicated construction for guaranteed constant time retrievals. It has rates of [366M, 670M, 501M] pps using 2 N space and [133M, 386M, 258M] pps for 1.05N. Our method is also adaptable and can be specialized for situations where multiple values are stored per key.
April 14, 2012 by hgpu