Generic Inverted Index on the GPU
School of Computing, National University of Singapore
arXiv:1603.08390 [cs.DB], (28 Mar 2016)
@article{zhou2016generic,
title={Generic Inverted Index on the GPU},
author={Zhou, Jingbo and Guo, Qi and Jagadish, H. V. and Luan, Wenhao and Tung, Anthony K. H. and Yang, Yueji and Zheng, Yuxin},
year={2016},
month={mar},
eprint={1603.08390},
archivePrefix={arXiv},
primaryClass={cs.DB}
}
Data variety, one of the three Vs of Big Data, is manifested by a growing number of complex data types such as documents, sequences, trees, graphs and high-dimensional vectors. To perform similarity search on these data, existing works mainly create customized indexes for different data types. Because these customized indexes are so diverse, it is hard to devise a general parallelization strategy to speed up the search. In this paper, we propose a generic inverted index on the GPU (called GENIE), which can support similarity search of multiple queries on various data types. GENIE can effectively support approximate nearest neighbor search under different similarity measures by exploiting Locality Sensitive Hashing schemes, as well as similarity search on original data such as short document data and relational data. Extensive experiments on different real-life datasets demonstrate the efficiency and effectiveness of our system.
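To make the idea of answering several similarity queries from one inverted index concrete, here is a minimal CPU-side sketch in Python. It is not the GENIE implementation and does not run on a GPU. It assumes that objects are relational tuples, that each (attribute, value) pair is a keyword, and that similarity is scored by counting the keywords an object shares with the query. The names `build_index`, `top_k` and `batch_search` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def build_index(objects):
    """Map each (attribute, value) keyword to the ids of the objects holding it."""
    index = defaultdict(list)
    for obj_id, obj in enumerate(objects):
        for keyword in obj.items():
            index[keyword].append(obj_id)
    return index


def top_k(index, query, k):
    """Return the k objects sharing the most keywords with the query.

    Assumption: similarity is the number of matching (attribute, value) pairs.
    """
    counts = Counter()
    for keyword in query.items():
        # Scan the postings list of every keyword that appears in the query.
        for obj_id in index.get(keyword, ()):
            counts[obj_id] += 1
    # Highest count first; ties are broken by the smaller object id.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))[:k]


def batch_search(index, queries, k):
    """Answer several queries against the same index.

    The queries are independent of each other, which is what makes a batch
    like this a natural candidate for parallel execution.
    """
    return [top_k(index, query, k) for query in queries]


objects = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 5, "c": 3},
    {"a": 4, "b": 2, "c": 9},
]
index = build_index(objects)
results = batch_search(index, [{"a": 1, "b": 2, "c": 9}], k=2)
```

In this sketch each query touches only the postings lists of its own keywords, so the loop in `batch_search` has no dependencies between iterations. For hashed data such as LSH signatures, one could treat each (hash function, hash value) pair as a keyword in the same way, though that mapping is an assumption here and is not spelled out in the abstract.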
March 29, 2016 by hgpu