A GPU-Based Accelerator for Chinese Word Segmentation
Intelligent and Distributed Computing Lab, College of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, P.R. China
The 14th Asia-Pacific Web Conference (APWeb), 2012
@inproceedings{gu2012gpu,
title={A GPU-Based Accelerator for Chinese Word Segmentation},
author={Gu, X. and Li, R. and Wen, K. and Peng, B. and Xiao, W.},
booktitle={The 14th Asia-Pacific Web Conference (APWeb)},
year={2012}
}
The task of Chinese word segmentation is to split a sequence of Chinese characters into tokens so that Chinese text can be more easily retrieved by web search engines. Because the amount of Chinese literature has increased dramatically in recent years, analyzing massive volumes of Chinese text in a timely manner has become a major challenge for web search engines. In this paper, we investigate a new approach to high-performance Chinese information processing. We propose a CPU-GPU collaboration model for Chinese word segmentation, in which a dictionary-based word segmentation approach is designed to fit the GPU architecture. Three basic word segmentation algorithms are applied to evaluate the performance of this model. In addition, we present several optimization strategies to fully exploit the computing power of the GPU. Our experimental results show that our model achieves speedups of up to 3-fold over CPU implementations.
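To make the idea of dictionary-based segmentation concrete, here is a minimal Python sketch of forward maximum matching, a common dictionary-based baseline. The abstract does not name the three algorithms it evaluates, so the choice of algorithm is an assumption, and the function name `forward_max_match`, the `max_word_len` parameter, and the toy dictionary are hypothetical. This is a CPU-side illustration only, not the paper's GPU implementation.

```python
def forward_max_match(text, dictionary, max_word_len=4):
    """Greedy forward maximum matching over a set-based dictionary.

    Illustrative sketch only; not the method described in the paper.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as a fallback token.
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
toy_dictionary = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", toy_dictionary))
# prints ['研究生', '命', '起源']
```

The greedy match takes "研究生" first and leaves "命" stranded, where a reader would expect "研究 / 生命 / 起源". This kind of ambiguity is one reason several matching strategies are often compared. Each input sentence is segmented independently of the others, which suggests a natural unit of parallel work, though how the paper actually maps the work onto the GPU is not stated in the abstract.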
April 9, 2012 by hgpu