Tuned and GPU-accelerated parallel data mining from comparable corpora

Krzysztof Wolk, Krzysztof Marasek
Department of Multimedia, Polish-Japanese Academy of Information Technology, Koszykowa 86, Warsaw
arXiv:1509.08639 [cs.CL], (29 Sep 2015)


The multilingual nature of the world makes translation a crucial requirement today. Parallel dictionaries constructed by humans are a widely available resource, but they are limited and do not provide enough coverage for good-quality translation because of out-of-vocabulary words and neologisms. This motivates the use of statistical translation systems, which unfortunately depend on the quantity and quality of training data. Such data has very limited availability, especially for some languages and very narrow text domains. In this research we present our improvements to the Yalign mining methodology: reimplementing the comparison algorithm, introducing tuning scripts, and improving performance with GPU computing acceleration. The experiments are conducted on various text domains, and bi-data is extracted from Wikipedia dumps.