high performance computing on graphics processing units: hgpu.org

Tuned and GPU-accelerated parallel data mining from comparable corpora

Krzysztof Wolk, Krzysztof Marasek

Department of Multimedia, Polish-Japanese Academy of Information Technology, Koszykowa 86, Warsaw

arXiv:1509.08639 [cs.CL], (29 Sep 2015)

@article{wolk2015tuned,
  title={Tuned and GPU-accelerated parallel data mining from comparable corpora},
  author={Wolk, Krzysztof and Marasek, Krzysztof},
  year={2015},
  month={sep},
  eprint={1509.08639},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

The multilingual nature of the world makes translation a crucial requirement today. Parallel dictionaries constructed by humans are a widely available resource, but they are limited and do not provide enough coverage for good-quality translation because of out-of-vocabulary words and neologisms. This motivates the use of statistical translation systems, which unfortunately depend on the quantity and quality of their training data. Such data are of very limited availability, especially for some languages and for very narrow text domains. In this research we present our improvements to the Yalign mining methodology: we reimplement its comparison algorithm, introduce tuning scripts, and improve performance with GPU computing acceleration. The experiments are conducted on various text domains, and bilingual data are extracted from Wikipedia dumps.
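The abstract does not describe how Yalign compares sentences or which comparison algorithm the authors reimplemented. The sketch below rests on two assumptions. First, candidate sentence pairs from two comparable documents are scored by a similarity function. Second, an order-preserving Needleman-Wunsch-style global alignment then selects the pairs to keep. The similarity function `dictionary_jaccard`, the toy Polish-English lexicon, and the `threshold` and `gap_penalty` parameters are all hypothetical illustrations, not the paper's method.

```python
from typing import Callable, Dict, List, Tuple


def dictionary_jaccard(lexicon: Dict[str, str]) -> Callable[[str, str], float]:
    """Hypothetical similarity: map source words through a lexicon, then Jaccard overlap."""
    def sim(src: str, tgt: str) -> float:
        src_words = {lexicon.get(w, w) for w in src.lower().split()}
        tgt_words = set(tgt.lower().split())
        if not src_words or not tgt_words:
            return 0.0
        return len(src_words & tgt_words) / len(src_words | tgt_words)
    return sim


def align_sentences(
    src: List[str],
    tgt: List[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.5,
    gap_penalty: float = 0.0,
) -> List[Tuple[int, int, float]]:
    """Needleman-Wunsch-style global alignment of two sentence lists.

    Returns (src_index, tgt_index, score) for aligned pairs scoring >= threshold.
    """
    n, m = len(src), len(tgt)
    # Pairwise similarity matrix: every cell is independent of the others.
    sim = [[similarity(s, t) for t in tgt] for s in src]
    # score[i][j] = best total score for aligning src[:i] with tgt[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap_penalty
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sim[i - 1][j - 1],  # pair src[i-1] with tgt[j-1]
                score[i - 1][j] - gap_penalty,             # skip a source sentence
                score[i][j - 1] - gap_penalty,             # skip a target sentence
            )
    # Trace back from the bottom-right cell to recover the chosen pairs.
    pairs: List[Tuple[int, int, float]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + sim[i - 1][j - 1]:
            pairs.append((i - 1, j - 1, sim[i - 1][j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] - gap_penalty:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    # Keep only pairs confident enough to count as parallel sentences.
    return [p for p in pairs if p[2] >= threshold]


if __name__ == "__main__":
    lexicon = {"kot": "cat", "je": "eats", "pies": "dog", "biega": "runs"}  # toy data
    src = ["kot je", "pies biega"]
    tgt = ["the weather is nice", "cat eats", "dog runs"]
    print(align_sentences(src, tgt, dictionary_jaccard(lexicon)))
```

In this sketch the `threshold` is the kind of knob a tuning script could adjust, but the abstract does not say which parameters the authors' scripts tune. The pairwise `sim` matrix is the embarrassingly parallel part, since each cell can be computed independently. That makes it a natural candidate for GPU offloading. The DP table itself has cell-to-cell dependencies.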

Tags: Algorithms, Computer science, Data mining, Machine learning, NLP, nVidia, nVidia GeForce GTX 660

October 3, 2015 by hgpu
