
Efficient Parallelization of Natural Language Applications using GPUs

Chao-Yue Lai
Electrical Engineering and Computer Sciences, University of California at Berkeley
EECS Department, University of California, Berkeley, Technical Report No. UCB/EECS-2012-54, 2012

@mastersthesis{Lai:EECS-2012-54,
   Author = {Lai, Chao-Yue},
   Title = {Efficient Parallelization of Natural Language Applications using GPUs},
   School = {EECS Department, University of California, Berkeley},
   Year = {2012},
   Month = {May},
   URL = {http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-54.html},
   Number = {UCB/EECS-2012-54}
}


As we enter the era of mobile computing, high-quality, efficient natural language applications that facilitate intelligent human-computer interaction become increasingly important. Unfortunately, most high-quality natural language applications employ large statistical models, which render them impractical for real-time use. Meanwhile, Graphics Processing Units (GPUs) have become widely available, offering the opportunity to alleviate this bottleneck by exploiting the fine-grained data parallelism found in natural language processing algorithms. In this report, we examine the possibility of parallelizing two major natural language applications, natural language parsing and machine translation, on GPUs. In natural language parsing, we explore the design space of parallelizing the dynamic programming computations carried out by the CKY parsing algorithm. We use the Compute Unified Device Architecture (CUDA) programming model to re-implement a state-of-the-art parser, and compare its performance on two recent GPUs with different architectural features. Our best results show a 26-fold speedup over an optimized sequential C implementation. In machine translation, we focus on parallelizing the CKY-based machine translation decoding algorithm using a phrase-based translation model and a trigram language model. We investigate various optimization approaches that expose the inherent massive parallelism and reduce memory accesses. Experimental results show that our best parallel implementation runs twice as fast as the optimized sequential implementation, without loss of accuracy. A runtime analysis shows that this suboptimal performance is caused by the memory-intensive nature of CKY-based machine translation decoding and its excessive amount of irregular memory accesses.
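To give a flavor of the parallelization strategy the abstract describes, below is a minimal CUDA sketch of a Viterbi-style CKY inside pass, parallelized over spans of a fixed length. This is an illustrative assumption, not the thesis's actual implementation: the names (cky_span_kernel, cky_inside, Rule), the chart layout, and the simplification of at most one binary rule per left-hand-side symbol are all hypothetical.

    // Sketch only: simplified Viterbi-CKY inside pass for a binarized
    // grammar. Lexical (length-1) cells are assumed to be filled and all
    // longer-span cells initialized to NEG_INF before cky_inside is called.
    #include <cfloat>
    #include <cuda_runtime.h>

    #define NEG_INF (-FLT_MAX)

    struct Rule {               // binary rule A -> B C with log-probability
        int lhs, left, right;
        float logp;
    };

    // chart[(i * (n + 1) + j) * num_syms + A] holds the best inside score
    // of symbol A over span [i, j). One block per span start position i;
    // threads stride over grammar rules, each scanning all split points.
    __global__ void cky_span_kernel(float *chart, const Rule *rules,
                                    int num_rules, int n, int num_syms,
                                    int span_len) {
        int i = blockIdx.x;             // span start
        int j = i + span_len;           // span end (exclusive)
        if (j > n) return;

        for (int r = threadIdx.x; r < num_rules; r += blockDim.x) {
            Rule rule = rules[r];
            float best = NEG_INF;
            for (int k = i + 1; k < j; ++k) {   // split point
                float l = chart[(i * (n + 1) + k) * num_syms + rule.left];
                float rr = chart[(k * (n + 1) + j) * num_syms + rule.right];
                float score = rule.logp + l + rr;
                if (score > best) best = score;
            }
            // Assumes at most one rule per lhs, so each thread writes a
            // distinct cell; with shared lhs symbols this would need an
            // atomic max instead.
            float *cell = &chart[(i * (n + 1) + j) * num_syms + rule.lhs];
            if (best > *cell) *cell = best;
        }
    }

    // Host side: launch one kernel per span length, shortest first, so
    // every sub-span cell is complete before any longer span reads it.
    void cky_inside(float *d_chart, const Rule *d_rules, int num_rules,
                    int n, int num_syms) {
        for (int len = 2; len <= n; ++len) {
            int starts = n - len + 1;
            cky_span_kernel<<<starts, 256>>>(d_chart, d_rules, num_rules,
                                             n, num_syms, len);
            cudaDeviceSynchronize();
        }
    }

The key design point this sketch illustrates is the dependency structure of the CKY dynamic program: all spans of the same length are independent and can be processed in parallel, while spans of different lengths must be processed in order, which is why the host loop launches one kernel per span length. The scattered chart reads in the split-point loop also hint at the irregular memory-access problem the abstract identifies for CKY-based decoding.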