String Algorithm on GPGPU

Ong Wen Mei
Faculty of Engineering, Multimedia University
Multimedia University, 2013



   author={MEI, ONGWEN},



Download Download (PDF)   View View   Source Source   



Since the last decade, the concept of general purpose computing on graphics processors was introduced and has since garnered significant adaptation in the engineering industry. The use of a Graphics Processing Unit (GPU) as a many-core processing architecture for the purpose of general-purpose computation yields performance improvement of several orders-of magnitude. One example in leveraging a GPU for improved computational performance is the parallelization of string searching algorithms. The string search algorithm is often required to be performed on large datasets. However, string algorithms are commonly optimised to be performed using a Central Processing Unit (CPU) and risks high levels of computational bottlenecks. As such, this dissertation first identifies these bottlenecks by analysing the design and implementation of a serial Bloom Filter algorithm. Based on the analysed results, a parallel Bloom Filter algorithm is proposed, which utilizes software threads (i.e., OpenMP) on a multi-core processor for data parallelism. Experimental results suggest that a multi-core driven (i.e., Using 8 logical processors) parallel Bloom Filter algorithm exhibits performance speed up of up to 3.3x for a set of 10 million strings. The design and implementation of the parallel Bloom Filter algorithm is then extended into a GPU processor using the Compute Unified Device Architecture (CUDA) parallel computing platform. The proposed parallel Bloom Filter algorithm on CUDA segments the string list into blocks of words and threads in generating the bit table, which is used during the string matching process. This method maximizes the computational performance and sustains consistent string matching results. Experimental results suggest that a GPU driven (i.e., Using 256 CUDA cores) parallel Bloom Filter algorithm further extends the performance speed up to 5.5x for a set of 10 million strings. In addition, this dissertation also analyses the searching drawback of a Bloom Filter algorithm in locating matching string positions. As such, an alternative parallel Quick Search string algorithm is designed, implemented and assessed in an effort to address this drawback.
No votes yet.
Please wait...

* * *

* * *

HGPU group © 2010-2017 hgpu.org

All rights belong to the respective authors

Contact us: