Log File Regular Expression Pattern Matching And Capture With GPUs
Universidad Autónoma de Madrid
Universidad Autonoma de Madrid, 2016
Download (PDF) View Source Source codes
The information contained in a system is normally stored into log files. Most of the time, these files store the information in plain text with many not formatted information. It is then necessary to extract parts of this information to be able to understand what is going on such system. Currently, such information can be extracted using programs that make use of extended regular expressions. The use of regular expressions allows the search of patterns but it can be also used to extract data from the searched pattern. Most of the programs that implement regular expressions are based on finite automatas, such as non-deterministic (NFA) or deterministic (DFA). We aim to explore the use of finite automatas to extract data from log files using a Graphic Processor Unit (GPU) device to speedup the process. Moreover, we will also explore data parallelism over the lines present on the log file. Currently, the work done in GPU with regular expressions is limited to matching tasks only, without any capture feature. We present a solution that solves this lack of pattern capture in current implementations. Our development uses as base the implementation of TNFA and converts it to a TDFA before running the GPU task. We explore the new CUDA feature named unified memory, supported since CUDA 6, together with streams to achieve the best possible performance in our GPU implementation. Using real log files and regular expressions made to extract specific data, our evaluation shows that it can be up to 9 times faster than the sequential implementation.
December 22, 2016 by xhgpu