Regular Expression Matching and Operational Semantics
University of Birmingham, Birmingham B15 2TT, United Kingdom
arXiv:1108.3126v1 [cs.LO] (16 Aug 2011)
@article{2011arXiv1108.3126R,
author={Rathnayake}, A. and {Thielecke}, H.},
title={"{Regular Expression Matching and Operational Semantics}"},
journal={ArXiv e-prints},
archivePrefix={"arXiv"},
eprint={1108.3126},
primaryClass={"cs.LO"},
keywords={Computer Science – Logic in Computer Science},
year={2011},
month={aug},
adsurl={http://adsabs.harvard.edu/abs/2011arXiv1108.3126R},
adsnote={Provided by the SAO/NASA Astrophysics Data System}
}
Many programming languages and tools, ranging from grep to the Java String library, contain regular expression matchers. Rather than first translating a regular expression into a deterministic finite automaton, such implementations typically match the regular expression on the fly. Thus they can be seen as virtual machines interpreting the regular expression much as if it were a program with some non-deterministic constructs such as the Kleene star. We formalize this implementation technique for regular expression matching using operational semantics. Specifically, we derive a series of abstract machines, moving from the abstract definition of matching to increasingly realistic machines. First a continuation is added to the operational semantics to describe what remains to be matched after the current expression. Next, we represent the expression as a data structure using pointers, which enables redundant searches to be eliminated via testing for pointer equality. From there, we arrive both at Thompson’s lockstep construction and a machine that performs some operations in parallel, suitable for implementation on a large number of cores, such as a GPU. We formalize the parallel machine using process algebra and report some preliminary experiments with an implementation on a graphics processor using CUDA.
August 17, 2011 by hgpu