So traditionally, regular expressions have been… err… regular. While regular expressions are incredibly useful, being regular is a limitation. The canonical example of what a regular language cannot do, is counting parentheses. For example, if I am trying to match an expression with arbitrary nesting, I need a more robust matching solution than a regular language can provide.
I have seen that CommonLisp – Portable Perl-compatible regular expressions (cl-ppcre) supported a parse tree representation of regular expressions. My hope was that this would allow arbitrary embedding of CL code into the matching structure of the regular expressions. It does.
To teach myself how to use this tool, I decided to count parentheses. The basic structure of the parse tree is just a list of symbols that get compiled into specific matching functions.
- ‘.’ -> :EVERYTHING
- ‘*’ -> :GREEDY-REPITITION
To embed a lisp function you create a node in that parse-tree named :FILTER whose child is a function of one argument (the position in the string this function should try to match). This function returns how many charcters it consumes (by position), or nil if it did not match.
Keep in mind that the following sample is not reentrant, but could be made to be so. Now without further ado, the code:
link to the paste