So traditionally, regular expressions have been… err… regular. While regular expressions are incredibly useful, being regular is a limitation. The canonical example of what a regular language cannot do, is counting parentheses. For example, if I am trying to match an expression with arbitrary nesting, I need a more robust matching solution than a regular language can provide.

I have seen that CommonLisp – Portable Perl-compatible regular expressions (cl-ppcre) supported a parse tree representation of regular expressions. My hope was that this would allow arbitrary embedding of CL code into the matching structure of the regular expressions. It does.

To teach myself how to use this tool, I decided to count parentheses. The basic structure of the parse tree is just a list of symbols that get compiled into specific matching functions.

- ‘.’ -> :EVERYTHING
- ‘*’ -> :GREEDY-REPITITION

To embed a lisp function you create a node in that parse-tree named :FILTER whose child is a function of one argument (the position in the string this function should try to match). This function returns how many charcters it consumes (by position), or nil if it did not match.

Keep in mind that the following sample is not reentrant, but could be made to be so. Now without further ado, the code:

link to the paste

You made a stack machine!

By “counting parentheses”, do you mean “matching parentheses”?

Interesting observation; “match” in the context of regular expressionish things is a bit overloaded (normal regular expressions can “match” arbitrarily many parentheses characters). “Matching pairs of nested parentheses” might be closer, but really I thinking counting the number of parentheses is closer to what we are actually doing. We count all the open parentheses and keep matching till we get the same number of closing parentheses. English fails where algorithms speak I suppose.

Pingback: Even More Fun With CL-PPCRE Filter Functions | Russ’s Tech Blog