I think I have enough of the bugs worked out, and enough tests now, to actually recommend that others use the recursive-regex library on my GitHub.
There is a brief overview of the project in the introductory blog post. Also, in response to one of the comments on that blog post, there is an example s-exp parser as part of the test suite.
While this started as a toy to scratch an intellectual itch, I think this project is potentially a nice midpoint between a full-blown parser framework and regular expressions. Grammars are hard to get right though, so if you are writing your own language you might want to investigate something from the cliki parser generators page (eg: cl-yacc).
Recursive-Regex is the end result of a weekend of playing with the code I published on Thursday about adding named dispatch functions to CL-PPCRE regular expressions. I kept at it and I think that this approach might have some promise for building up a library of reusable regexp/matcher chunks. I also found that this made it somewhat easier to obtain results from the regular expression search because I get back a full parse tree rather than the bindings typically supplied by CL-PPCRE.
I have it somewhat documented, loadable, and testable, with all my current tests passing. There is even a recursive-regex CSV parser defined in the default dispatch table (mostly as a simple but practical proof of concept).
Comma-List: [\t ]*(?:(?<body>[^,]*)[\t ]*,)*[\t ]*(?<body>[^,]*)[\t ]*
Double quotes and body both go to custom dispatcher functions. Body defines where the body regex should be matched and what to use if no body is supplied.
I don’t really have long term plans for this project, but it scratched an intellectual itch I was experiencing. Perhaps it will be useful for someone down the road.
A while ago I posted about my adventures playing with CL-PPCRE filter functions. In the previous blog post I destructively modified a CL-PPCRE parse tree to add a filter function that can match balanced pairs of parentheses (a typical example of what regular expressions are NOT capable of). In this post I formalize that example into something that could be more broadly applied with less understanding of the underlying mechanics.
To begin with, I define a function, create-scanner-with-filters, that will handle creating these special scanners for me. My idea is to provide a table of functions that should be called when we see certain strings inside the regular expression. Because named groups (see *allow-named-registers*) already carry a name and a body, and CL-PPCRE is already parsing them for me, I decided to tie into the named registers to handle my function dispatching. This has the added niceness that whatever your filter matches is stored in a register.
An overview of this process is: parse the regex; for any named-register node that has a function in the table, replace its third element (usually a regex whose match will be stored in a register) with our specialized filter function; compile the new scanner; and return that to the end user. I also decided that the regex in the body of the named group should be available to the filter, and in most cases should probably be used as part of the filter function.
If I continue to play with this, I might eventually release it as a library, but for now it stands well on its own.
Without further ado:
(declaim (optimize (debug 3)))
;; TODO: group binds in body expressions
;; TODO: propagate current scanner options to body scanners

(defun make-matched-pair-matcher (open-char close-char)
  "Will create a regex filter that can match arbitrary pairs of matched characters
such as (start (other () some) end)"
  (lambda (body-regex)
    ;; :void means the named group had no body regex
    (setf body-regex (if (eql body-regex :void)
                         nil
                         `(:SEQUENCE :START-ANCHOR ,body-regex :END-ANCHOR)))
    (lambda (pos)
      ;;(format T "TEST3 ~A ~A ~%" cl-ppcre::*reg-starts* cl-ppcre::*reg-ends*)
      (iter
        (with fail = nil)
        (with start = pos)
        (with cnt = 0)
        ;; went past the string without matching
        (when (>= pos (length cl-ppcre::*string*))
          (return fail))
        (for c = (char cl-ppcre::*string* pos))
        ;; the match must begin with the opening character
        (when (first-iteration-p)
          (unless (eql c open-char) (return fail)))
        (cond
          ((eql c open-char) (incf cnt))
          ((eql c close-char)
           (decf cnt)
           (when (zerop cnt) ;; found our last matching char
             (if (or (null body-regex)
                     (cl-ppcre:scan body-regex cl-ppcre::*string*
                                    :start (+ 1 start) :end pos))
                 (return (+ 1 pos))
                 (return fail)))))
        (incf pos)))))
(defun default-dispatch-table ()
"Creates a default dispatch table with a parens dispatcher that can match
pairs of parentheses"
`(("parens" . ,(make-matched-pair-matcher #\( #\) ))))
(defun create-scanner-with-filters
    (regex &optional (function-table (default-dispatch-table)))
  "Allows named registers to refer to functions that should be in
the place of the named register"
  (let* ((cl-ppcre:*allow-named-registers* T)
         (p-tree (cl-ppcre:parse-string regex)))
    (labels ((dispatcher? (name)
               "Return the dispatch function from the table, if there is one"
               (cdr (assoc name function-table :test #'string-equal)))
             (mutate-tree (tree)
               "Changes the scanner parse tree to include any filter
functions specified in the table"
               (if (atom tree)
                   tree
                   ;; aif / it are arnesi's anaphoric if
                   (aif (and (eql :named-register (first tree))
                             (dispatcher? (second tree)))
                        `(:named-register ,(second tree)
                           (:filter ,(funcall it (third tree))))
                        (iter (for item in tree)
                          (collect (mutate-tree item)))))))
      ;; mutate the regex to contain our matcher functions
      ;; then compile it
      (cl-ppcre:create-scanner (mutate-tree p-tree)))))
(defparameter +example-string+
  "some times I like to \"function (calling all coppers (), another param (), test)\" just to see what happens")

(defun run-examples ()
  "Just runs some examples expected results:
((\"function (calling all coppers (), another param (), test)\"
  #(\"(calling all coppers (), another param (), test)\"))
 (\"function (calling all coppers (), another param (), test)\"
  #(\"(calling all coppers (), another param (), test)\")))"
  (flet ((doit (regex)
           (multiple-value-list
            (cl-ppcre:scan-to-strings
             (create-scanner-with-filters regex)
             +example-string+))))
    ;; one parens group with no body regex, one with a body regex
    (list (doit "function\\s*(?<parens>)")
          (doit "function\\s*(?<parens>.*)"))))
PS. I don’t claim this is actually worth anything, only that I had fun doing it.
I have quite a few database-driven web applications that make heavy use of tabular imports and exports (from their primary database, other databases, and exterior data sources, eg: CSVs). The data-table data structure provides column, row, and cell access for getting and setting values, as well as functionality to create composite data-tables by retrieving and combining subsections of existing data-tables. This library also aims to ease type coercion from strings to Common Lisp types.
I had many scattered, not well tested, not easily runnable pieces of CSV code. I was unhappy with this situation and decided to refactor all of it into a single project. I wrote tests for it and had a library, so I thought I might release it. This project started as extensions and bugfixes on arnesi’s CSV.
I then looked around and saw there are other CSV libraries out there that probably mostly accomplish what I had set out to do. However, I already had code that was tested, carried an easier license (BSD), and provided a framework for interacting with my other libraries and systems, so I figured why not release it anyway.
The only interesting code in this library (to me) is that I managed to make the read-csv/write-csv functions accept a string, pathname, or stream as their first argument, and to ensure that streams get closed if these functions created them (file streams, for example), but not if the stream was passed in. Nothing great, but I had fun writing it.
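That dispatch-and-cleanup idea can be sketched with a small helper; the function name here is illustrative, not this library's actual internals:

```lisp
(defun call-with-csv-input (input fn)
  "Call FN with a stream derived from INPUT, which may be a stream,
pathname, or string. Streams opened here are closed automatically by
the WITH- macros; a stream passed in is left open for the caller."
  (etypecase input
    (stream (funcall fn input))
    (pathname (with-open-file (s input) (funcall fn s)))
    (string (with-input-from-string (s input) (funcall fn s)))))

(call-with-csv-input "a,b,c"
                     (lambda (s) (read-line s)))
;; => "a,b,c"
```

The same shape works for output by swapping in with-open-file's :direction :output and with-output-to-string.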
Other niceties I would like to continue to build out are this library's integrations with other related libs (like CLSQL). I have code to handle exporting database queries as CSVs, as well as code to handle importing CSVs into databases both serially and in bulk. I also use data-tables to get a lisp representation of the just-parsed CSV and to coerce that table of string values into relevant Common Lisp types.
We use SBCL as our primary Common Lisp implementation. It is a great runtime, but there is always room for improvement. Nikodemus Siivola is currently fundraising for threading improvements. If you love free, awesome Common Lisp implementations, please support this project.
A commonly experienced error when using CLSQL in a web environment is database connections conflicting with each other across simultaneous web requests. These problems arise because, by default, CLSQL standard-db-objects keep a reference to the connection they were queried or created from, and reuse that connection rather than any new one you may have provided with clsql-sys:with-database. This means two separate threads can end up using the same database connection (for example, when objects queried from the same connection are accessed in multiple threads / http requests).
We solved this problem by introducing a clsql-sys::choose-database-for-instance method (available in the CLSQL master branch at http://git.b9.com/clsql.git; this branch will eventually be released as CLSQL 6). Then in our web applications we define the following class and method override. Usually I then pass this class to clsql-orm, or use it as a direct superclass for any of my web def-view-classes. After that, I just use with-database to establish dynamic connection bindings and everything pretty much works out (as these dynamic bindings are not shared across threads).
(defclass clsql::web-db-obj (clsql-sys:standard-db-object)
  ()
  (:metaclass clsql-sys::standard-db-class))

(defmethod clsql-sys::choose-database-for-instance
    ((object clsql::web-db-obj) &optional database)
  (or database clsql-sys:*default-database*))

;; used as a direct superclass:
(clsql-sys:def-view-class table-1 (clsql::web-db-obj)
  ())

;; or passed to clsql-orm when generating classes:
(clsql-orm:gen-view-classes
 :classes '(users employees salaries)
 :inherits-from '(clsql::web-db-obj))
Note: CLSQL-Fluid seems to be trying to accomplish much the same goals.
Collectors is a Common Lisp library to help accumulate values, which I just pushed to my GitHub account.
Sometimes you just want to collect a list of things. Actually I need to do this all the time. Usually I end up iterating over something, in which case Iterate's collecting/appending/unioning clauses serve me well. Sometimes, though, that just is not a good fit, or I need to accumulate in places iterate deems unacceptable. In these cases it is nice to have specific collector macros. These set up an environment in which a function is available that, when called with an argument, collects it, and when called without arguments, returns the results of the collection.
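A minimal sketch of that collect-or-return environment looks like this; the real library's macros are richer than this (appenders, multiple collectors, and so on):

```lisp
(defmacro with-collector ((name) &body body)
  "Bind NAME as a local function: called with arguments it collects
them in order; called with none it returns the collected list."
  (let ((head (gensym "HEAD")) (tail (gensym "TAIL")))
    `(let* ((,head (cons nil nil)) ; dummy head cons so we append in O(1)
            (,tail ,head))
       (flet ((,name (&rest items)
                (if items
                    (dolist (item items)
                      (setf (cdr ,tail) (cons item nil)
                            ,tail (cdr ,tail)))
                    (cdr ,head)))) ; no arguments: return what we have
         ,@body))))

(with-collector (numbers)
  (numbers 1)
  (numbers 2 3)
  (numbers))
;; => (1 2 3)
```

Keeping a tail pointer avoids the reverse-at-the-end dance of push/nreverse while preserving collection order.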
These macros started as a piece of arnesi, but have been modified and added to. It is also nice to be able to include a library for a specific functionality set rather than a bag of semi-related, useful things.
CL-Inflector is a branch of a port of Ruby/ActiveRecord's Inflector class, to make it easier to singularize and pluralize English words. The original author didn't seem much interested in it anymore, so, hoping to give it a better life, I added asdf files and a test suite and fleshed out some of the special cases. I also use it in clsql-orm to make singular class names from plural table names, if that is your kind of thing.
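For a flavor of what it does (assuming the system is loaded, e.g. via (ql:quickload :cl-inflector); plural-of and singular-of are its exported helpers):

```lisp
;; simple regular plurals
(cl-inflector:plural-of "user")       ; => "users"
(cl-inflector:singular-of "users")    ; => "user"
;; one of the fleshed-out special cases
(cl-inflector:singular-of "matrices") ; => "matrix"
```

This is exactly the kind of conversion clsql-orm needs when turning a plural table name like users into a singular class name.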
As always, as soon as I release a library, I can see all the mistakes I was happy leaving in until other people could see it. In Group-By I found all sorts of inconsistencies in my approach, so to make this tiny library better I rewrote the important bits. The main problem was that this started as an alist grouping mechanism, but alists became untenable at depths greater than 1 or 2, or when linear lookup was unacceptably slow. For more efficiency I had looked at grouping into hash tables; for a usable interface, at grouping into CLOS tree nodes. Then I combined all three approaches into a monstrosity. The problem with this approach was that it conflated wanting a nice, usable interface (which CLOS can provide) with the efficiency issues of looking up children via a hash table or list. As such I had a strange mirroring of awful-to-use data-structure backends, barely wrapped in a nicer CLOS interface.
No more: now the structure of multiple groupings is a CLOS tree of grouped-list objects, while the children are stored in a single hash table or list on each tree node (with methods defined so you should never have to worry about the implementation other than to adjust performance). This greatly simplified my ability to think about what this library was doing, and cleaned up what I considered some fairly glaring ugliness. Overall I think this refactoring was a victory.
It would be nice to switch implementations from list to hashtable when we noticed the number of children increasing past a certain threshold, but I have left that for a later date.
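One way that switch could look, sketched with hypothetical names and an arbitrary threshold (this is not code from the library):

```lisp
(defparameter +promote-threshold+ 16)

(defun add-child (children key val &key (test #'equal))
  "CHILDREN is an alist or a hash table mapping keys to lists of values.
Adds VAL under KEY, promoting the alist to a hash table once it grows
past +PROMOTE-THRESHOLD+; returns the (possibly new) container."
  (etypecase children
    (list
     (let ((cell (assoc key children :test test)))
       (cond
         (cell (push val (cdr cell)) children)
         ((< (length children) +promote-threshold+)
          (acons key (list val) children))
         (t ;; too many children for linear lookup; copy to a hash table
          (let ((table (make-hash-table :test test)))
            (loop for (k . v) in children
                  do (setf (gethash k table) v))
            (push val (gethash key table))
            table)))))
    (hash-table
     (push val (gethash key children))
     children)))
```

Because callers always use the returned container and the accessor methods hide the representation, the promotion would be invisible to users of the grouped-list interface.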
A recurring problem I have experienced while programming, is the need to convert a flat list of data into a more complex tree data structure. This is especially common when dealing with results from relational databases (where all data is intrinsically flat, and queries return tables of data). To solve this problem I wrote a small library named group-by (in honor of the sql operator that performs much the same task).
The easiest example:
(group-by '((a 1 2) (a 3 4) (b 5 6)))
=> ((A (1 2) (3 4)) (B (5 6)))
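Consuming that alist result is an ordinary recursive walk over (key . children) conses; for instance, a quick indenting printer (illustrative, not part of the library):

```lisp
(defun walk-groups (groups &optional (depth 0))
  "Print an alist grouping as an indented tree: each group is a
\(KEY . CHILDREN) cons whose children are rows (or sub-groups)."
  (dolist (group groups)
    (format t "~v@T~a~%" (* 2 depth) (first group))
    (dolist (child (rest group))
      (format t "~v@T~s~%" (* 2 (1+ depth)) child))))

(walk-groups '((A (1 2) (3 4)) (B (5 6))))
;; prints:
;; A
;;   (1 2)
;;   (3 4)
;; B
;;   (5 6)
```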
A more concrete example is from trac, the ticketing system we use. Trac tickets contain fields for author, project, milestone, and summary (among others). When displaying this data, my project manager wants to be able to see what everybody is working on (a tree view organized by author, project, and milestone), as well as being able to see what is being worked on in a project and by whom (a tree view organized by project and milestone). To accomplish this I pull a flat list of ticket objects from the database (using a clsql-orm generated class). I then create a tree from this data table by calling make-grouped-list. I can then perform a standard recursive tree walk to render this with the desired organization directly.
;; tickets is the flat list of ticket objects pulled above
(make-grouped-list tickets
  :keys (list #'author #'project #'milestone)
  :tests (list #'equal #'equal #'equal))
Group-by supports grouping into alists, hashtables, and CLOS tree-nodes. To hide the difference between these implementations, I created a grouped-list CLOS object that manages all of the grouping and presents a unified interface to each of these implementation strategies. I support each of these implementations because which to use depends strongly on the workload you anticipate performing with the tree. Simply grouping once and then recursively rendering the tree is often more efficient with an alist than with a heavier-weight data structure. Conversely, hashtables tend to perform better for lots of accesses into the grouping structure.
To see more runnable examples, please check out the project page and the examples file.