Even More Fun With CL-PPCRE Filter Functions

A while ago I posted about my adventures playing with CL-PPCRE filter functions. In that post I destructively modified a cl-ppcre parse tree to add a filter function that can match balanced pairs of parentheses (a typical example of what regular expressions are NOT capable of). In this post I formalize that example into something that can be applied more broadly, with less understanding of the underlying mechanics.

To begin with, I define a function create-scanner-with-filters that handles creating these special scanners for me. My idea is to provide a table of functions that should be called when we see certain strings inside the regular expression. CL-PPCRE already parses named groups for me (see *allow-named-registers*), and these can carry parameters, so I decided to hook into named registers for my function dispatch. A nice side effect is that whatever your filter matches is stored in a register.

An overview of the process: parse the regex; for each named-register node whose name has a function in the table, replace its third element (usually a regex whose match will be stored in the register) with our specialized filter function; then compile the new scanner and return it to the end user. I also decided that the regex forming the body of the named group should be available to the filter, and in most cases it should probably be used as part of the filter function.

If I continue to play with this, I might eventually release it as a library, but for now it stands well on its own.

Without further ado:

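;; Assumes cl-ppcre, cl-interpol, and iterate are loaded, and that an anaphoric
;; AIF macro (for example the one from anaphora) is accessible in this package.
(cl-interpol:enable-interpol-syntax)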
(declaim (optimize (debug 3)))

;; TODO: group binds in body expressions
;; TODO: propagate current scanner options to body scanners

(defun make-matched-pair-matcher (open-char close-char)
  "Will create a regex filter that can match arbitrary pairs of matched characters
   such as (start (other () some) end)"
  (lambda (body-regex)
    (setf body-regex (if (eql body-regex :void)
                         nil
                         (cl-ppcre:create-scanner
                          `(:SEQUENCE :START-ANCHOR ,body-regex :END-ANCHOR))))
    (lambda (pos)
      ;;(format T "TEST3 ~A ~A ~%" cl-ppcre::*reg-starts* cl-ppcre::*reg-ends*)
      (iter
        (with fail = nil)
        (with start = pos)
        (with cnt = 0)
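        ;; c is NIL once pos runs past the end of the string, so CHAR is
        ;; never called with an out-of-range index
        (for c = (when (< pos (length cl-ppcre::*string*))
                   (char cl-ppcre::*string* pos)))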
        (if (first-iteration-p)
            (unless (eql c open-char) (return fail))
            ;; went past the string without matching
            (when (>= pos (length cl-ppcre::*string*))
              (return fail)))
        (cond
          ((eql c open-char) (incf cnt))
          ((eql c close-char)
           (decf cnt)
           (when (zerop cnt) ;; found our last matching char
             (if (or (null body-regex)
                     (cl-ppcre:scan body-regex cl-ppcre::*string*
                                    :start (+ 1 start)
                                    :end pos))
                 (return (+ 1 pos))
                 (return fail)))))
        (incf pos)))))

(defun default-dispatch-table ()
  "Creates a default dispatch table with a parens dispatcher that can match
   pairs of parentheses"
  `(("parens" . ,(make-matched-pair-matcher #\( #\) ))))

(defun create-scanner-with-filters
    (regex &optional (function-table (default-dispatch-table)) )
  "Allows named registers to refer to functions that should be in
   the place of the named register"
  (let* ((cl-ppcre:*allow-named-registers* T)
         (p-tree (cl-ppcre:parse-string regex)))
    (labels ((dispatcher? (name)
               "Return the name of the dispatcher from the table if
                applicable"
               (cdr (assoc name function-table :test #'string-equal)))
             (mutate-tree (tree)
               "Changes the scanner parse tree to include any filter
                functions specified in the table"
               (typecase tree
                 (null nil)
                 (atom tree)
                 (list
                  (aif (and (eql :named-register (first tree))
                            (dispatcher? (second tree)))
                       `(:named-register ,(second tree)
                         (:filter ,(funcall it (third tree))))
                       (iter (for item in tree)
                         (collect (mutate-tree item))))))))
      ;; mutate the regex to contain our matcher functions
      ;; then compile it
      (cl-ppcre:create-scanner (mutate-tree p-tree)))))

(defparameter *example-function-phrase*
  "some times I like to \"function (calling all coppers (), another param (), test)\" just to see what happens")

(defun run-examples ()
  "Just runs some examples expected results:

   ((\"function (calling all coppers (), another param (), test)\"
     #(\"(calling all coppers (), another param (), test)\"))
    (\"function (calling all coppers (), another param (), test)\"
     #(\"(calling all coppers (), another param (), test)\"))
    (NIL))

  "
  (flet ((doit (regex)
           (multiple-value-list
            (cl-ppcre:scan-to-strings
             (create-scanner-with-filters regex)
             *example-function-phrase*))))
    (list
     (doit #?r"function\s*(?<parens>)")
     (doit #?r"function\s*(?<parens>([^,]+,)*[^,]+)")
     (doit #?r"function\s*(?<parens>not-matching-at-all)"))))

PS. I don’t claim this is actually worth anything, only that I had fun doing it.

Data-Table and CL-CSV

Data-Table

I have quite a few database-driven web applications that make heavy use of tabular imports and exports (from their primary database, other databases, and exterior data sources such as CSVs). This data structure provides column, row, and cell access for getting and setting values, and can create composite data-tables by retrieving and combining subsections of existing data-tables. The library also aims to ease type coercion from strings to Common Lisp types.

CL-CSV

I had many scattered, poorly tested, not easily runnable pieces of CSV code. Unhappy with this situation, I decided to refactor all of it into a single project. Once I had written tests for it, I had a library, so I thought I might as well release it. This project started as extensions and bugfixes on arnesi’s CSV.

I then looked around and saw there are other CSV libraries out there that probably mostly accomplished what I had set out to do. However, I already had my code that was tested, had an easier license (BSD), and provided a framework to interact with my other libraries and systems, so I figured why not just release it anyway.

The only interesting code in this library (to me) is that I managed to make the read/write-csv functions accept a string, pathname, or stream as the first argument and I managed to make sure that streams get closed if these functions created them (file streams for example), but not if the stream was passed in. Nothing great, but I had fun writing it.
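The string/pathname/stream handling described above can be sketched roughly like this. It is a minimal illustration under my own assumptions, not the library's actual code: call-with-input-stream and first-line are hypothetical names, and I assume a string holds the data itself while a pathname names a file.

```lisp
;; Hypothetical helper, not part of cl-csv: normalizes SOURCE to a stream,
;; calls FN on it, and closes only the streams it opened itself.
(defun call-with-input-stream (source fn)
  (etypecase source
    ;; A stream passed in by the caller is used as-is and left open.
    (stream (funcall fn source))
    ;; A string is treated as the data; the string stream is ours to close.
    (string (with-input-from-string (in source)
              (funcall fn in)))
    ;; A pathname names a file; WITH-OPEN-FILE closes it even on a
    ;; non-local exit.
    (pathname (with-open-file (in source :direction :input)
                (funcall fn in)))))

(defun first-line (source)
  "Example consumer: returns the first line read from SOURCE."
  (call-with-input-stream source #'read-line))
```

Keeping the dispatch in one function means every reader built on top of it gets the same ownership rule: whoever opens the stream closes it.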

Another nicety I would like to continue building out is this library's integration with other related libs (like CLSQL). I have code to handle exporting database queries as CSVs as well as code to handle importing CSVs into databases both serially and in bulk. I also use data-tables to have a lisp representation of the just-parsed data and to coerce that table of string values into relevant Common Lisp types.

CLSQL & Webapps

A commonly experienced error when using CLSQL in a web environment is database connections conflicting with each other from simultaneous web requests. These problems arise because, by default, clsql standard-db-objects keep a reference to the connection they were queried / created from and reuse this database connection (rather than a new one you may have provided with clsql-sys:with-database). This means that two separate threads could try to use the same database connection (provided through clsql-sys:with-database or by having objects queried from the same connection accessed in multiple threads / http requests).

We solved this problem by introducing a clsql-sys::choose-database-for-instance method (available in the clsql master branch from http://git.b9.com/clsql.git; this branch will eventually be released as CLSQL6). Then in our web applications we define the following class and method override. Usually I then pass this class name to clsql-orm, or use it as a direct superclass of any of my web def-view-classes. After this, I just use with-database to establish dynamic connection bindings and everything pretty much works out (as these dynamic bindings are not shared across threads).

(defclass clsql::web-db-obj (clsql-sys:standard-db-object)
    nil
    (:metaclass clsql-sys::standard-db-class))

(defmethod clsql-sys::choose-database-for-instance
    ((object clsql::web-db-obj) &optional database)
  (or database clsql-sys:*default-database*))

(clsql-sys:def-view-class table-1 (clsql::web-db-obj)
    (...))

(clsql-orm:gen-view-classes
 :package :net.company.my.db
 :nicknames :my-db
 :export-symbols t
 :classes '(users employees salaries)
 :inherits-from '(clsql::web-db-obj))

Note: CLSQL-Fluid seems to be trying to accomplish much the same goals.

Collectors – A common lisp collection library

Collectors is a common lisp library to help accumulate values, that I just pushed to my github account.

Sometimes you just want to collect a list of things. Actually I need to do this all the time. Usually I end up iterating over something, in which case Iterate's collecting/appending/unioning clauses serve me well. Sometimes, though, that just is not a good fit, or I need to accumulate in places iterate deems unacceptable. In these cases it is nice to have specific collector macros. These set up an environment where a function is available that, when called with an argument, collects it, and when called without arguments, returns the results of the collection.
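To illustrate the idea, here is a minimal sketch of such a macro in plain Common Lisp. The names with-simple-collector and evens-below are hypothetical and only for this example; the library's own macros may differ in name, signature, and features.

```lisp
;; Hypothetical macro for illustration only.
(defmacro with-simple-collector ((name) &body body)
  "Binds NAME as a local function: (NAME x) collects X, while (NAME)
   returns everything collected so far, in insertion order."
  (let ((items (gensym "ITEMS")))
    `(let ((,items '()))
       (flet ((,name (&optional (item nil item-supplied-p))
                (if item-supplied-p
                    ;; Called with an argument: collect it.
                    (progn (push item ,items) item)
                    ;; Called without arguments: the list is kept
                    ;; reversed internally, so return a reversed copy.
                    (reverse ,items))))
         ,@body))))

(defun evens-below (n)
  "Collects the even numbers below N from inside nested control flow."
  (with-simple-collector (keep)
    (dotimes (i n)
      (when (evenp i)
        (keep i)))
    (keep)))
```

Because the collector is an ordinary local function, it can be called from anywhere in the body, including from inside closures passed to other functions.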

These macros started as a piece of arnesi, but have been modified and added to. It is also nice to be able to include a library for a specific functionality set rather than a bag of semi-related, useful things.

CL-Inflector

CL-Inflector is a branch of a port of ruby/ActiveRecord’s inflector class to make it easier to singularize and pluralize English words. The original author didn’t seem much interested in it any more, so hoping to give it a better life, I added asdf files and a test suite and fleshed out some of the special cases. I also use it in clsql-orm to make singular class names from plural table names, if that is your kind of thing.

Group-By Refactor

As always, as soon as I release a library, I can see all the mistakes I was happy leaving in until other people could see it. In Group-By I found all sorts of inconsistencies in my approach, and so to make this tiny library better I rewrote the important bits. The main problem was that this started as an alist grouping mechanism. But alists became untenable at depths greater than 1 or 2, or if linear lookup was unacceptably slow. For more efficiency I had looked at grouping into hash tables; for a usable interface I looked at grouping into CLOS tree-nodes. Then I combined all three approaches into a monstrosity. The problem with this approach was that it conflated wanting a nice/usable interface (which CLOS can provide) with the efficiency issues of looking up children via a hash table or list. As such I had this strange mirroring of awful-to-use data structure backends, barely wrapped in a nicer CLOS interface.

No more. Now the structure of multiple groupings is a CLOS tree of grouped-list objects, while the children are stored in a single hashtable or list on each tree node (with methods defined so you should never have to worry about the implementation other than to adjust performance). This greatly simplified my ability to think about what this library was doing, and cleaned up what I considered to be some fairly glaring ugliness. Overall I think this refactoring was a victory.

It would be nice to switch implementations from list to hashtable when we noticed the number of children increasing past a certain threshold, but I have left that for a later date.

Group-By: A Common Lisp library to help group data into trees

A recurring problem I have experienced while programming is the need to convert a flat list of data into a more complex tree data structure. This is especially common when dealing with results from relational databases (where all data is intrinsically flat, and queries return tables of data). To solve this problem I wrote a small library named group-by (in honor of the sql operator that performs much the same task).

The easiest example:

(group-by '((a 1 2) (a 3 4) (b 5 6)))
=> ((A (1 2) (3 4)) (B (5 6)))
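To make the grouping concrete, here is a minimal sketch of how such an alist grouping could be written in plain Common Lisp. The name simple-group-by is hypothetical and this is not the library's implementation; the real group-by may treat keys, values, and tests differently.

```lisp
;; Hypothetical sketch, not the library's code.
(defun simple-group-by (rows &key (key #'first) (value #'rest) (test #'eql))
  "Groups ROWS into an alist of (key . values), keeping first-seen order."
  (let ((groups '()))
    (dolist (row rows)
      (let* ((k (funcall key row))
             (cell (assoc k groups :test test)))
        (if cell
            ;; Known key: add this row's value to the existing group.
            (push (funcall value row) (cdr cell))
            ;; New key: start a fresh group.
            (push (list k (funcall value row)) groups))))
    ;; Groups and their members were both built by pushing, so reverse each.
    (mapcar (lambda (cell) (cons (car cell) (reverse (cdr cell))))
            (reverse groups))))
```

Calling (simple-group-by '((a 1 2) (a 3 4) (b 5 6))) produces the same shape as the example above.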

A more concrete example is from trac, the ticketing system we use. Trac tickets contain fields for author, project, milestone, and summary (among others). When displaying this data, my project manager wants to be able to see what everybody is working on (a tree view organized by author, project, and milestone), as well as being able to see what is being worked on in a project and by whom (a tree view organized by project and milestone). To accomplish this I pull a flat list of ticket objects from the database (using a clsql-orm generated class). I then create a tree from this data table by calling make-grouped-list. I can then perform a standard recursive tree walk to render this with the desired organization directly.

Example Call:
(make-grouped-list tickets
                   :keys (list #'author #'project #'milestone)
                   :tests (list #'equal #'equal #'equal))
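The idea of grouping by several keys in turn can be sketched with nested alists. This is only an illustration under my own assumptions: group-by-keys is a hypothetical function, the tickets are modeled as plists rather than clsql objects, and the library itself builds grouped-list objects instead of raw alists.

```lisp
;; Hypothetical sketch: one level of nesting per key function.
(defun group-by-keys (items keys)
  "Returns ITEMS unchanged when KEYS is empty; otherwise an alist mapping
   each value of (first KEYS) to the recursively grouped items sharing it."
  (if (null keys)
      items
      (let ((groups '()))
        (dolist (item items)
          (let* ((k (funcall (first keys) item))
                 (cell (assoc k groups :test #'equal)))
            (if cell
                (push item (cdr cell))
                (push (list k item) groups))))
        ;; Restore first-seen order, then group each bucket by the rest.
        (mapcar (lambda (cell)
                  (cons (car cell)
                        (group-by-keys (reverse (cdr cell)) (rest keys))))
                (reverse groups)))))

;; Made-up sample data standing in for ticket objects.
(defparameter *sample-tickets*
  '((:author "ann" :project "web" :id 1)
    (:author "bob" :project "web" :id 2)
    (:author "ann" :project "web" :id 3)
    (:author "ann" :project "api" :id 4)))

(defun sample-ticket-tree ()
  (group-by-keys *sample-tickets*
                 (list (lambda (ticket) (getf ticket :author))
                       (lambda (ticket) (getf ticket :project)))))
```

The result is a tree keyed first by author and then by project, which a recursive walk can render directly.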

Example Rendering:

Group-by supports grouping into alists, hashtables, and CLOS tree-nodes. To hide the difference between these implementations, I created a grouped-list CLOS object that manages all of the grouping and presents a unified interface to each of these implementation strategies. I support each of these implementations because which one to use depends strongly on the workload you anticipate performing with the tree. Simply grouping once and then recursively rendering the tree is often more efficient with an alist than with a heavier-weight data structure. Conversely, hashtables tend to perform better for lots of accesses into the grouping structure.

To see more runnable examples, please check out the project page and the examples file.

CL-Creditcard & CL-Authorize-net: Processing credit card payments with common lisp

I just pushed cl-creditcard to my github account. CL-Creditcard (and its sub-library cl-authorize-net) is a library that we use to process payments with Authorize.Net. We have a large internal application that tracks and manages all our business and customer logic, including billing and invoicing. (Invoices are generated using cl-typesetting.) This application charges credit cards through cl-authorize-net.

It has been stable and charging cards for years and I just got around to releasing it. Soon it will also support ACH (echeck) transactions and we will be moving to lisp-unit from lift.

As with all payment processing, test very well before putting into production :)

CLSQL-ORM: Turn your existing database schema into Common Lisp CLSQL-View-Classes

CLSQL-ORM is a common lisp library I just pushed to my github account. Its primary goal is to generate CLSQL view-classes based on an existing database. It uses the “information_schema” views to introspect relevant data from the database, then builds and evals the view-class definitions. This project is a significant branch of clsql-pg-introspect, attempting to remove its “pg” aspects in favor of the standard “information_schema”. It might have changed some semantics/conventions along the way (I’m not sure, as I didn’t use the original project much, and that was long ago).

I wanted to generate my lisp objects from the database for a couple of reasons. One, I am fairly comfortable with SQL databases and am used to specifying them in whatever variant of sql the database engine supports. Two, I am most often presented with an extant working database that I want to interact with (such as a wordpress install), where the schema of the database can change, and I just want my common lisp to match whatever the database says, rather than trying to keep both up to date with each other manually. Obviously this project encodes many of my own personal thoughts and tastes about databases, which may not be the same as yours. This project is perhaps best thought of as a jumping-off point for creating your own personal common lisp ORM, though it should be usable as is if your tastes and mine coincide.