As always, as soon as I release a library, I can see all the mistakes I was happy to leave in until other people could see them. In Group-By I found all sorts of inconsistencies in my approach, so to make this tiny library better I rewrote the important bits. The main problem was that it started as an alist grouping mechanism, but alists became untenable at depths greater than 1 or 2, or when linear lookup was unacceptably slow. For more efficiency I had looked at grouping into hash tables; for a usable interface I had looked at grouping into CLOS tree-nodes. Then I combined all three approaches into a monstrosity. The problem with this approach was that it conflated wanting a nice, usable interface (which CLOS can provide) with the efficiency issues of looking up children via a hash table or list. As such I had this strange mirroring of awful-to-use data structure backends, barely wrapped in a nicer CLOS interface.
No more. Now the structure of multiple groupings is a CLOS tree of grouped-list objects, while the children are stored in a single hashtable or list on each tree node (with methods defined so you should never have to worry about the implementation other than to adjust performance). This greatly simplified my ability to think about what this library was doing, and cleaned up what I considered to be some fairly glaring ugliness. Overall I think this refactoring was a victory.
It would be nice to switch a node's implementation from list to hashtable once its number of children grows past a certain threshold, but I have left that for a later date.
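To make the shape of that design concrete, here is a minimal sketch of a tree node whose children live in either an alist or a hash table behind the same methods. Every name in it (sketch-node, find-child, add-child, maybe-promote, *promote-threshold*) is hypothetical and chosen for illustration; this is not the library's actual class or API, and the threshold switch is only one way the deferred idea might look.

```lisp
;; Hypothetical sketch, not group-by's real implementation.
(defclass sketch-node ()
  ((test     :initarg :test :initform #'equal :reader node-test)
   ;; Children are either an alist or a hash table of key -> child.
   (children :initarg :children :initform '() :accessor node-children)))

(defgeneric find-child (node key)
  (:documentation "Return the child of NODE stored under KEY, or NIL."))

(defmethod find-child ((node sketch-node) key)
  (let ((children (node-children node)))
    (etypecase children
      (hash-table (values (gethash key children)))
      (list (cdr (assoc key children :test (node-test node)))))))

(defgeneric add-child (node key child)
  (:documentation "Store CHILD under KEY in NODE and return CHILD."))

(defmethod add-child ((node sketch-node) key child)
  (let ((children (node-children node)))
    (etypecase children
      (hash-table (setf (gethash key children) child))
      (list (push (cons key child) (node-children node))))
    child))

(defparameter *promote-threshold* 16
  "Assumed cutoff for illustration; the post does not pick a number.")

(defun maybe-promote (node)
  "Move NODE's children from an alist to a hash table past the threshold."
  (let ((children (node-children node)))
    (when (and (listp children)
               (> (length children) *promote-threshold*))
      (let ((table (make-hash-table :test (node-test node))))
        ;; Walk oldest-first so the most recently added entry wins.
        (loop for (key . child) in (reverse children)
              do (setf (gethash key table) child))
        (setf (node-children node) table)))
    node))
```

Callers only ever touch find-child and add-child, so the storage choice stays a performance knob rather than part of the interface.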
A recurring problem I have experienced while programming is the need to convert a flat list of data into a more complex tree data structure. This is especially common when dealing with results from relational databases (where all data is intrinsically flat, and queries return tables of data). To solve this problem I wrote a small library named group-by (in honor of the SQL operator that performs much the same task).
The easiest example:
(group-by '((a 1 2) (a 3 4) (b 5 6)))
=> ((A (1 2) (3 4)) (B (5 6)))
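For readers who want to see the idea without the library, the single-level case can be sketched in a few lines. The function name simple-group-by and its keyword arguments are hypothetical; this is an illustration of the technique under those assumptions, not how group-by itself is written.

```lisp
;; Hypothetical helper, not the library's implementation: group a flat
;; list of rows into an alist keyed by (funcall key row).
(defun simple-group-by (rows &key (key #'first) (value #'rest) (test #'equal))
  (let ((groups '()))
    (dolist (row rows)
      (let* ((k (funcall key row))
             (cell (assoc k groups :test test)))
        (if cell
            ;; Known key: add this row's value to the existing group.
            (push (funcall value row) (cdr cell))
            ;; New key: start a fresh group.
            (push (list k (funcall value row)) groups))))
    ;; Groups and their items were pushed in reverse; restore input order.
    (nreverse (mapcar (lambda (cell)
                        (cons (car cell) (reverse (cdr cell))))
                      groups))))

(simple-group-by '((a 1 2) (a 3 4) (b 5 6)))
;; => ((A (1 2) (3 4)) (B (5 6)))
```

Each row costs one assoc over the groups seen so far, which is the linear lookup that makes alists attractive for small groupings and slow for large ones.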
A more concrete example is from Trac, the ticketing system we use. Trac tickets contain fields for author, project, milestone, and summary (among others). When displaying this data, my project manager wants to be able to see what everybody is working on (a tree view organized by author, project, and milestone), as well as what is being worked on in a project and by whom (a tree view organized by project and milestone). To accomplish this I pull a flat list of ticket objects from the database (using a clsql-orm generated class). I then create a tree from this data table by calling make-grouped-list. I can then perform a standard recursive tree walk to render it directly with the desired organization.
(make-grouped-list tickets ; TICKETS: the flat list of ticket objects (variable name assumed)
 :keys (list #'author #'project #'milestone)
 :tests (list #'equal #'equal #'equal))
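The group-then-walk pattern described above can be sketched without the library or a database. In this sketch the tickets are plain property lists with made-up values, and field, group-nested, and render-tree are hypothetical helpers; the real code works on grouped-list objects rather than nested alists.

```lisp
;; Made-up sample data standing in for ticket objects.
(defparameter *tickets*
  '((:author "ann" :project "web" :milestone "1.0" :summary "Fix login")
    (:author "ann" :project "web" :milestone "1.1" :summary "Add search")
    (:author "bob" :project "api" :milestone "1.0" :summary "Rate limits")))

(defun field (name)
  "Return a key function that reads NAME from a ticket plist."
  (lambda (ticket) (getf ticket name)))

(defun group-nested (items keys)
  "Group ITEMS by each function in KEYS in turn, giving nested alists.
With no keys left, ITEMS are returned unchanged as the leaves."
  (if (null keys)
      items
      (let ((groups '()))
        (dolist (item items)
          (let* ((k (funcall (first keys) item))
                 (cell (assoc k groups :test #'equal)))
            (if cell
                (push item (cdr cell))
                (push (list k item) groups))))
        ;; Undo the reversal from PUSH, then group each bucket one level down.
        (mapcar (lambda (cell)
                  (cons (car cell)
                        (group-nested (reverse (cdr cell)) (rest keys))))
                (reverse groups)))))

(defun render-tree (tree depth &optional (indent 0) (stream *standard-output*))
  "Print TREE, which has DEPTH grouping levels above its leaf tickets."
  (let ((pad (make-string indent :initial-element #\Space)))
    (if (zerop depth)
        (dolist (ticket tree)
          (format stream "~a- ~a~%" pad (getf ticket :summary)))
        (dolist (group tree)
          (format stream "~a~a~%" pad (car group))
          (render-tree (cdr group) (1- depth) (+ indent 2) stream)))))

;; Tree view organized by author, project, and milestone.
(render-tree (group-nested *tickets*
                           (list (field :author) (field :project) (field :milestone)))
             3)
```

Passing (list (field :project) (field :milestone)) with a depth of 2 gives the second view over the same flat list.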
Group-by supports grouping into alists, hashtables, and CLOS tree-nodes. To hide the difference between these implementations, I created a grouped-list CLOS object that manages all of the grouping and presents a unified interface to each of these implementation strategies. I support each of these implementations because which one to use depends strongly on the workload you anticipate performing with the tree. Simply grouping once and then recursively rendering the tree is often more efficient with an alist than with a heavier-weight data structure. Conversely, hashtables tend to perform better when there are many accesses into the grouping structure.
To see more runnable examples, please check out the project page and the examples file.
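As a rough illustration of the hashtable side of that tradeoff, the same one-level grouping can be collected into a hash table so that later lookups by key do not scan a list. The name group-into-table is hypothetical and the sketch is independent of the library's own hashtable support.

```lisp
;; Hypothetical helper, not part of group-by: collect rows into a hash
;; table keyed by (funcall key row).
(defun group-into-table (rows &key (key #'first) (value #'rest) (test #'equal))
  (let ((table (make-hash-table :test test)))
    (dolist (row rows)
      ;; GETHASH yields NIL for a missing key, so PUSH starts a new bucket.
      (push (funcall value row) (gethash (funcall key row) table)))
    ;; Buckets were built by PUSH; put each back into input order.
    (maphash (lambda (k items)
               (setf (gethash k table) (reverse items)))
             table)
    table))

(gethash 'a (group-into-table '((a 1 2) (a 3 4) (b 5 6))))
;; => ((1 2) (3 4)), T
```

Building the table costs more up front than consing an alist, which is why a group-once, render-once workload can favor the alist.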