homeresumeabout
New syntax and linear runtime options in clj-peg
09.07.07
This page is out of date by at least several releases. You should refer to the latest User Manual found on the clj-peg project page.

Version 0.4 of the clj-peg library is available, and there have been a few changes. I had time scheduled to respond to the release of 0.2.4, but since there hasn't been anything to respond to I've used the time to move ahead with 0.4. This version includes an improved syntax and an explanation of how to avoid possible exponential runtimes.
Improved Syntax
As for the improved syntax, previous versions had a verbose format that felt more like CL than Clojure. The only delimiters were parens and operators required two or more characters, in one case, four characters. This release is still backwards compatible but allows for using curly-braces, and square-brackets as well as parens.
Older syntax:
Non-Terminal <- (=ast F (=o A (=s B C)))

New syntax:
Non-Terminal <- {F (| A [B C])}

In addition to appearing less Lisp-like and more Clojure-like, the new syntax more closely matches the data structures returned. Where a vector data type is used in defining part of a grammar, a vector data type is returned. Use of the map data type, while not matching the related return type, feels more Clojure-like since there is a key/value type of relation between the branch of the AST as a value and the function meant to parse that branch of the AST as the key.
There still needs to be another in-depth post on the details of the syntax. For now though, (=o A B) becomes (| A B), and (=s A B) becomes [A B]. The other operators =+, =*, =?, =!, and =& simply lose their leading '='. The previous way of matching the 'end of input', or (=e) is now simply $no parens.
Here's the syntax from the Intro:
Expr         <- [BalancedExpr $]
BalancedExpr <- (| BalancedSeq Empty)
BalancedSeq  <- (+ (| ParenExpr BracketExpr))
ParenExpr    <- [LParen BalancedExpr RParen]
LParen       <- "("
RParen       <- ")"
BracketExpr  <- [LBracket BalancedExpr RBracket]
LBracket     <- "["
RBracket     <- "]"
Empty        <- ""

Avoiding Exponential Runtimes
While I was thinking about possible ways to add in memoization as the typical compensation for the PEG behaviour of backtracking, I realized that it was there all along. The previous, and current, syntax allows for insertion of function calls at any point. All the function needs to do is either accept [sub-peg] or [& sub-peg] as a parameter and return a function that takes an i-struct as it's only parameter. The developer is then free to implement whatever backtrack-compensating mechanism they feel is best, at whatever point they feel is best.