Nov 29, 2017

Parquet file query DSL with FastParse

A year after my last foray into DSL parsers I find myself working with Scala full time. My first impression is that Scala folks like re-implementing things from scratch despite easy access to the first-class Java library ecosystem. Given that the standard Scala library has a surprisingly high turnover for a mature language (e.g. the original Actor library replaced with Akka, the very inefficient original parser combinators, the Scala Collections overhaul, Monix tasks trying to catch up with CompletableFuture), I am not entirely convinced it's always justified.

Nevertheless, it could be helpful to see the same DSL grammar implemented using another modern parser library. FastParse seems to be the current favorite. While trying to wrap my head around it I went back to the original grammar and ported it to FastParse. There is not that much to see because the actual parser is a single class of fewer than 100 LoC. I implemented it mostly following the example of the Parboiled-based parser, but it also bears a family resemblance to the ANTLR grammar.

Among my first impressions:
  • the FastParse version is the smallest in LoC (by a factor of 1.5-2)
  • FastParse can easily express basic lexer rules that took me more experimentation to get right in ANTLR
  • it takes effort (and copy-pasting from examples) to deal with FastParse "grammar" code. Overloaded Scala operators are reasonable and even convenient, but their non-standard nature takes some getting used to
  • there is no need for additional MVN/SBT plugins
  • Scala itself is the largest transitive dependency; FastParse is a couple of JARs of under 400K
  • FastParse generates reasonable traces for debugging
  • it's very easy to enable case-insensitivity though it adds noise to the parser
  • it's very easy to automatically skip whitespace though the syntax for doing it looks odd
  • map-based syntax for building custom AST nodes is pleasant
  • online documentation is quite extensive and includes a couple of large-scale parsers
  • you can use FastParse from Java; I did it to re-use my tests written for last year's parsers
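
To make the points about case-insensitivity, whitespace skipping, and map-based AST building more concrete, here is a minimal sketch. It is not the parser from this post: it assumes the FastParse 1.x API, and the object name `QuerySketch`, the `Eq` node, and the toy `where <column> = <number>` rule are all made up for illustration.

```scala
// Sketch only: assumes FastParse 1.x; every name below is hypothetical.
object QuerySketch {
  // Toy AST invented for this example
  sealed trait Expr
  case class Eq(column: String, value: Long) extends Expr

  // The odd-looking part: a wrapper that makes `~` and `.rep`
  // silently skip spaces between the parsers they combine
  val White = fastparse.WhitespaceApi.Wrapper {
    import fastparse.all._
    NoTrace(" ".rep)
  }
  import fastparse.noApi._
  import White._

  // Lexer-like rules; CharsWhile is a single parser, so no whitespace
  // is skipped inside an identifier or a number
  val ident: P[String] = P(CharsWhile(c => c.isLetterOrDigit || c == '_').!)
  val number: P[Long] = P(CharsWhile(_.isDigit).!.map(_.toLong))

  // IgnoreCase is the extra noise needed for case-insensitive keywords;
  // .map turns the captured tuple into an AST node
  val whereClause: P[Expr] =
    P(IgnoreCase("where") ~ ident ~ "=" ~ number ~ End)
      .map { case (column, value) => Eq(column, value) }
}
```

Under those assumptions, `QuerySketch.whereClause.parse("WHERE age = 42")` should succeed with a value of `Eq("age", 42)`, while an input missing the `=` should come back as a failure that carries the trace information mentioned above.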

In general I definitely like FastParse more than Parboiled. This comparison is somewhat unfair because, compared with Java, any Scala code is terser. I would go as far as to say that for tiny DSLs FastParse should be the default choice. For a large grammar, the readability, visual tools, and likely performance of ANTLR are still as important as ever.

