Creating and programming domain specific languages


In this blog post I provide links to documents, packages, blog posts, and discussions on creating and using Domain Specific Languages (DSLs). I have discussed a few DSLs in previous blog posts (linked below). This post gives a more general, higher-level view of how DSLs are applied and created. The concrete examples are in Mathematica, but the steps are general and can be carried out with any programming language and tools.

When to apply DSLs

Here are some situations for applying DSLs.

  1. When designing conversational engines.
  2. When there are too many usage scenarios and tuning options for the developed algorithms.
    • For example, we have a bunch of search, recommendation, and interaction algorithms for a dating site. A different department, the User Experience Department (UED), designs interactive user interfaces for these algorithms. We make a natural language DSL that invokes the different algorithms according to specified outcomes. With the DSL, the different designs produced by UED are much more easily prototyped, implemented, or fleshed out. The DSL also gives UED an easier-to-understand view of the functionalities provided by the algorithms.
  3. When designing an API for a collection of algorithms.
    • Just designing a DSL can bring clarity about what signatures should be in the API.
    • NIntegrate's Method option was designed and implemented using a DSL. See this video between 25:00 and 27:30.

Designing DSLs

  1. Decide what kind of sentences the DSL is going to have.
    • Are natural language sentences going to be used?
    • Are the language words known beforehand or not?
  2. Prepare, create, or accumulate a list of representative sentences.
    • In some cases Morphological Analysis can greatly help in coming up with use cases and the corresponding sentences.
  3. Create a context-free grammar that describes the sentences from the previous step (or a large subset of them).
    • At this stage I use exclusively Extended Backus-Naur Form (EBNF).
    • In some cases the grammar terminals are not known at the design stage and have to be retrieved in some way (from a database or through natural language processing).
    • Some conversational engine systems allow or require the grammar specification to be done in XML. I would still write the grammar in BNF and then move to XML.
      • It is not that hard to write a parser and interpreter that translates BNF into XML. See the end of this blog post for that kind of translation of BNF into OMPL.
  4. Program parser(s) for the grammar.
    • Most of the time I use functional parsers.
    • The package FunctionalParsers.m provides a Mathematica implementation of this kind of parsing.
    • The package can automatically generate parsers from a grammar given in EBNF. (See the coding example below.)
    • I have programmed versions of this package in R and Lua.
  5. Program an interpreter for the parsed sentences.
    • At this stage the parsed sentences are hooked to the algorithms of the problem domain.
    • The package FunctionalParsers.m allows this to be done fairly easily.
  6. Test the parsing and interpretation.
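
The BNF-to-XML translation mentioned under step 3 can be sketched in a few lines. The following is only an illustration in Python, not the translation used in the linked post: the toy grammar and the XML element names (grammar, rule, alternative, ruleref, terminal) are made up here and do not follow any particular conversational engine's schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy grammar in plain BNF: one rule per line,
# non-terminals in angle brackets, terminals in single quotes.
BNF = """
<command> ::= <load> | <find>
<load> ::= 'load' 'data' <name>
<find> ::= 'find' 'outliers' | 'find' 'trend'
<name> ::= 'temperature' | 'finance'
"""

# Matches either a <non-terminal> or a 'terminal'.
TOKEN = re.compile(r"<([\w-]+)>|'([^']*)'")

def bnf_to_xml(bnf_text):
    """Translate BNF rules into an XML tree (element names are illustrative)."""
    root = ET.Element("grammar")
    for line in bnf_text.strip().splitlines():
        head, body = line.split("::=")
        rule = ET.SubElement(root, "rule", id=head.strip().strip("<>"))
        # Simplification: assumes no terminal contains the '|' character.
        for alternative in body.split("|"):
            alt = ET.SubElement(rule, "alternative")
            for nonterminal, terminal in TOKEN.findall(alternative):
                if nonterminal:
                    ET.SubElement(alt, "ruleref", ref=nonterminal)
                else:
                    ET.SubElement(alt, "terminal").text = terminal
    return root

xml_root = bnf_to_xml(BNF)
xml_text = ET.tostring(xml_root, encoding="unicode")
print(xml_text)
```

Keeping the grammar in BNF and generating the XML means the XML never has to be edited by hand; only the small translator depends on the target schema.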

See the code example below illustrating steps 3-6.

Introduction to using DSLs in Mathematica

  1. This blog post “Natural language processing with functional parsers” gives an introduction to applying DSLs in Mathematica.
  2. This detailed slide-show presentation “Functional parsers for an integration requests language grammar” shows how to use the package FunctionalParsers.m over a small grammar.
  3. The answer to the MSE question “How to parse a clojure expression?” gives a good introduction with a simple grammar and shows both direct parser programming and automatic generation from EBNF.

Advanced example

The blog post “Simple time series conversational engine” discusses the creation (design and programming) of a simple conversational engine for time series analysis (data loading, finding outliers and trends).

Here is a movie demonstrating that conversational engine:

Other discussions

  1. A small part, from 17:30 to 21:00, of the WTC 2012 “Spatial Access Methods and Route Finding” presentation shows a DSL for points of interest queries.
  2. The answer to the MSE question “CSS Selectors for Symbolic XML” uses FunctionalParsers.m.
  3. This Quantile Regression presentation is aided by the “Simple time series conversational engine” mentioned above.

Coding example

This coding example demonstrates steps 3-6 discussed above.
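
What follows is a minimal sketch of steps 3-6 in plain Python rather than with FunctionalParsers.m. Everything in it is an assumption made for illustration: the toy grammar, the combinator names (symbol, alt, seq, optional, apply), the sample data, and the crude outlier rule. It does not reproduce the package's API; it only shows the general shape of the functional parsing approach.

```python
import statistics

# Step 3: a toy grammar in EBNF (hypothetical).
#   command = load | find ;
#   load    = 'load' , [ 'the' ] , dataset , 'data' ;
#   find    = 'find' , ( 'outliers' | 'trend' ) ;
#   dataset = 'temperature' | 'finance' ;

# Step 4: functional parsers. A parser maps a token list to a list of
# (remaining_tokens, result) pairs; an empty list means failure.
def symbol(word):
    def parse(tokens):
        return [(tokens[1:], word)] if tokens and tokens[0] == word else []
    return parse

def alt(*parsers):
    # Alternatives: collect the results of every parser.
    return lambda tokens: [r for p in parsers for r in p(tokens)]

def seq(*parsers):
    # Sequence: thread the remaining tokens through each parser in turn.
    def parse(tokens):
        states = [(tokens, [])]
        for p in parsers:
            states = [(rest2, acc + [res])
                      for rest, acc in states
                      for rest2, res in p(rest)]
        return states
    return parse

def optional(parser):
    return lambda tokens: parser(tokens) + [(tokens, None)]

def apply(func, parser):
    # Transform the parse result (this is where interpretation hooks in).
    return lambda tokens: [(rest, func(res)) for rest, res in parser(tokens)]

dataset = alt(symbol("temperature"), symbol("finance"))
load = apply(lambda r: ("Load", r[2]),
             seq(symbol("load"), optional(symbol("the")), dataset, symbol("data")))
find = apply(lambda r: ("Find", r[1]),
             seq(symbol("find"), alt(symbol("outliers"), symbol("trend"))))
command = alt(load, find)

def parse_sentence(sentence):
    """Return the first parse that consumes all tokens, or None."""
    tokens = sentence.lower().split()
    complete = [res for rest, res in command(tokens) if not rest]
    return complete[0] if complete else None

# Step 5: an interpreter that hooks parsed sentences to "domain algorithms".
DATA = {"temperature": [12.0, 13.5, 12.8, 40.0, 13.1],
        "finance": [100.0, 101.0, 103.0, 104.0, 107.0]}

def interpret(parsed, state):
    action, arg = parsed
    if action == "Load":
        state["data"] = DATA[arg]
        return f"loaded {len(state['data'])} points"
    if "data" not in state:
        return "no data loaded"
    data = state["data"]
    if arg == "trend":
        return "increasing" if data[-1] > data[0] else "not increasing"
    # Crude outlier rule: farther than 3 median absolute deviations from the median.
    med = statistics.median(data)
    mad = statistics.median(abs(x - med) for x in data)
    return [x for x in data if abs(x - med) > 3 * mad]

# Step 6: test the parsing and the interpretation.
state = {}
for sentence in ["load the temperature data", "find outliers", "find trend"]:
    print(sentence, "->", interpret(parse_sentence(sentence), state))
```

Because the interpretation is attached with apply at the same place where each grammar rule is defined, changing a rule and changing what it does stay close together, which is what makes this style convenient for prototyping.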