Phone dialing conversational agent


This blog post announces the first project committed to the repository ConversationalAgents at GitHub. The project contains the design and implementation of a phone calling conversational agent that aims to provide the following functionalities:

  • contacts retrieval (querying, filtering, selection),
  • contacts prioritization, and
  • phone call (work flow) handling.

    The design is based on a Finite State Machine (FSM) and context free grammar(s) for the commands that switch between the states of the FSM. The grammar is specified as the context free grammar rules of a Domain Specific Language (DSL) in Extended Backus-Naur Form (EBNF). (For more details on DSL design and programming see [1].)

    The (current) implementation is in Wolfram Language (WL) / Mathematica and uses the functional parsers package [2, 3].

    This movie gives an overview from an end user perspective.

    General design

    The design of the Phone Conversational Agent (PhCA) is derived in a straightforward manner from the typical work flow of calling a contact (using, say, a mobile phone.)

    The main goals for the conversational agent are the following:

    1. contacts retrieval — search, filtering, selection — using both natural language commands and manual interaction,
    2. intuitive integration with the usual work flow of phone calling.

    An additional goal is to facilitate contacts retrieval by determining the most appropriate contacts in query responses. For example, if we press the dial button while driving to work, we might prefer the contacts of an upcoming meeting to be placed at the top of the list of prompted contacts.
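
The prioritization idea can be sketched in a few lines. This is a minimal Python illustration, not the project's Wolfram Language code; the function `prioritize`, the meeting records, and the two hour horizon are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def prioritize(contacts, meetings, now, horizon=timedelta(hours=2)):
    """Put attendees of meetings that start within `horizon` first."""
    upcoming = set()
    for meeting in meetings:
        if now <= meeting["start"] <= now + horizon:
            upcoming.update(meeting["attendees"])
    # sorted() is stable, so contacts keep their original order within each group.
    return sorted(contacts, key=lambda name: name not in upcoming)

# Hypothetical data for illustration only.
contacts = ["Alice", "Bob", "Carol", "Dave"]
meetings = [
    {"start": datetime(2017, 1, 9, 9, 30), "attendees": ["Carol", "Dave"]},
    {"start": datetime(2017, 1, 9, 16, 0), "attendees": ["Bob"]},
]
ranked = prioritize(contacts, meetings, now=datetime(2017, 1, 9, 8, 45))
print(ranked)
```

Only the 9:30 meeting falls inside the horizon, so its attendees move to the top while everyone else keeps the original order.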

    In this project we assume that the voice to text conversion is done with an external (reliable) component.

    It is assumed that a user of PhCA can react to both visual and spoken query results.

    The main algorithm is the following.

    1) Parse and interpret a natural language command.

    2) If the command is a contacts query that returns a single contact then call that contact.

    3) If the command is a contacts query that returns multiple contacts then:

    3.1) use natural language commands to refine and filter the query results until a single contact is obtained,

    3.2) call that single contact.

    4) If another type of command is given, act accordingly.

    PhCA has commands for system usage help and for canceling the current contact search and starting over.
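
The algorithm above can be sketched as a small loop. This is a Python illustration under simplifying assumptions, not PhCA's implementation: commands are matched by plain word containment instead of a grammar, and the address book, the field names, and the `run_agent` function are hypothetical.

```python
def matches(contact, words):
    """A contact matches when every query word occurs in its fields."""
    text = " ".join(contact.values()).lower()
    return all(word in text for word in words)

def run_agent(commands, address_book):
    """Process commands in order; return the name to call, or None."""
    candidates = None                      # None means no active search
    for command in commands:
        words = command.lower().split()
        if not words:
            continue
        if words == ["help"]:              # usage help
            print("Say 'call <words>', then refine with more words, or 'cancel'.")
            continue
        if words == ["cancel"]:            # cancel the search and start over
            candidates = None
            continue
        if words[0] == "call":             # steps 2 and 3: a contacts query
            candidates = [c for c in address_book if matches(c, words[1:])]
        elif candidates is not None:       # step 3.1: refine the results
            candidates = [c for c in candidates if matches(c, words)]
        if candidates is not None and len(candidates) == 1:
            return candidates[0]["name"]   # a single contact is left: call it
    return None

# Hypothetical address book for illustration only.
address_book = [
    {"name": "Ann Lee", "role": "producer", "company": "acme"},
    {"name": "Bob Ray", "role": "director", "company": "acme"},
    {"name": "Cy Poe", "role": "producer", "company": "zenith"},
]
print(run_agent(["call acme", "producer"], address_book))
```

The first command leaves two candidates, and the refinement narrows them to one, which is then "called".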

    The following FSM diagram gives the basic structure of PhCA:


    This movie demonstrates how different natural language commands switch the FSM states.

    Grammar design

    The derived grammar describes sentences that: 1. fit end user expectations, and 2. are used to switch between the FSM states.

    Because of the simplicity of the FSM and the natural language commands, only a few iterations of the Parser-generation-by-grammars work flow were needed.

    The base grammar is given in the file "./Mathematica/PhoneCallingDialogsGrammarRules.m" in EBNF used by [2].

    Here are parsing results of a set of test natural language commands:


    using the WL command:

    ParsingTestTable[ParseJust[pCALLCONTACT\[CirclePlus]pCALLFILTER], ToLowerCase /@ queries]

    (Note that according to PhCA’s FSM diagram the parsing of pCALLCONTACT is separated from pCALLFILTER, hence the need to combine the two parsers in the code line above.)
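
To make the combination of the two parsers concrete, here is a Python sketch of the usual parser-combinator reading of that line: an alternatives combinator tries both parsers, and a "just" wrapper keeps only the parses that consume the whole input. The two small parsers are hypothetical stand-ins for pCALLCONTACT and pCALLFILTER, and none of this is the code of the functional parsers package.

```python
def symbol(word):
    """Parser that accepts exactly one given token."""
    def parser(tokens):
        return [(tokens[1:], word)] if tokens[:1] == [word] else []
    return parser

def any_word(tokens):
    """Parser that accepts any single token."""
    return [(tokens[1:], tokens[0])] if tokens else []

def sequence(*parsers):
    """Run parsers one after another, collecting their outputs in a list."""
    def parser(tokens):
        results = [(tokens, [])]
        for p in parsers:
            results = [(rest2, outs + [out])
                       for rest, outs in results
                       for rest2, out in p(rest)]
        return results
    return parser

def alternatives(*parsers):
    """Return the parses of every parser that succeeds."""
    def parser(tokens):
        return [result for p in parsers for result in p(tokens)]
    return parser

def just(parser):
    """Keep only the parses that leave no unparsed tokens."""
    def wrapped(tokens):
        return [(rest, out) for rest, out in parser(tokens) if not rest]
    return wrapped

# Hypothetical stand-ins for the two grammar parsers.
p_call_contact = sequence(symbol("call"), any_word)
p_call_filter = sequence(symbol("from"), any_word)
combined = just(alternatives(p_call_contact, p_call_filter))

queries = ["Call Anna", "from Acme", "call Anna now"]
for query in queries:
    print(query, "->", combined(query.lower().split()))
```

The third query yields no result because the trailing word is left unparsed, which is the role of the "just" wrapper.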

    PhCA’s FSM implementation provides interpretation and context of the functional programming expressions obtained by the parser.

    In the running script "./Mathematica/PhoneDialingAgentRunScript.m" the grammar parsers are modified so that parsing succeeds with the data elements of the provided fake address book.

    The base grammar can be extended with the "Time specifications grammar" in order to include queries based on temporal commands.


    In order to experiment with the agent just run in Mathematica the command:


    The imported Wolfram Language file, "./Mathematica/PhoneDialingAgentRunScript.m", uses a fake address book based on movie creators metadata. The code structure of "./Mathematica/PhoneDialingAgentRunScript.m" allows easy experimentation and modification of the running steps.

    Here are several screen-shots illustrating a particular usage path (scan left-to-right):

    "PhCA-1-call-someone-from-x-men" "PhCA-2-a-producer" "PhCA-3-the-third-one"

    See this movie demonstrating a PhCA run.


    [1] Anton Antonov, "Creating and programming domain specific languages", (2016), MathematicaForPrediction at WordPress blog.

    [2] Anton Antonov, Functional parsers, Mathematica package, (2014), MathematicaForPrediction at GitHub.

    [3] Anton Antonov, "Natural language processing with functional parsers", (2014), MathematicaForPrediction at WordPress blog.

    Text analysis of Trump tweets


    This post announces the MathematicaVsR at GitHub project “Text analysis of Trump tweets”, in which we compare Mathematica and R over text analyses of Twitter messages made by Donald Trump (and his staff) before the US presidential election in 2016.

    The project follows and extends the exposition and analysis of the R-based blog post "Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half" by David Robinson; see [1].

    The blog post [1] links to several sources that claim that during the election campaign Donald Trump tweeted from his Android phone and his campaign staff tweeted from an iPhone. The blog post [1] examines this hypothesis in a quantitative way (using various R packages.)

    The hypothesis in question is well summarized with the tweet:

    Every non-hyperbolic tweet is from iPhone (his staff).
    Every hyperbolic tweet is from Android (from him).
    — Todd Vaziri (@tvaziri) August 6, 2016

    This conjecture is fairly well supported by the following mosaic plots, [2]:

    TextAnalysisOfTrumpTweets-iPhone-MosaicPlot-Sentiment-Device TextAnalysisOfTrumpTweets-iPhone-MosaicPlot-Device-Weekday-Sentiment

    We can see that Twitter messages from iPhone are much more likely to be neutral, and the ones from Android are much more polarized. As Christian Rudder (one of the founders of OkCupid, a dating website) explains in the chapter "Death by a Thousand Mehs" of the book "Dataclysm", [3], having a polarizing image (online persona) is a very good strategy to engage an online audience:

    […] And the effect isn’t small – being highly polarizing will in fact get you about 70 percent more messages. That means variance allows you to effectively jump several "leagues" up in the dating pecking order – […]

    (The mosaic plots above were made for the Mathematica-part of this project. Mosaic plots and weekday tags are not used in [1].)

    Concrete steps

    The Mathematica-part of this project does not closely follow the blog post [1]. After the ingestion of the data provided in [1], the Mathematica-part applies alternative algorithms to support and extend the analysis in [1].

    The sections in the R-part notebook correspond to some — not all — of the sections in the Mathematica-part.

    The following list of steps is for the Mathematica-part.

    1. Data ingestion
      • The blog post [1] shows how to ingest in R the Twitter data of Donald Trump's messages.

      • That can be done in Mathematica too using the built-in function ServiceConnect, but that is not necessary since [1] provides a link to the ingested data used in [1].

      • This leads to ingesting an R data frame in the Mathematica-part using RLink.

    2. Adding tags

      • We have to extract device tags for the messages — each message is associated with one of the tags "Android", "iPad", or "iPhone".

      • Using the message time-stamps each message is associated with time tags corresponding to the creation time month, hour, weekday, etc.

      • Here is a summary of the data at this stage:


    3. Time series and time related distributions

      • We can make several types of time series plots for general insight and to support the main conjecture.

      • Here is a Mathematica-made plot of the same statistic computed in [1] that shows differences in tweet posting behavior:


      • Here are distribution plots of tweets per weekday:


    4. Classification into sentiments and Facebook topics

      • Using the built-in classifiers of Mathematica each tweet message is associated with a sentiment tag and a Facebook topic tag.

      • In [1] the results of this step are derived in several stages.

      • Here is a mosaic plot for conditional probabilities of devices, topics, and sentiments:


    5. Device-word association rules

      • Using association rule learning, device tags are associated with words in the tweets.

      • In the Mathematica-part these association rules are not needed for the sentiment analysis (because of the built-in classifiers.)

      • The association rule mining is done mostly to support and extend the text analysis in [1] and, of course, for comparison purposes.

      • Here is an example of derived association rules together with their most important measures:


    In [1] the sentiments are derived from computed device-word associations, so in [1] the order of steps is 1-2-3-5-4. In Mathematica we do not need the steps 3 and 5 in order to get the sentiments in the 4th step.
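
For reference, the measures usually reported for a device-word association rule of step 5 are support, confidence, and lift. The following Python sketch computes them for rules of the form {word} => {device}. The data is made up for illustration, it is not the project's data or results, and neither "arules" nor "AprioriAlgorithm.m" is used.

```python
# Made-up toy data: (device tag, set of words in the message).
tweets = [
    ("Android", {"sad", "weak", "media"}),
    ("Android", {"sad", "failing"}),
    ("iPhone", {"join", "rally", "tomorrow"}),
    ("iPhone", {"join", "us", "sad"}),
]

def rule_measures(tweets, word, device):
    """Support, confidence, and lift of the rule {word} => {device}.

    Assumes that `word` and `device` each occur at least once.
    """
    n = len(tweets)
    n_word = sum(word in words for _, words in tweets)
    n_device = sum(d == device for d, _ in tweets)
    n_both = sum(d == device and word in words for d, words in tweets)
    support = n_both / n                  # P(word and device)
    confidence = n_both / n_word          # P(device | word)
    lift = confidence / (n_device / n)    # confidence relative to P(device)
    return support, confidence, lift

print(rule_measures(tweets, "join", "iPhone"))
print(rule_measures(tweets, "sad", "Android"))
```

A lift above 1 means that the word makes the device more likely than its base rate in the data.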


    Using Mathematica for sentiment analysis is much more direct because of the built-in classifiers.

    The R-based blog post [1] makes heavy use of the "pipeline" operator %>%, which is a fairly recent addition to R (and it is both fashionable and convenient to use.) In Mathematica the related operators are Postfix (//), Prefix (@), Infix (~f~), Composition (@*), and RightComposition (/*).

    Making the time series plots with the R package "ggplot2" requires making special data frames. I am inclined to think that the Mathematica plotting of time series is more direct, but for this task the data wrangling codes in Mathematica and R are fairly comparable.
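
The wrangling behind such per-device time series amounts to tagging each message (step 2 above) and counting per tag. Here is a Python sketch of that idea, given as a neutral illustration rather than the Mathematica or R code of the project. The records, the field names, and the timestamp format are assumptions.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Made-up records; the field names and formats are assumptions.
records = [
    {"source": "Twitter for Android", "created": "2016-08-06 07:12:00"},
    {"source": "Twitter for Android", "created": "2016-08-06 07:45:00"},
    {"source": "Twitter for iPhone", "created": "2016-08-08 15:03:00"},
]

def add_tags(record):
    """Return the record extended with a device tag and time tags."""
    t = datetime.strptime(record["created"], "%Y-%m-%d %H:%M:%S")
    return {**record,
            "device": record["source"].split()[-1],   # last word of the source
            "month": t.month,
            "hour": t.hour,
            "weekday": WEEKDAYS[t.weekday()]}         # weekday(): Monday is 0

tagged = [add_tags(r) for r in records]

# Number of messages per (device, hour of day).
per_device_hour = Counter((r["device"], r["hour"]) for r in tagged)
print(per_device_hour)
```

Counts like these, normalized per device, are what a plot of posting behavior by hour of day would be drawn from.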

    Generally speaking, the R package "arules", used in this project for association rule learning, is somewhat awkward to use:

    • it is data frame centric, does not work directly with lists of lists, and

    • requires the use of factors.

    The Apriori implementation in “arules” is much faster than the one in “AprioriAlgorithm.m” — “arules” uses a more efficient algorithm implemented in C.


    [1] David Robinson, "Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half", (2016).

    [2] Anton Antonov, "Mosaic plots for data visualization", (2014), MathematicaForPrediction at WordPress.

    [3] Christian Rudder, Dataclysm, Crown, 2014. ASIN: B00J1IQUX8.

    Creating and programming domain specific languages


    In this blog post I will provide links to documents, packages, blog posts, and discussions for creating and utilizing Domain Specific Languages (DSLs). I have discussed a few DSLs in previous blog posts (linked below). This blog post provides a more general, higher level view on the application and creation of DSLs. The concrete examples are with Mathematica, but the steps are general and can be done with any programming languages and tools.

    When to apply DSLs

    Here are some situations for applying DSLs.

    1. When designing conversational engines.
    2. When there are too many usage scenarios and tuning options for the developed algorithms.
      • For example, we have a bunch of search, recommendation, and interaction algorithms for a dating site. A different department, the User Experience Department (UED), designs interactive user interfaces for these algorithms. We make a natural language DSL that invokes the different algorithms according to specified outcomes. With the DSL the different designs produced by UED are much more easily prototyped, implemented, or fleshed out. The DSL also gives UED an easier-to-understand view of the functionalities provided by the algorithms.
    3. When designing an API for a collection of algorithms.
      • Just designing a DSL can bring clarity about what signatures should be in the API.
      • NIntegrate's Method option was designed and implemented using a DSL. See this video between 25:00 and 27:30.

    Designing DSLs

    1. Decide what kind of sentences the DSL is going to have.
      • Are natural language sentences going to be used?
      • Are the language words known beforehand or not?
    2. Prepare, create, or accumulate a list of representative sentences.
      • In some cases using Morphological Analysis can greatly help in coming up with use cases and the corresponding sentences.
    3. Create a context free grammar that describes the sentences from the previous step. (Or a large subset of them.)
      • At this stage I use exclusively Extended Backus-Naur Form (EBNF).
      • In some cases the grammar terminals are not known at the design stage and have to be retrieved in some way. (From a database or through natural language processing.)
      • Some conversational engine systems allow or require the grammar specification to be done in XML. I would still do BNF and then move to XML.
        • It is not that hard to write a parser-and-interpreter that translates BNF into XML. See the end of this blog post for that kind of translation of BNF into OPML.
    4. Program parser(s) for the grammar.
      • Most of the time I use functional parsers.
      • The package FunctionalParsers.m provides a Mathematica implementation of this kind of parsing.
      • The package can automatically generate parsers from a grammar given in EBNF. (See the coding example below.)
      • I have programmed versions of this package in R and Lua.
    5. Program an interpreter for the parsed sentences.
      • At this stage the parsed sentences are hooked to the algorithms of the problem domain.
      • The package FunctionalParsers.m allows this to be done fairly easily.
    6. Test the parsing and interpretation.

    See the code example below illustrating steps 3-6.
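
Steps 3-6 can also be seen in miniature outside Mathematica. The following Python sketch is only an illustration of the work flow under stated assumptions: the tiny grammar, the `DATA` dictionary, and the functions `parse`, `interpret`, and `run` are hypothetical, and this is not how FunctionalParsers.m generates parsers from EBNF.

```python
# Step 3: a grammar written as data. Each rule has a list of alternatives;
# each alternative is a sequence of rule names or literal words.
GRAMMAR = {
    "command": [["load", "dataset"], ["show", "statistic", "of", "dataset"]],
    "dataset": [["temperature"], ["pressure"]],
    "statistic": [["mean"], ["max"]],
}

# Step 4: a generic parser driven by the grammar.
def parse(rule, tokens):
    """Return a list of (tree, remaining tokens) pairs for `rule`."""
    if rule not in GRAMMAR:                       # a literal word
        return [(rule, tokens[1:])] if tokens[:1] == [rule] else []
    results = []
    for alternative in GRAMMAR[rule]:
        partial = [([], tokens)]
        for part in alternative:
            partial = [(kids + [tree], rest2)
                       for kids, rest in partial
                       for tree, rest2 in parse(part, rest)]
        results += [((rule, kids), rest) for kids, rest in partial]
    return results

# Step 5: an interpreter that hooks parsed sentences to the problem domain.
DATA = {"temperature": [20, 22, 18], "pressure": [1010, 1005]}

def interpret(tree):
    _, kids = tree
    if kids[0] == "load":
        return DATA[kids[1][1][0]]
    statistic, name = kids[1][1][0], kids[3][1][0]
    values = DATA[name]
    return max(values) if statistic == "max" else sum(values) / len(values)

def run(sentence):
    """Parse the whole sentence and interpret it; None if it does not parse."""
    trees = [tree for tree, rest in parse("command", sentence.lower().split())
             if not rest]
    return interpret(trees[0]) if trees else None

# Step 6: test the parsing and interpretation.
for sentence in ["load temperature", "show max of pressure", "call Bob"]:
    print(sentence, "->", run(sentence))
```

Keeping the grammar as data means that adding a sentence form is a change to `GRAMMAR` and `interpret` only, while the parser stays untouched.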

    Introduction to using DSLs in Mathematica

    1. This blog post “Natural language processing with functional parsers” gives an introduction to the DSL application in Mathematica.
    2. This detailed slide-show presentation “Functional parsers for an integration requests language grammar” shows how to use the package FunctionalParsers.m over a small grammar.
    3. The answer to the MSE question “How to parse a clojure expression?” gives a good introduction with a simple grammar and shows both direct parser programming and automatic generation from EBNF.

    Advanced example

    The blog post “Simple time series conversational engine” discusses the creation (design and programming) of a simple conversational engine for time series analysis (data loading, finding outliers and trends.)

    Here is a movie demonstrating that conversational engine:

    Other discussions

    1. A small part, from 17:30 to 21:00, of the WTC 2012 “Spatial Access Methods and Route Finding” presentation shows a DSL for points of interest queries.
    2. The answer to the MSE question “CSS Selectors for Symbolic XML” uses FunctionalParsers.m.
    3. This Quantile Regression presentation is aided by the “Simple time series conversational engine” mentioned above.

    Coding example

    This coding example demonstrates steps 3-6 discussed above.



    Natural language processing with functional parsers

    Natural Language Processing (NLP) can be done with a structural approach using grammar rules. (The other type of NLP uses statistical methods.) In this post I discuss the use of functional parsers for the parsing and interpretation of small sets of natural language sentences within specific contexts. Functional parsing is also known as monadic parsing and parser combinators.

    Generally, I am using functional parsers to make Domain-Specific Languages (DSL’s). I use DSL’s to make command interfaces to search and recommendation engines and also to design and prototype conversational engines. I make extensive use of the so-called Backus-Naur Form (BNF) for the grammar specifications. Clearly a DSL can be very close to natural language and provide sufficient means for interpretation within a given (narrow) context. (Like function integration in Calculus, smart phone directory browsing and selection, or search for something to eat nearby.)

    I implemented and uploaded a package for construction of functional parsers: see FunctionalParsers.m hosted by the project MathematicaForPrediction at GitHub.

    The package provides the ability to quickly program parsers using a core system of functional parsers as described in the article “Functional parsers” by Jeroen Fokker.

    The parsers (in both the package and the article) are categorized in the groups: basic, combinators, and transformers. Immediate interpretation can be done with transformer parsers, but the package also provides functions for evaluation of parser output within a context of data and functions.
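
The three groups can be illustrated with a short Python sketch. It follows the general parser-combinator idea, in which a parser maps a list of tokens to a list of (remaining tokens, output) pairs. The function names are chosen for this example and are not the names used in FunctionalParsers.m.

```python
# Basic parsers: recognize a single token.
def symbol(word):
    """Accept exactly one given token."""
    def parser(tokens):
        return [(tokens[1:], word)] if tokens[:1] == [word] else []
    return parser

def satisfy(predicate):
    """Accept one token for which `predicate` holds."""
    def parser(tokens):
        if tokens and predicate(tokens[0]):
            return [(tokens[1:], tokens[0])]
        return []
    return parser

# Combinator: build a larger parser from smaller ones.
def sequence(*parsers):
    """Run parsers one after another, collecting their outputs in a list."""
    def parser(tokens):
        results = [(tokens, [])]
        for p in parsers:
            results = [(rest2, outs + [out])
                       for rest, outs in results
                       for rest2, out in p(rest)]
        return results
    return parser

# Transformer: change the output of a parser (immediate interpretation).
def apply(func, parser):
    """Post-process every output of `parser` with `func`."""
    def wrapped(tokens):
        return [(rest, func(out)) for rest, out in parser(tokens)]
    return wrapped

number = apply(int, satisfy(str.isdigit))
interval = apply(lambda outs: (outs[1], outs[3]),
                 sequence(symbol("from"), number, symbol("to"), number))

parsed = interval("from 0 to 5".split())
print(parsed)
```

Here the transformers do the interpretation on the spot: the digits become integers and the four parsed tokens collapse into a pair of interval bounds.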

    Probably most importantly, the package provides functions for automatic generation of parsers from grammars in EBNF.

    Here is an example of parsing the sentences of an integration requests language:

    Interpretation of integration requests

    Here is the grammar:
    Integration requests EBNF grammar

    The grammar can be visualized with this mind map:

    Integration command

    The mind map was hand made with MindNode Pro. Generally, the branches represent alternatives, but if two branches are connected the direction of the arrow connecting them shows a sequence combination.

    With the FunctionalParsers.m package we can automatically generate a mind map by parsing the string of the grammar in EBNF into OPML:


    (And here is a PDF of the automatically generated mind map: IntegrationRequestsGenerated.)

    I also made a slide show that gives an introduction to how the package is used: “Functional parsers for an integration requests language grammar”.

    A more complicated example is this conversational engine for manipulation of time series data. (Data loading, finding outliers and trends. More details in the next blog post.)

    Statistical thesaurus from NPR podcasts

    Five months ago I worked with transcripts of National Public Radio (NPR) podcasts. The transcripts are available online; see, for example, “From child actor to artist…”.

    Using nearly 5000 transcripts I experimented with topic extraction and statistical thesaurus derivation. The topics are too bulky to show here, but I am going to show some of the statistical thesaurus entries.

    I used dimension reduction with Non-Negative Matrix Factorization (NNMF). For more detailed explanations, code for computations, and experimental results see this paper “Topic and thesaurus extraction from a document collection” provided by the MathematicaForPrediction project at GitHub. (The code for NNMF is also provided by the MathematicaForPrediction project at GitHub.)

    First let me describe the data. The collection has 5123 transcripts.

    Here is a sample of the transcripts (only the first 400 characters of each are taken):
    NPR podcast sample 400 characters per podcast

    Here is the distribution of the string lengths of the transcripts:
    5123 NPR podcasts string length

    I removed custom selected stop words from the transcripts. I also stemmed the words using the Snowball stemmer. The stemmed words are called “terms” below.

    Here are descriptive statistics and the distribution of the number of transcripts per term:
    Transcripts per term

    Here are descriptive statistics and the distribution of the number of terms per transcript:
    Terms per transcript

    I did not compute the whole statistical thesaurus. Instead I made a function that computes the thesaurus entry of a given word using the right NNMF factor with proper normalization.
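
One plausible reading of that computation is sketched below in Python: take the column of the right NNMF factor that corresponds to the given term, normalize all columns to unit length, and return the terms whose columns are closest by cosine similarity. The small matrix `H`, the term list, and the function `thesaurus_entry` are hypothetical, and the normalization used in the paper may differ.

```python
import math

# Hypothetical right factor H of a factorization with 2 topics and 5 terms:
# H[i][j] is the weight of term j in topic i.
terms = ["retriev", "search", "index", "music", "song"]
H = [
    [0.9, 0.8, 0.7, 0.1, 0.0],
    [0.1, 0.2, 0.0, 0.9, 0.8],
]

def thesaurus_entry(term, terms, H, top=2):
    """Return the `top` terms whose topic profiles are closest to `term`.

    Assumes that every term has a nonzero column in H.
    """
    def unit(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector]

    # One unit-length topic profile (a column of H) per term.
    profiles = {t: unit([row[j] for row in H]) for j, t in enumerate(terms)}
    target = profiles[term]
    # Cosine similarity is the dot product of unit vectors.
    scores = {t: sum(a * b for a, b in zip(target, profile))
              for t, profile in profiles.items() if t != term}
    return sorted(scores, key=scores.get, reverse=True)[:top]

entry = thesaurus_entry("retriev", terms, H)
print(entry)
```

Computing one entry this way needs only one pass over the columns of the factor, which is why the whole thesaurus does not have to be built in advance.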

    Here are sample results of the thesaurus entry “retrieval” (note that the right column contains word stems):
    Statistical thesaurus entries