# Simple time series conversational engine

## Introduction

In this blog post I am going to discuss the creation (design and programming) of a simple conversational engine for time series analysis. The conversational engine responds to commands for:
1. loading time series data (weather data, stock data, or data files),
2. finding outliers,
3. analysis by curve fitting,
4. plotting, and
5. help and state changes.

This blog post is related to the blog post “Natural language processing with functional parsers” and uses the package FunctionalParsers.m and the package QuantileRegression.m.

I programmed this engine ten months ago, and here is a link to a movie that demonstrates it: https://www.youtube.com/watch?v=wlZ5ANglVI4 .

A package with all of the code of the conversational engine can be downloaded from GitHub and executed with `Import`:

```
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/Examples/SimpleTimeSeriesConversationalEngine.m"]
```

More about functional parsers, which were essential for the building of the conversational engine, can be read in “Functional parsers” by Fokker.

## Design of the conversational engine

The conversational engine is a finite state machine. It has a command recognizer state and a command application state. The switching between the states is done upon completion of the state functions. The recognized commands are English sentences; some of the sentences include Mathematica code.

Let us describe the structure of the conversational engine.

### States

1. Acceptor state
The Acceptor state parses and recognizes commands. If the given command is successfully parsed, then the parsing result is given to the Transducer state. Otherwise a message “Unknown command” is emitted and no state change is made (i.e. we are still in the Acceptor state).

2. Transducer state
The Transducer state interprets the command parsed by the Acceptor state. This state emits different messages according to the loaded data and accumulated graphics objects.

The states are named according to the classification given in the Wikipedia article about finite-state machines. According to that classification the conversational engine is a Mealy machine.
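The two-state loop can be sketched in Python as a toy illustration (this is not the Mathematica implementation; `make_engine`, `toy_parse`, and `toy_interpret` are hypothetical names for the sketch):

```python
def make_engine(parse, interpret):
    """Mealy-machine loop: the Acceptor state tries to recognize a
    command; on success the parsing result goes to the Transducer state,
    which emits a response based on the accumulated engine state."""
    state = {"data": None, "graphics": []}

    def respond(command):
        parsed = parse(command)            # Acceptor state: recognize
        if parsed is None:                 # recognition failed:
            return "Unknown command"       # stay in the Acceptor state
        return interpret(parsed, state)    # Transducer state: emit output

    return respond

# A toy grammar and interpreter, just to exercise the loop.
toy_parse = lambda s: ("Help",) if s == "help" else None
toy_interpret = lambda parsed, state: "Available commands: help"

engine = make_engine(toy_parse, toy_interpret)
```

The point of the sketch is only the control flow: recognition failure keeps the machine in the Acceptor state, while success hands the parsed result to the Transducer.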

### Responses

The results of the applied time series analysis are graphical. The Transducer state accumulates the graphs of the commands and displays them with the data. The conversational engine responses have a spoken part and a graphics part. The responses always have a spoken part; the graphics part can be omitted.

### Commands

The weather data commands can be for temperature, wind speed, and pressure over a specified city. For example, “load temperature of Atlanta” or “load the Chicago wind speeds”.

The financial data commands can be for stock price or trading volume. For example, “load trade volume of NYSE:GOOG” or “load stock price of NYSE:APPL”.

The weather and stock data loading commands do not take a time period; they take only city and company specifications for WeatherData and FinancialData, respectively.

The time series analysis commands we consider are: curve fitting with a list of specified functions, finding regression quantiles, and finding outliers. For example, “compute least squares fit of {1,x,Sin[x/10^6]}”, “compute 5 regression quantiles”, “find top outliers”.

The graphics commands are “plot data”, “plot(s) joined”, “plot(s) not joined”, “clear graphics”.

In addition, the conversational engine takes global, service commands like “help”, “all commands”, “start over”, etc.

## Grammar specification

In order to respond to the commands that are English sentences we have to be able to parse these sentences into concrete state or message specifications. This means that we have to define the grammatical structures of the command sentences and the words in them.

This section describes the grammar of the commands using Extended Backus-Naur Form (EBNF). The package FunctionalParsers.m has Mathematica functions that generate parsers from the EBNF of a grammar (see the next section). The resulting parsers are used in the conversational engine described above.

Let us explain the derivation and rationale of the grammar rules for the data loading commands. (I am not going to describe all the language considerations and decisions I have made for the commands grammar. These decisions can be traced in the EBNF grammar specification.)

We want to parse and interpret commands like:
1. load data file ‘timeTable.csv’,
2. load temperature of Atlanta,
3. load the wind speeds of Chicago,
4. load the stock price of NYSE:APPL.

We can see that we have three different types of data loading commands: for files, for weather data, and for financial data. Each of these commands when parsed would be easy to interpret using Mathematica’s functions Import, WeatherData, and FinancialData respectively. Note that — because we want to keep the grammar simple — for the weather and financial data there is no time period specification. For the weather data we are going to load data for the last 60 days; for the financial data for the last 365 days. Also, for simplicity, we are not going to consider data loading commands that start with the phrases “get”, “get me”, “please load”, etc.

To this end, the general load data command structure is specified with six grammar rules.

Note that in the grammar rules above some of the words are optional; the optional words are put within square brackets. With the first rule we have separated the file loading from the weather and financial data loading. The file loading is further specified with rule 2. Rule 3 separates the preamble of the weather and financial data loading commands from the data specification. The latter is partitioned in rule 4. Rule 5 gives all of the possible data type requests for the weather data: temperature, pressure, and wind speed. Rule 6 gives the two possible types of financial data requests: stock price and trade volume. The file name, city, and company specifications are left out, but the rules for them are easy to formulate.
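To illustrate what rules 1-6 recognize, here is a minimal Python recognizer for the load commands. This is a hypothetical sketch with regular expressions, not the EBNF-generated Mathematica parsers, and it covers only one word order per command:

```python
import re

def parse_load_command(text):
    """Recognize the three kinds of data loading commands described
    above: file, weather data, and financial data."""
    words = text.strip().split()
    if not words or words[0] != "load":   # rule 1: every command starts with 'load'
        return None
    rest = " ".join(words[1:])
    m = re.fullmatch(r"(?:data )?file '(?P<f>[^']+)'", rest)   # rule 2: file loading
    if m:
        return ("LoadFile", m.group("f"))
    # rules 3-5: weather data type, then city
    m = re.fullmatch(r"(?:the )?(temperature|pressure|wind speeds?) of (\w+)", rest)
    if m:
        return ("LoadWeatherData", m.group(1), m.group(2))
    # rules 3-4, 6: financial data type, then company ticker
    m = re.fullmatch(r"(?:the )?(stock price|trade volume) of ([\w:]+)", rest)
    if m:
        return ("LoadFinancialData", m.group(1), m.group(2))
    return None
```

The tagged tuples returned here play the role of the parsing results whose heads the interpreters dispatch on later.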

### Full grammar specification

In this sub-section the full grammar specification is given. Note that some parts of the commands are specified to be dropped from the parsing results using the symbols “\[RightTriangle]” and “\[LeftTriangle]”. (For example, “ ‘trade’ \[RightTriangle] ‘volume’ ” means “parse the sequence of words ‘trade’ ‘volume’ and return ‘volume’ as a result”.) The specification ‘_LetterString’ is an implementation shortcut for matching words and identifiers with the package FunctionalParsers.m. The rules were simplified for clarity; the actual conversational engine code can be retrieved from GitHub.

The grammar specified above takes complicated commands like “find regression quantiles over temperature of Atlanta”.

(In certain cases I prefer reading grammars in EBNF using Emacs and ebnf-mode.el.)

## Parsers for the grammar

Using the grammar specification and the package FunctionalParsers.m we can generate the parsers for that grammar. A detailed description of the parser generation process is given in the guide “Functional parsers for an integration requests language”.

Here is the generation step:

```
In:= tokens = ToTokens[timeSeriesEBNFCode];
res = GenerateParsersFromEBNF[tokens];
res // LeafCount

Out= 3069
```

The generation step creates a set of parsers:

`In:= Magnify[Shallow[res[[1, 1, 2]], 6], 0.8]`

Here are the parsing results returned by the “master” rule `pTSCOMMAND`:

Here is another list for the engine’s “service” commands:

And here is their parsing outcome using the function ParsingTestTable provided by FunctionalParsers.m:

```
questions = {"load data file '~/example.csv'", 
   "load file '~/anotherExample.csv'"};
ParsingTestTable[ParseShortest[pLOADFILECOMMAND], questions]
```

Note that the parser takes commands with Mathematica expression parts if those expressions are written without spaces:

```
questions = {"least squares fit with x+Sin[x]"};
ParsingTestTable[ParseShortest[pTSCOMMAND], questions]
```

This type of command is specified with the grammar rule:

`= ( ‘least’ , ‘squares’ , [ ‘fit’ ] , [ ‘with’ | ‘of’ ] ) \[RightTriangle] ‘_String’ ;`

## Interpreters (for the parsers)

Next we have to write interpreters for the parsing results. Note that the parsing results are Mathematica expressions with heads that hint at the semantic meaning behind the parsed commands. Since we are viewing the conversational engine as a finite state machine, we can say that for each parsing result we have to write a function that “reacts” to the result’s head and content. This is a fairly straightforward process, very similar to the one described by the Interpreter design pattern.
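That reaction scheme can be sketched in Python as a dispatch on the head of the parsing result (the heads and handler messages here are hypothetical illustrations, not the actual interpreter code):

```python
def interpret(parsed, state):
    """React to the head of a parsing result, Interpreter-pattern style:
    each head maps to a handler over the result's content and the
    mutable engine state (loaded data, accumulated graphics)."""
    head, *content = parsed
    handlers = {
        "LoadFile":        lambda path: "Loaded data file " + path,
        "LoadWeatherData": lambda prop, city: "Loaded " + prop + " of " + city,
        "FindOutliers":    lambda: "Found outliers in the loaded data",
    }
    if head not in handlers:               # unrecognized head
        return "Unknown command"
    state.setdefault("history", []).append(head)   # track what was done
    return handlers[head](*content)
```

Adding a new command then amounts to adding one grammar rule and one handler entry.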

A package with all of the code of the conversational engine can be downloaded from GitHub. The interpreters code can be found in the section “Interpreters”.

## The conversational engine programming

The conversational engine was programmed using Mathematica’s dynamic evaluation functions and mechanisms. The implementation follows the Acceptor and Transducer state design described above in the section “Design of the conversational engine”.

The engine has three panels: one for command input, one for spoken output, and one for graphical output. It is fully demonstrated with the movie “Time series conversational engine” posted on YouTube.

As it was mentioned in the introduction, the engine can be run using `Import`.

## Future plans

The engine can be extended in many directions depending on the final goals for it.

One must-have extension is the ability to take time period specifications for the weather and financial data.

The time series analysis can be extended with more detailed analysis using functions like TimeSeriesModelFit, or with heteroscedasticity analysis using multiple regression quantiles.

A third set of extensions is the ability to load additional kinds of data, such as the evolution of country populations or gross domestic products.

## References

 Anton Antonov, Natural language processing with functional parsers, blog post at WordPress, (2014).

 Anton Antonov, Functional parsers Mathematica package, source code at GitHub, https://github.com/antononcube/MathematicaForPrediction, package FunctionalParsers.m, (2014).

 Anton Antonov, Quantile regression Mathematica package, source code at GitHub, https://github.com/antononcube/MathematicaForPrediction, package QuantileRegression.m, (2013).

 Anton Antonov, Time series conversational engine, a YouTube movie.

 Wikipedia entry: Finite-state Machine, http://en.wikipedia.org/wiki/Finite-state_machine .

Jeroen Fokker, Functional parsers, in: Advanced Functional Programming, tutorial text of the First International Spring School on Advanced Functional Programming Techniques, Båstad, Sweden, May 1995 (Johan Jeuring and Erik Meijer, eds.), Berlin, Springer, 1995 (LNCS 925), pp. 1-23. http://www.staff.science.uu.nl/~fokke101/article/parsers/index.html .

 Jeramey Crawford, Extended Backus-Naur Form Mode, ebnf-mode.el, emacs LISP at Github, https://github.com/jeramey/ebnf-mode .

 Anton Antonov, Time series conversational engine grammar in EBNF, TimeSeriesConversationalEngineGrammar.ebnf at the MathematicaForPrediction project at GitHub.

 Anton Antonov, Functional parsers for an integration requests language, PDF document at GitHub, https://github.com/antononcube/MathematicaForPrediction/tree/master/Documentation , (2014).

Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1995, ISBN 0-201-63361-2, http://en.wikipedia.org/wiki/Design_Patterns .

 Anton Antonov, Estimation of conditional density distributions, “Mathematica for prediction algorithms” blog at WordPress, (2014).

 Anton Antonov, Simple Time Series Conversational Engine Mathematica package, SimpleTimeSeriesConversationalEngine.m at the MathematicaForPrediction project at GitHub.

# Directional quantile envelopes in 3D

## Introduction

This blog post was mostly made as a response to a comment on my previous blog post “Directional quantile envelopes”:

> This looks extremely useful and elegant – any plans on generalizing to 3 (or more) dimensions?
>
> – Jan Liphardt

Since I did say in the previous blog post that the algorithm can be easily extended to 3D and higher dimensions, I decided to illustrate that with an example. The 3D implementation is really easy using Mathematica’s geometric computation functions (i) for derivation of rotation matrices and (ii) for regions manipulation. (The latter are new in version 10.)

The algorithm shown below is a slightly modified version of the algorithm in “Directional quantile envelopes”. Instead of the step that does the intersection of directional quantile lines with Reduce, we can use ImplicitRegion to specify the region enclosed by the directional quantile planes. We can use other built-in functions for region predicates and manipulation to derive, visualize, or utilize the directional quantile regions found by the algorithm. (All these are exemplified below.)

## Data generation

Let us first generate some data.

```
npoints = 10000;
data = {RandomReal[{0, 2 Pi}, npoints], 
   RandomReal[{-Pi, Pi}, npoints], 
   RandomVariate[PoissonDistribution[4.8], npoints]};
data = MapThread[#3*{Cos[#1] Cos[#2], Sin[#1] Cos[#2], Sin[#2]} &, data];
```

Let us plot the data:

```
Block[{qs = 12}, 
 qs = Map[Quantile[#, Range[0, 1, 1/(qs - 1)]] &, Transpose[data]];
 ListPointPlot3D[data, PlotRange -> All, PlotTheme -> "Detailed", 
  FaceGrids -> {{{0, 0, -1}, Most[qs]}, {{0, 1, 0}, qs[[{1, 3}]]}, {{-1, 0, 0}, Rest[qs]}}, 
  ImageSize -> 700]]
```

On the plot the grid lines are derived from the quantiles of the x, y, and z coordinates of the data set.

## Algorithm application step-by-step

1. Standardize the data
1.1. This step is generally not needed and in this implementation would complicate things.

2. Create a set of uniform angles. (Here we generate vectors that define 3D directions.)

```
nqs = 20;
dirs = N@Flatten[
    Table[{Cos[\[Theta]] Cos[\[Phi]], Sin[\[Theta]] Cos[\[Phi]], Sin[\[Phi]]}, 
     {\[Theta], 2 \[Pi]/(10 nqs), 2 \[Pi], 2 \[Pi]/nqs}, 
     {\[Phi], -\[Pi], \[Pi], 2 \[Pi]/nqs}], 1];
```
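As a sanity check, the same θ/φ grid can be reproduced with NumPy (an illustrative Python re-implementation, not part of the Mathematica code); every vector it produces is a unit direction, and nqs = 20 yields 20 × 21 = 420 directions:

```python
import numpy as np

def direction_vectors(nqs):
    """Unit direction vectors on the theta/phi grid used above:
    theta in [2 pi/(10 nqs), 2 pi] and phi in [-pi, pi],
    both with step 2 pi/nqs."""
    thetas = 2 * np.pi / (10 * nqs) + 2 * np.pi * np.arange(nqs) / nqs
    phis = -np.pi + 2 * np.pi * np.arange(nqs + 1) / nqs
    t, p = np.meshgrid(thetas, phis, indexing="ij")
    # spherical-to-Cartesian: {cos t cos p, sin t cos p, sin p}
    return np.stack([np.cos(t) * np.cos(p),
                     np.sin(t) * np.cos(p),
                     np.sin(p)], axis=-1).reshape(-1, 3)
```

The count 420 is what the `Length[rmats]` step below reports for the rotation matrices built from these directions.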

`Graphics3D[Map[Arrow[{{0, 0, 0}, #}] &, dirs]]`

3. Rotate the data for each direction vector, a ∈ A. (Here we just generate the rotation matrices.)

```
In:= rmats = RotationMatrix[{{1, 0, 0}, #}] & /@ dirs;
Length[rmats]

Out= 420
```

4. Find the q-quantile of the z-coordinates of the data rotated at direction a. Denote that point with Z[a,q].

```
In:= qs = {0.95};
qDirPoints = Flatten[Map[Function[{m}, Quantile[(m.Transpose[data])[[3]], qs]], rmats]];
```

5. Find the points z[a,q] corresponding to Z[a,q] in the original coordinate system.
5.1. In this implementation this step is redundant, see the next step.

6. Find the quantile planes intersections
6.1. Instead of doing this step we simply use ImplicitRegion.

```
In:= qRegion = ImplicitRegion[
   MapThread[(#1.{x, y, z})[[3]] <= #2 &, {rmats, qDirPoints}], {x, y, z}];
```

`In:= Shallow[qRegion]`

`Out//Shallow= ImplicitRegion[LessEqual[<<...>>] && LessEqual[<<...>>] && ..., {x, y, z}]`

## Wrapping it in a function

Here is a definition of a function that wraps all of the algorithm’s steps from the previous section.

```
Clear[QuantileEnvelopeRegion]
QuantileEnvelopeRegion[points_?MatrixQ, quantile_?NumberQ, numberOfDirections_Integer] :=
  Block[{nd = numberOfDirections, dirs, rmats, qDirPoints, qRegion},
   dirs = N@Flatten[
      Table[{Cos[\[Theta]] Cos[\[Phi]], Sin[\[Theta]] Cos[\[Phi]], Sin[\[Phi]]}, 
       {\[Theta], 2 \[Pi]/(10 nd), 2 \[Pi], 2 \[Pi]/nd}, 
       {\[Phi], -\[Pi], \[Pi], 2 \[Pi]/nd}], 1];
   rmats = RotationMatrix[{{1, 0, 0}, #}] & /@ dirs;
   qDirPoints = Flatten[Map[
      Function[{m}, Quantile[(m.Transpose[points])[[3]], quantile]], rmats]];
   qRegion = ImplicitRegion[
     MapThread[(#1.{x, y, z})[[3]] <= #2 &, {rmats, qDirPoints}], {x, y, z}];
   qRegion
  ] /; Dimensions[points][[2]] == 3 && 0 < quantile <= 1;
```
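The same construction can be sketched in NumPy as a membership predicate: the region is the intersection of half-spaces n·x ≤ r(n, q), one per direction n. This is a rough Python analogue using random unit directions for brevity, not the ImplicitRegion-based Mathematica function:

```python
import numpy as np

def quantile_envelope_inside_3d(points, quantile, n_dirs, seed=0):
    """Return a predicate testing membership in the directional quantile
    region: the intersection of the half-spaces  dir . x <= r(dir, q),
    where r(dir, q) is the q-quantile of the data projections on dir."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_dirs, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)     # unit directions
    radii = np.quantile(points @ dirs.T, quantile, axis=0)  # one radius per direction

    def inside(p):
        return bool(np.all(dirs @ np.asarray(p) <= radii + 1e-9))

    return inside
```

With quantile = 1 every data point satisfies all the half-space constraints, so the whole sample lies inside; smaller quantiles cut points away, which mirrors the point counts in the “Check” section below.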

## Visualizing

From the implicit region specification we can generate an approximation of the boundary surface using the built-in function BoundaryDiscretizeRegion.

```
qRegion = QuantileEnvelopeRegion[data, 0.95, 20];
qRegion2 = QuantileEnvelopeRegion[data, 0.8, 20];
```

```
qSurface = BoundaryDiscretizeRegion[qRegion];
qSurface2 = BoundaryDiscretizeRegion[qRegion2];
```

`Grid[{{qSurface, qSurface2}}]`

Now we can visualize the quantile surfaces together with the data points:

```
Block[{c = 3, opts, pntsgr},
 opts = {PlotRange -> {{-c, c}, {0, c}, {-c, c}}, Boxed -> True, Axes -> True, 
   ImageSize -> {Automatic, 500}};
 pntsgr = Graphics3D[Point[data]];
 Grid[{{Show[{qSurface, pntsgr}, opts], Show[{qSurface2, pntsgr}, opts]}}]]
```

## Check

Now we can answer the question of how many data points are inside the directional quantile regions. Again, this is easy with the built-in region functionalities of Mathematica version 10.

```
In:= inPoints = Select[data, # \[Element] qRegion &];
Length[inPoints]/Length[data] // N

Out= 0.5974
```

```
In:= inPoints = Select[data, # \[Element] qRegion2 &];
Length[inPoints]/Length[data] // N

Out= 0.1705
```

(In these code lines I kept “\[Element]” instead of replacing it with “∈” so that the lines can be copied and pasted.)

As we can see, the algorithm produces a region R[q] that contains a much smaller fraction of the points than the requested quantile q. This property of the algorithm is discussed in the “Quantile tomography” article by Kong and Mizera. (In two-dimensional space this property is less pronounced.)

Obviously, though, we would expect the bias of the algorithm to be monotonic with respect to the quantiles requested: given a set of points P and quantiles u and v, 0 < u < v <= 1, for the regions Ru and Rv respectively produced by the algorithm we have |P ∩ Ru| <= |P ∩ Rv|.

## References

 Anton Antonov, Directional quantile envelopes, “Mathematica for prediction algorithms” blog at WordPress.com, 11/3/2014 .

 Linglong Kong, Ivan Mizera, “Quantile tomography: using quantiles with multivariate data”, 2013, arXiv:0805.0056v2 [stat.ME] URL: http://arxiv.org/abs/0805.0056 .

# Directional quantile envelopes

## Introduction

In this blog post I am going to discuss the computation of the so-called directional quantile envelopes; see the article “Quantile tomography” by Kong and Mizera for definitions, theorems, and concrete examples. The Mathematica package QuantileRegression.m has an experimental implementation, the function `QuantileEnvelope`.

This type of implementation investigation is a natural extension of the quantile regression implementations and utilization I have done before. (See the package QuantileRegression.m and the related guide; blog posts [3,4].) I was looking for a 2D quantile regression technique that can be used as a tool for finding 2D outliers. From the exposition below it should be obvious that the technique can be generalized to higher dimensions.

The idea of directional quantile envelopes is conceptually simple (and straightforward to come up with). The calculation is also relatively simple: over a set of uniformly distributed directions we find the lines that separate the data according to a quantile parameter q, 0 < q < 1, and with those lines we approximate the enveloping curve for the data that corresponds to q.

## The algorithm

Here is the full algorithm. (The figure below can be used as a visual aid).

Parameters: data as a matrix of two columns, a quantile parameter q, 0<q<1.
1. Standardize the data.
1.1. Choose a center of the data (e.g. means of the coordinates) and a scale (e.g. inter-quartile distances).
1.2. Transpose the data to the center and scale the obtained coordinates.
2. Create a set of k uniform angles, A := {i 2π/k : 0 <= i <= k-1}.
2.1. Denote a[i] := i 2π/k, i ∈ [0, k-1].
3. Rotate the data for each angle, a[i]∈A.
4. Find the q-quantile of the y-coordinates of the data rotated at angle a. Denote that point with r[a,q].
5. Find the points y[a,q] corresponding to r[a,q] in the original coordinate system.
6. Find the quantile lines intersections
6.1. For each angle a∈A we denote with n[a] the vector corresponding to a.
6.2. For each a∈A we have a line l[a,q] with normal n[a] and passing through y[a,q].
6.3. Find the intersection points P[q] of the lines l[a,q].
6.3.1. Each point p[j]∈P[q], j∈[1,k] is an intersection of lines at consecutive angles.
6.3.2. In other words p[j]=l[a[j-1],q] ∩ l[a[j],q], j∈[1,k].
7. Apply the reverse of the transformation done at step 3 to P[q].
8. Connecting the consecutive points of P[q] gives the directional quantile envelope approximation.
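The steps above can be condensed into a short NumPy sketch (standardization omitted; this is a hypothetical re-implementation for illustration, not the QuantileEnvelope function itself):

```python
import numpy as np

def directional_quantile_envelope(points, q, k):
    """Steps 2-6 above: k uniform angles, directional q-quantiles of the
    data projections, and intersections of consecutive quantile lines."""
    angles = 2 * np.pi * np.arange(k) / k          # step 2: a[i] = i 2 pi/k
    normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # n[a]
    # steps 3-4: the projection of a point on n[a] is its coordinate along
    # n[a] (the y-coordinate after rotating n[a] to the y-axis); take the
    # q-quantile of those projections for each direction
    r = np.quantile(points @ normals.T, q, axis=0)
    # step 6: p[j] is the intersection of the lines at consecutive angles,
    # i.e. the solution of  n[a[j-1]] . p = r[j-1],  n[a[j]] . p = r[j]
    envelope = []
    for j in range(k):
        A = np.stack([normals[j - 1], normals[j]])
        envelope.append(np.linalg.solve(A, [r[j - 1], r[j]]))
    return np.array(envelope)                       # step 8: connect these
```

For the corners of a square and q = 1 the four directional quantile lines with k = 4 are the supporting lines of the square, so the envelope recovers the corners.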

## Algorithm visual aid

Here is an example of an application of the described algorithm. The points in blue are the data points, the points in red are the points y[a,q], the red lines are the lines l[a,q], and the points in black are the points P[q].

## Remarks

Remark 1: The function QuantileEnvelope of QuantileRegression.m follows the described algorithm. The rotations are done using matrix multiplication; the function Reduce is used to find the intersections between the lines.

Remark 2: A possible alternative algorithm is to use “the usual” quantile regression with polar coordinates. (The Kong-Mizera article refers to and briefly discusses such an approach.)

## Example data

In order to demonstrate the function `QuantileEnvelope` let us define a couple of data sets.

The first data set is generated with the function `SkewNormalDistribution`, so the data set name is `sndData`.

```
npoints = 12000;
data = {RandomReal[{0, 2 \[Pi]}, npoints], 
   RandomVariate[SkewNormalDistribution[0.6, 0.3, 4], npoints]};
data = MapThread[#2*{Cos[#1], Sin[#1]} &, data];
rmat = RotationMatrix[-\[Pi]/2.5].DiagonalMatrix[{2, 1}];
data = Transpose[rmat.Transpose[data]];
data = TranslationTransform[{-Norm[Abs[#]/3], 0}][#] & /@ data;
sndData = Standardize[data];
```

Here is a plot of sndData with grid lines derived from the quantiles of x and y coordinates of the data set.

```
ListPlot[sndData, PlotRange -> All, AspectRatio -> 1, PlotTheme -> "Detailed", 
 GridLines -> Map[Quantile[#, Range[0, 1, 1/19]] &, Transpose[sndData]], 
 ImageSize -> 700]
```

The second data set uses the function `PoissonDistribution`, so the data set name is `pdData`.

```
npoints = 12000;
data = RandomReal[NormalDistribution[12, 5.6], npoints];
data = Transpose[{data, 
    data + RandomVariate[PoissonDistribution, npoints] + 
     RandomReal[{-0.2, 0.2}, npoints]}];
data = Select[data, #[[2]] > 0 &];
data[[All, 2]] = Log[data[[All, 2]]];
pdData = data;
```

Here is a plot of `pdData` with grid lines derived from the quantiles of x and y coordinates of the data set.

```
ListPlot[pdData, PlotRange -> All, AspectRatio -> 1, PlotTheme -> "Detailed", 
 GridLines -> Map[Quantile[#, Range[0, 1, 1/19]] &, Transpose[pdData]], 
 ImageSize -> 700]
```

## Examples of application (with QuantileEnvelope)

Let us demonstrate the function `QuantileEnvelope`. The function arguments are a data matrix, a quantile specification, and a number of directions. The quantile specification can be a number or a list of numbers. The function returns the intersection points of the quantile lines at the different directions.

Here is an application to the dataset `sndData` for 8 different quantiles over 60 uniformly spread directions.

```
In:= qs = {0.7, 0.75, 0.8, 0.85, 0.90, 0.95, 0.98, 0.99};
AbsoluteTiming[qsPoints = QuantileEnvelope[sndData, qs, 60];]

Out= {0.150104, Null}
```

The result is a list of 8 lists of 60 points.

```
In:= Dimensions[qsPoints]

Out= {8, 60, 2}
```

Let us plot the data and the direction quantile envelopes.

```
Block[{data = sndData}, 
 Show[{ListPlot[data, AspectRatio -> 1, PlotTheme -> {"Detailed"}, 
    GridLines -> Map[Quantile[#, Range[0, 1, 1/(20 - 1)]] &, Transpose[data]], 
    PlotLegends -> SwatchLegend[
      Blend[{Red, Orange}, Rescale[#1, {Min[qs], Max[qs]}, {0, 1}]] & /@ qs, qs]], 
   Graphics[{PointSize[0.005], Thickness[0.0025], 
     MapThread[{Blend[{Red, Orange}, Rescale[#1, {Min[qs], Max[qs]}, {0, 1}]], 
        Tooltip[Line[Append[#2, #2[[1]]]], #1], Point[#2]} &, {qs, qsPoints}]}]}, 
  ImageSize -> 700]]
```

Let us do similar calculations for the dataset `pdData`.

```
In:= qs = {0.7, 0.75, 0.8, 0.85, 0.90, 0.95, 0.98, 0.99};
AbsoluteTiming[qsPoints = QuantileEnvelope[pdData, qs, 60];]

Out= {0.154227, Null}
```

```
In:= Dimensions[qsPoints]

Out= {8, 60, 2}
```

Here is the plot of the dataset `pdData` and its directional quantile envelopes.

## Some more quantile curves

Here are couple of more plots of the quantile lines at different directions with a larger number of directions.

Lines for quantile 0.95 in 40 directions: Lines for quantiles 0.95 and 0.8 in 40 directions: ## References

 Anton Antonov, Quantile regression Mathematica package, source code at GitHub, https://github.com/antononcube/MathematicaForPrediction, package QuantileRegression.m, (2013).

 Anton Antonov, Quantile regression through linear programming, usage guide at GitHub, https://github.com/antononcube/MathematicaForPrediction, in the directory “Documentation”, (2013).

 Anton Antonov, Quantile regression through linear programming, “Mathematica for prediction algorithms” blog at WordPress.com, 12/16/2013.

 Anton Antonov, Quantile regression with B-splines, “Mathematica for prediction algorithms” blog at WordPress.com, 1/1/2014.

 Linglong Kong, Ivan Mizera, “Quantile tomography: using quantiles with multivariate data”, 2013, arXiv:0805.0056v2 [stat.ME] URL: http://arxiv.org/abs/0805.0056 .