Importance of variables investigation


This blog post demonstrates a procedure for variable importance investigation of mixed categorical and numerical data.

The procedure was used in a previous blog “Classification and association rules for census income data”, [1]. It is implemented in the package VariableImportanceByClassifiers.m, [2], and described below; I took it from [5].

The document “Importance of variables investigation guide”, [3], has much more extensive descriptions, explanations, and code for importance of variables investigation using classifiers, Mosaic plots, Decision trees, Association rules, and Dimension reduction.

Procedure outline

Here we describe the procedure used (that is also done in [3]).

1. Split the data into training and testing datasets.

2. Build a classifier with the training set.

3. Verify using the test set that good classification results are obtained. Find the baseline accuracy.

4. If the number of variables (attributes) is k for each i, 1ik:

4.1. Shuffle the values of the i-th column of the test data and find the classification success rates.

5. Compare the obtained k classification success rates between each other and with the success rates obtained by the un-shuffled test data.

The variables for which the classification success rates are the worst are the most decisive.

Note that instead of using the overall baseline accuracy we can make the comparison over the accuracies for selected, more important class labels. (See the examples below.)

The procedure is classifier agnostic. With certain classifiers, Naive Bayesian classifiers and Decision trees, the importance of variables can be directly concluded from their structure obtained after training.

The procedure can be enhanced by using dimension reduction before building the classifiers. (See [3] for an outline.)

Implementation description

The implementation of the procedure is straightforward in Mathematica — see the package VariableImportanceByClassifiers.m, [2].

The package can be imported with the command:


At this point the package has only one function, AccuracyByVariableShuffling, that takes as arguments a ClassifierFunction object, a dataset, optional variable names, and the option “FScoreLabels” that allows the use of accuracies over a custom list of class labels instead of overall baseline accuracy.

Here is the function signature:

AccuracyByVariableShuffling[ clFunc_ClassifierFunction, testData_, variableNames_:Automatic, opts:OptionsPattern[] ]

The returned result is an Association structure that contains the baseline accuracy and the accuracies corresponding to the shuffled versions of the dataset. I.e. steps 3 and 4 of the procedure are performed by AccuracyByVariableShuffling. Returning the result in the form Association[___] means we can treat the result as a list with named elements similar to the list structures in Lua and R.

For the examples in the next section we also going to use the package MosaicPlot.m, [4], that can be imported with the following command:


Concrete application over the “Titanic” dataset

1. Load some data.

testSetName = "Titanic";
trainingSet = ExampleData[{"MachineLearning", testSetName}, "TrainingData"];
testSet = ExampleData[{"MachineLearning", testSetName}, "TestData"];

2. Variable names and unique class labels.

varNames =
Flatten[List @@
ExampleData[{"MachineLearning", testSetName}, "VariableDescriptions"]]

(*Out[1055]= {"passenger class", "passenger age", "passenger sex", "passenger survival"} *)

classLabels =
Union[ExampleData[{"MachineLearning", testSetName}, "Data"][[All, -1]]]

(*Out[1056]= {"died", "survived"} *)

3. Here is a data summary.

Grid[List@RecordsSummary[(Flatten /@ (List @@@
Join[trainingSet, testSet])) /. _Missing -> 0, varNames],
Dividers -> All, Alignment -> {Left, Top}]


4. Make the classifier.

clFunc = Classify[trainingSet, Method -> "RandomForest"]

(*Out[1010]= ClassifierFunction[\[Ellipsis]] *)

5. Obtain accuracies after shuffling.

accs = AccuracyByVariableShuffling[clFunc, testSet, varNames]

(*Out[1011]= <|None -> 0.78117, "passenger class" -> 0.704835, "passenger age" -> 0.768448,
"passenger sex" -> 0.610687|>*)

6. Tabulate the results.

List @@@ Normal[accs/First[accs]],
Style[#, Bold, Blue,
FontFamily -> "Times"] & /@ {"shuffled variable", "accuracy ratio"}],
Alignment -> Left, Dividers -> All]


7. Further confirmation of the found variable importance can be done using the mosaic plots.
We can see that female passengers are much more likely to survive and especially female passengers from first and second class.

t = (Flatten /@ (List @@@ trainingSet));
MosaicPlot[t[[All, {1, 3, 4}]],
ColorRules -> {3 -> ColorData[7, "ColorList"]} ]


5a. In order to use F-scores instead of overall accuracy the desired class labels are specified with the option “FScoreLabels”.

accs = AccuracyByVariableShuffling[clFunc, testSet, varNames,
"FScoreLabels" -> classLabels]

(*Out[1101]= <| None -> {0.838346, 0.661417}, "passenger class" -> {0.79927, 0.537815},
"passenger age" -> {0.828996, 0.629032},
"passenger sex" -> {0.703499, 0.337449}|>*)

5b. Here is another example that uses the class label with the smallest F-score.
(Probably the most important since it is most mis-classified).

accs = AccuracyByVariableShuffling[clFunc, testSet, varNames,
"FScoreLabels" -> Position[#, Min[#]][[1, 1, 1]] &@
ClassifierMeasurements[clFunc, testSet, "FScore"]]

(*Out[1102]= {0.661417}, "passenger class" -> {0.520325},
"passenger age" -> {0.623482}, "passenger sex" -> {0.367521}|>*)

5c. It is good idea to verify that we get the same results using different classifiers. Below is given code that computes the shuffled accuracies and returns the relative damage scores for a set of methods of Classify.

mres = Association@Map[
cf = Classify[titanicTrainingData, Method -> clMethod];
accRes =
AccuracyByVariableShuffling[cf, testSet, varNames,
"FScoreLabels" -> "survived"];
clMethod -> (accRes[None] - Rest[accRes])/accRes[None]
], {"LogisticRegression", "NearestNeighbors", "NeuralNetwork",
"RandomForest", "SupportVectorMachine"}] ;



[1] Anton Antonov, “Classification and association rules for census income data”, (2014), MathematicaForPrediction at

[2] Anton Antonov, Variable importance determination by classifiers implementation in Mathematica, (2015), source code at MathematicaForPrediction at GitHub, package VariableImportanceByClassifiers.m.

[3] Anton Antonov, “Importance of variables investigation guide”, (2016), MathematicaForPrediction at GitHub, folder Documentation.

[4] Anton Antonov, Mosaic plot for data visualization implementation in Mathematica, (2014), MathematicaForPrediction at GitHub, package MosaicPlot.m.

[5] Leo Breiman et al., Classification and regression trees, Chapman & Hall, 1984, ISBN-13: 978-0412048418.

RSparseMatrix for sparse matrices with named rows and columns


In the last few years I have used a lot R’s base library Matrix that has implementation of sparse matrix objects and efficient computations. To the sparse matrices from R’s Matrix library one can assign and retrieve row names and column names with the functions colnames and rownames. Sometimes I miss this in Mathematica so I started a Mathematica package that implements similar functionalities. The package is named RSparseMatrix.m has purely Mathematica language implementations (i.e. it does not use RLink ). It can be loaded/downloaded from MathematicaForPrediction at GitHub:


The package provides functions to create and do operations over  RSparseMatrix objects of that are basically SparseArray objects with row and column names. A major design decision is to restrict these functionalities to two dimensional sparse arrays and lists of strings as row and column names. (Note that the package is not finished and in some functions the row and column names are ignored.)

The package attempts to cover as many as possible of the functionalities for sparse matrix objects that are provided by R’s Matrix library. (Sub-matrix extraction by row and column names, row and column names propagation for dot products, row and column binding sparse matrices, row and column sums, etc.) This document has examples and tests for RSparseMatrix.m .

My participation in WTC 2015 with a talk about Mathematica and R comparison was one the main motivators to write this blog post. Another is this Mathematica StackExchange discussion. (And a third one is seeing tonight the impressive movie “The Martian” — such a display of the triumph of the humans over space and nature using technology and science in a creative way made me wanna discuss how to make some programming objects more convenient.)

Basic examples


rmat = MakeRSparseMatrix[
 {{1, 1} -> 1, {2, 2} -> 2, {4, 3} -> 3, {1, 4} -> 4, {3, 5} -> 2},
 "ColumnNames" -> {"a", "b", "c", "d", "e"},
 "RowNames" -> {"A", "B", "C", "D"},
 "DimensionNames" -> {"U", "V"}]

The function MatrixForm shows the RSparseMatrix objects with their row and column names:

rmat // MatrixForm

The RSparseMatrix objects can be created from SparseArray objects:

Query functions

These functions can be used to retrieve the names of rows, columns, and dimensions. They correspond to R’s functions rownames, colnames, dimnames.

In[154]:= RowNames[rmat]
Out[154]= {"A", "B", "C", "D"}
In[155]:= ColumnNames[rmat]
Out[155]= {"a", "b", "c", "d", "e"}
In[156]:= DimensionNames[rmat]
Out[156]= {"U", "V"}

Functions that work on SparseArray

Of course since RSparseMatrix is based on SparseArray we would expect the functions that work on SparseArray objects to work RSpaseMatrix objects too. E.g. Dimensions, ArrayRules, Transpose, Total, and others.

In[157]:= Dimensions[rmat]
Out[157]= {4, 5}
In[158]:= ArrayRules[rmat]
Out[158]= {{1, 1} -> 1, {1, 4} -> 4, {2, 2} -> 2, {3, 5} -> 2, {4, 3} -> 3, {_, _} -> 0}

Dot product

Row names and column names are respected for dot products if that leads to meaningful assignments. The examples below demonstrate a general principle:

When a matrix operation can be performed on the underlying sparse arrays but the row names or column names do not coincide the names are dropped.

In the tables with examples below the last rows show the heads of the results.

Matrix by vector


Matrix by matrix




A major useful feature is to have Part work with row and column names. The implementation of that additional functionality for Part is demonstrated below.

In the cases when the dimension drops sparse arrays or numbers are returned. In R the operation “[” has the parameter “drop” — the expression “smat[1,,drop=F]” is going to be a sparse matrix, the expression “smat[1,,drop=T]” is going to be a dense vector. The corresponding implementation is to have the option “Drop->True|False” for Part, but that does not seem a good idea. And we can easily emulate the “drop” option in R using “{_?AtomQ}” inside Part.


Neat example

Consider this incidence matrix that represents a bi-partite graph of relationships of actors starring in movies:


We can use a RSparseMatrix object of it with named rows and columns (rBiMat).

Here is the corresponding graph:


If we want to see which actors have participated in movies together with Orlando Bloom we can do the following:


Finding local extrema in noisy data using Quantile Regression


This blog post (article) describes an algorithm for finding local extrema in noisy data using Quantile Regression. The problem formulation and a solution for it using polynomial model fitting (through LinearModelFit) were taken from Mathematica StackExchange — see “Finding Local Minima / Maxima in Noisy Data”, [1].

The proposed Quantile Regression algorithm is a version of the polynomial fitting solution (proposed by Leonid Shifrin in [1]) and has the following advantages: (i) requires less parameter tweaking, and (ii) more importantly it is more robust with very noisy and oscillating data. That robustness is achieved by using two regression fitted curves: one close to the local minima and another close to the local maxima, computed for low and high quantiles respectively. (Quantile Regression is uniquely able to do that.)

The code for this blog post is available at [6].

A complete version of this blog post is available as a PDF document here: Finding local extrema in noisy data using Quantile Regression.pdf.

The problem

Here is the problem formulation from [1].

Problem 1: We have a list of pairs of numbers representing measurements of a scalar variable on a regular time grid. The measurements have noise in them.

As a data example for this problem the author of the question in [1] has provided the following code:

temptimelist = Range[200]/10;
tempvaluelist = Sinc[#] &@temptimelist + RandomReal[{-1, 1}, 200]*0.02;
data1 = Transpose[{temptimelist, tempvaluelist}];
ListPlot[data1, PlotRange -> All, Frame -> True, ImageSize -> 500]


In this article we are going to consider a more general problem hinted in the discussion of the solutions in [1].

Problem 2: Assume that the data in Problem 1 is collected several times and the noise is present in both the measured values and the time of measurement. Also the data can be highly oscillatory in nature.

Consider this data generation as an example for Problem 2:

n = 1000;
xs = N@Rescale[Range[n], {1, n}, {0, 60}];
data2 = Flatten[
 Table[Transpose[{xs + 0.1 RandomReal[{-1, 1}, Length[xs]], 
 Map[5 Sinc[#] + Sin[#] + 4 Sin[1/4 #] &, xs] + 
 1.2 RandomVariate[SkewNormalDistribution[0, 1, 0.9], 
 Length[xs]]}], {10}], 1];
ListPlot[data2, PlotRange -> All, Frame -> True, ImageSize -> 500]


Note that this data has 10000 points and it is much larger than the data for Problem 1.

Extrema location approximation by model fitting followed by nearest neighbors search

Several solutions are given in [1]. A couple are using wavelets or Gaussian filtering for de-noising. The one we are going to focus on in this article is using polynomial fitting for extrema location approximation and then finding the actual data extrema by Nearest Neighbors (NN’s) search. It is provided by Leonid Shifrin.

Let us list the steps of that algorithm:

1. Fit a polynomial through the data (using LinearModelFit).
2. Find the local extrema of the fitted polynomial. (We will call them fit estimated extrema.)
3. Around each of the fit estimated extrema find the most extreme point in
the data by nearest neighbors search (by using Nearest).

We are going to refer to this algorithm as LMFFindExtrema. Its implementation is available here: QuantileRegressionForLocalExtrema.m, [6].

Here is the application of the algorithm to the data example of Problem 1:


(The continuous line shows the fitted curve.)

Here is the application of the algorithm to the data example of Problem 2:


It does not help to just increase the number of basis polynomials and the number of NN’s examined points:


We can also see that increasing the number of examined NN’s for each of the fit estimated extrema would make some the final result points to be “borrowed” from neighboring peaks in the data.

Experiments with fitting trigonometric basis functions showed very good fit to the data but the calculations of the fit estimated extrema were very slow.

Using Quantile Regression

It is trivial to rewrite the algorithm LFMFindExtrema to use Quantile Regression instead of linear model fitting of polynomials. We are going to use the Mathematica package [2] implementation described in [3,4]. The function QuantileRegression provided by [2] uses a B-spline basis [5] to find the quantile regression curves (also known as regression quantiles).

We are going to call this algorithm QRFindExtrema. Again its implementation is available here: QuantileRegressionForLocalExtrema.m, [6].

The algorithm QRFindExtrema has the following parameters: the data, number of B-spline knots, interpolation order, quantiles (corresponding to the curves to be fitted).

QRFindExtrema returns a list of regression quantile functions and a list of lists with extrema estimates.

More importantly though, QRFindExtrema can use two curves for finding the local extrema: one for local minima, and one for local maxima. (This feature is justified below.)

Here is the application of QRFindExtrema to the example data of Problem 1:


Here is application of QRFindExtrema to the data of Problem 2:


We can see that the regression quantile for 0.5 is too flat to get good judgment of the local extrema. We can get better results if we increase the number of knots for the B-spline basis built by QuantileRegression.


We can see that results look “almost right”, the horizontal locations of the peaks are apparently more-or-less correctly identified, but the result extrema are too close to the fitted curve. Just increasing the number of examined NN’s for the fit estimated extrema does not produce good results because points from neighboring peaks are being chosen as final extrema estimates.


In order to solve this problem we use two regression quantiles. For local minima we use a regression quantile for a low quantile number, say, 0.02; for the local maxima we use a regression quantile for a large quantile number, say, 0.98 .


The use of two (or more) curves to be fitted is an unique capability of Quantile Regression. With this algorithm feature by construction the lower regression quantile is close to the local minima and the higher regression quantile is close to the local maxima.

Also, since we find two regression quantile curves we can use two nearest neighbors finding functions: one with the points below the low regression quantile, and one with the points above the high regression quantile. The implementation in [6] takes an option specification for should the nearest neighbor functions for finding the extrema be constructed using all data points or just the outliers (the points outside of the found regression quantiles).

More examples

More timid noisy and oscillating data with around 10,000 points.



[1] Mathematica StackExchange discussion. “Finding Local Minima / Maxima in Noisy Data”,
URL: .

[2] Anton Antonov, Quantile regression Mathematica package, source code at GitHub,, package QuantileRegression.m, (2013).

[3] Anton Antonov, Quantile regression through linear programming, usage guide at GitHub,, in the directory “Documentation”, (2013).

[4] Anton Antonov, Quantile regression through linear programming, “Mathematica for prediction algorithms” blog at, 12/16/2013.

[5] Anton Antonov, Quantile regression with B-splines, “Mathematica for prediction algorithms” blog at, 1/1/2014.

[6] Anton Antonov, QuantileRegressionForLocalExtrema Mathematica package, source code at GitHub,, package QuantileRegressionForLocalExtrema.m, (2015).
The package is in the directory “Applications”.

Simple time series conversational engine


In this blog post I am going to discuss the creation (design and programming) of a simple conversational engine for time series analysis. The conversational engine responds to commands for:
1. loading time series data (weather data, stock data, or data files),
2. finding outliers,
3. analysis by curve fitting,
4. plotting, and
5. help and state changes.

This blog post is related to the blog post [1] and uses the package with functional parsers [2] and the Quantile regression package [3].

I programmed this engine ten months ago and here is link to a movie that demonstrates it, [4]: .

A package with all of the code of the conversational engine can be downloaded from GitHub, [12] and executed with Import:


More about functional parsers, which were essential for the building of the conversational engine, can be read in “Functional parsers” by Fokker, [5].

Design of the conversational engine

The conversational engine is a finite state machine. It has a command recognizer state and command application state. The switching between the states is done upon completion states functions. The recognized commands are English sentences. Some of the sentences include Mathematica code.

Let use describe the structure of the conversational engine.


1. Acceptor state
The Acceptor state parses and recognizes commands. If the given command is successfully parsed then the parsing result is given to the Transducer state. Otherwise a message “Unknown command” is emitted and no state change is made (i.e.we are still in the Acceptor state).

2. Transducer state
Interprets the parsed a command by the Acceptor state. This state emits different messages according to the loaded data and accumulated graphics objects.

The states are named according to the classification given in the Wikipedia article about finite-state machines, [5]. According to that classification the conversational engine is a Mealy machine.


The results of the applied time series analysis are graphical. The Transducer state accumulates the graphs of the commands and displays them with the data. The conversational engine responses have a spoken part and a graphics part. The responses have at least a spoken part. The graphics part can omitted.


The weather data commands can be for temperature, wind speed, and pressure over a specified city. For example, “load temperature of Atlanta” or “load the Chicago wind speeds”.

The financial data commands can be for stock price or trading volume. For example, “load trade volume of NYSE:GOOG” or “load stock price of NYSE:APPL”.

The weather and stocks data loading commands do not take time period, they take only city and company specifications for WeatherData and FinancialData respectively.

The loading of data files is done with commands like “load file ‘~/LogDataWithSkewedNoise.csv’ “.

The time series analysis commands we consider are: curve fitting by a list specified functions, finding regression quantiles, and finding outliers. For example, “compute least squares fit of {1,x,Sin[x/10^6]}”, “compute 5 regression quantiles”, “find top outliers”.

The graphics commands are “plot data”, “plot(s) joined”, “plot(s) not joined”, “clear graphics”.

In addition, the conversational engine takes global, service commands like “help”, “all commands”, “start over’, etc.

Grammar specification

In order to respond to the commands that are English sentences we have to be able to parse these sentences into concrete state or message specifications. This means that we have to define the grammatical structures of the command sentences and the words in them.

This section describes the grammar of the commands using Extended Backus-Naur Form (EBNF). The package FunctionalParsers.m, [2], has Mathematica functions that generate parsers from EBNF of a grammar (see [9] and the next section). The resulting parsers are used in the conversational engine described above.

Commands for loading data explanations

Let us explain the derivation and rationale of the grammar rules for the data loading commands. (I am not going to describe all the language considerations and decisions I have made for the commands grammar. These decisions can be traced in the EBNF grammar specification.)

We want to parse and interpret commands like:
1. load data file ‘timeTable.csv’ ,
2. load temperature of Atlanta ,
3. load the wind speeds of Chicago ,
4. load the stock price of NYSE:APPL ,
5. load trading volume of NYSE:GOOG .

We can see that we have three different types of data loading commands: for files, for weather data, and for financial data. Each of these commands when parsed would be easy to interpret using Mathematica’s functions Import, WeatherData, and FinancialData respectively. Note that — because we want to keep the grammar simple — for the weather and financial data there is no time period specification. For the weather data we are going to load data for the last 60 days; for the financial data for the last 365 days. Also, for simplicity, we are not going to consider data loading commands that start with the phrases “get”, “get me”, “please load”, etc.

To this end the general load data command structure is:


Note that in the grammar rules above some of the words are optional; the optional words are put within square brackets. With the first rule we have separated the file loading from the weather and financial data loading. The file loading is further specified with rule 2. Rule 3 separates the preamble of the weather and financial data loading commands from the data specification. The latter is partitioned in rule 4. Rule 5 gives all of the possible data type requests for the weather data: temperature, pressure, and wind speed. Rule 6 gives the two possible types for financial data requests: stock price and trade volume. The file name, city, and company specifications are left out, but the rules for them are be easy to formulate.

The command grammar used in the conversational engine uses additional data loading definitions and allows “reverse” data load commands like: “load the Chicago wind speeds”.

Full grammar specification

In this sub-section the full grammar specification is given. Note that some parts of the commands are specified to be dropped from the parsing results using the symbols “\[RightTriangle]” and “\[LeftTriangle]”. (For example ” ‘trade’ \[RightTriangle] ‘volume’ ” means “parse the sequence of words ‘trade’ ‘volume’ and return ‘volume’ as a result”.) The specification ‘_LetterString’ is an implementation shortcut for matching words and identifiers with the package FunctionalParsers.m, [2].


The rules were simplified for clarity, the actual conversational engine code can be retrieved, [8].

The grammar specified above takes complicated commands like “find regression quantiles over temperature of Atlanta”.

(In certain cases I prefer reading grammars in EBNF using Emacs and ebnf-mode.el, [7].)

Parsers for the grammar

Using the grammar specification [8] and the package FunctionalParsers.m, [2], we can generate the parsers for that grammar. A detailed description of the parser generation process is given in [9].

Here is the generation step:

tokens = ToTokens[timeSeriesEBNFCode];
res = GenerateParsersFromEBNF[tokens];
res // LeafCount

Out[4]= 3069

The generation step creates a set of parsers:

In[16]:= Magnify[Shallow[res[[1, 1, 2]], 6], 0.8]


Here are the parsing results returned by the “master” rule pTSCOMMAND:

Here is another list for the engine’s “service” commands:

And see their parsing outcome using the function ParsingTestTable provided by FunctionalParsers.m:

questions = {"load data file '~/example.csv'",
"load file '~/anotherExample.csv'"};
ParsingTestTable[ParseShortest[pLOADFILECOMMAND], questions]


Note that the parser takes commands with Mathematica expressions parts if those expressions are written without spaces:

questions = {"least squares fit with x+Sin[x]"};
ParsingTestTable[ParseShortest[pTSCOMMAND], questions]


This type of commands are specified with the grammar rule:

= ( ‘least’ , ‘squares’ , [ ‘fit’ ] , [ ‘with’ | ‘of’ ] ) \[RightTriangle] ‘_String’ ;

Interpreters (for the parsers)

Next we have to write interpreters for the parsing results. Note that the parsing results are Mathematica expressions with heads that hint to the semantic meaning behind the parsed commands. Since we are viewing the conversational engine as a finite state machine we can say that for each parsing result we have to write a function that “reacts” to result’s head and content. This is a fairly straightforward process very similar to the one described by the Interpreter design pattern, [10].

A package with all of the code of the conversational engine can be downloaded from GitHub, [12]. The interpreters code can be found in the section “Interpreters”.

The conversational engine programming

The conversational engine was programmed using Mathematica’s dynamic evaluation functions and mechanisms. The implementation follows the Absorber and Transductor state design described above in the section “Design of the conversational engine”.

The engine has three panels: for command input, for spoken output, and for graphical output. It is fully demonstrated with the movie “Time series conversational engine” posted on YouTube, [4].

As it was mentioned in the introduction the engine can be run using Import:



Future plans

The engine can be extended in many directions depending on the final goals for it.

One must have extension is the ability to take time period specifications for the weather and financial data.

The time series analysis can be extended with more detailed analysis using functions like TimeSeriesModelFit or heteroscedasticity analysis using multiple regression quantiles, [11].

A third set of extensions is the ability to load additional kinds of data, like, the evolutions of countries populations or gross domestic products.


[1] Anton Antonov, Natural language processing with functional parsers, blog post at WordPress, (2014).

[2] Anton Antonov, Functional parsers Mathematica package, source code at GitHub,, package FunctionalParsers.m, (2014).

[3] Anton Antonov, Quantile regression Mathematica package, source code at GitHub,, package QuantileRegression.m, (2013).

[4] Anton Antonov, Time series conversational engine, a YouTube movie.

[5] Wikipedia entry: Finite-state Machine, .

[6] Jeroen Fokker, Functional parsers, In: Advanced Functional Programming, Tutorial text of the First international spring school on advanced functional programming techniques in Båstad, Sweden, may 1995. (Johan Jeuring and Erik Meijer, eds). Berlin, Springer 1995 (LNCS 925), pp. 1-23. .

[7] Jeramey Crawford, Extended Backus-Naur Form Mode, ebnf-mode.el, emacs LISP at Github, .

[8] Anton Antonov, Time series conversational engine grammar in EBNF, TimeSeriesConversationalEngineGrammar.ebnf at the MathematicaForPrediction project at GitHub.

[9] Anton Antonov, Functional parsers for an integration requests language, PDF document at GitHub, , (2014).

[10] Erich GammaTaligent, Richard Helm, Ralph Johnson, John Vlissides, Design patterns: elements of reusable object-oriented softwareAddison-Wesley Longman Publishing Co., Inc. Boston, MA, USA ©1995, ISBN:0-201-63361-2, .

[11] Anton Antonov, Estimation of conditional density distributions, “Mathematica for prediction algorithms” blog at WordPress, (2014).

[12] Anton Antonov, Simple Time Series Conversational Engine Mathematica package, SimpleTimeSeriesConversationalEngine.m at the MathematicaForPrediction project at GitHub.

Directional quantile envelopes in 3D


This blog post was mostly made as a response to a comment of my previous blog post “Directional quantile envelopes”, [1]:

This looks extremely useful and elegant – any plans on generalizing to 3 (or more) dimensions?
–Jan Liphardt

Since I did say in the previous blog post that the algorithm can be easily extended to 3D and higher dimensions, I decided to illustrate that with an example. The 3D implementation is really easy using Mathematica’s geometric computation functions (i) for derivation of rotation matrices and (ii) for regions manipulation. (The latter are new in version 10.)

The algorithm shown below is a slightly modified version of the algorithm in “Directional quantile envelopes”. Instead of the step that does the intersection of directional quantile lines with Reduce, we can use ImplicitRegion to specify the region enclosed by the directional quantile planes. We can use other built-in functions for region predicates and manipulation to derive, visualize, or utilize the directional quantile regions found by the algorithm. (All these are exemplified below.)

Data generation

Let us first generate some data.

npoints = 10000;
data = {RandomReal[{0, 2 Pi}, npoints],
RandomReal[{-Pi, Pi}, npoints],
RandomVariate[PoissonDistribution[4.8], npoints]};
data = MapThread[#3*{Cos[#1] Cos[#2], Sin[#1] Cos[#2], Sin[#2]} &, data];

Let us plot the data

Block[{qs = 12},
qs = Map[Quantile[#, Range[0, 1, 1/(qs - 1)]] &, Transpose[data]];
ListPointPlot3D[data, PlotRange -> All, PlotTheme -> "Detailed",
FaceGrids -> {{{0, 0, -1}, Most[qs]}, {{0, 1, 0},
qs[[{1, 3}]]}, {{-1, 0, 0}, Rest[qs]}}, ImageSize -> 700]


On the plot the grid lines are derived from the quantiles of x, y and z coordinates of the data set.

Algorithm application step-by-step

1. Standardize the data
1.1. This step is generally not needed and in this implementation would complicate things.

2. Create a set of uniform angles. (Here we generate vectors that define 3D directions.)

nqs = 20;
dirs = N@Flatten[
Table[{Cos[\[Theta]] Cos[\[Phi]], Sin[\[Theta]] Cos[\[Phi]],
Sin[\[Phi]]}, {\[Theta], 2 \[Pi]/(10 nqs), 2 \[Pi],
2 \[Pi]/nqs}, {\[Phi], -\[Pi], \[Pi], 2 \[Pi]/nqs}], 1];

Graphics3D[Map[Arrow[{{0, 0, 0}, #}] &, dirs]]


3. Rotate the data for each direction vector, a ∈ A. (Here we just generate the rotation matrices.)

In[672]:= rmats = RotationMatrix[{{1, 0, 0}, #}] & /@ dirs;

Out[673]= 420

4. Find the q-quantile of the z-coordinates of the data rotated at direction a. Denote that point with Z[a,q].

In[674]:= qs = {0.95};
qDirPoints =
Flatten[Map[Function[{m}, Quantile[(m.Transpose[data])[[3]], qs]], rmats]];

5. Find the points z[a,q] corresponding to Z[a,q] in the original coordinate system.
5.1. In this implementation this step is redundant, see the next step.

6. Find the quantile planes intersections
6.1. Instead of doing this step we simply use ImplicitRegion.

In[676]:= qRegion = ImplicitRegion[
MapThread[(#1.{x, y, z})[[3]] <= #2 &, {rmats, qDirPoints}], {x, y, z}];

In[677]:= Shallow[qRegion]

Out[677]//Shallow= ImplicitRegion[
LessEqual[<>] && LessEqual[<>] && LessEqual[<>] &&
LessEqual[<>] && LessEqual[<>] && LessEqual[<>] &&
LessEqual[<>] && LessEqual[<>] && LessEqual[<>] &&
LessEqual[<>] && <>, {x, y, z}]

Wrapping it in a function

Here is a definition of a function that wraps all of the algorithms steps in the previous section.

QuantileEnvelopeRegion[points_?MatrixQ, quantile_?NumberQ, numberOfDirections_Integer] :=
Block[{nd = numberOfDirections, dirs, rmats, qDirPoints, qRegion},
dirs =
Table[{Cos[\[Theta]] Cos[\[Phi]], Sin[\[Theta]] Cos[\[Phi]],
Sin[\[Phi]]}, {\[Theta], 2 \[Pi]/(10 nd), 2 \[Pi],
2 \[Pi]/nd}, {\[Phi], -\[Pi], \[Pi], 2 \[Pi]/nd}], 1];
rmats = RotationMatrix[{{1, 0, 0}, #}] & /@ dirs;
qDirPoints =
Function[{m}, Quantile[(m.Transpose[points])[[3]], quantile]],
qRegion =
MapThread[(#1.{x, y, z})[[3]] <= #2 &, {rmats, qDirPoints}], {x, y,
] /; Dimensions[points][[2]] == 3 && 0 < quantile <= 1;


From the implicit region specification we can generate an approximation of the boundary surface using the built-in function BoundaryDiscretizeRegion.

qRegion = QuantileEnvelopeRegion[data, 0.95, 20];
qRegion2 = QuantileEnvelopeRegion[data, 0.8, 20];

qSurface = BoundaryDiscretizeRegion[qRegion];
qSurface2 = BoundaryDiscretizeRegion[qRegion2];

Grid[{{qSurface, qSurface2}}]

Now we can visualize the quantile surface together with the data points:

Block[{c = 3, opts, pntsgr},
opts = {PlotRange -> {{-c, c}, {0, c}, {-c, c}}, Boxed -> True, Axes -> True,
ImageSize -> {Automatic, 500}};
pntsgr = Graphics3D[Point[data]];
Grid[{{Show[{qSurface, pntsgr}, opts], Show[{qSurface2, pntsgr}, opts]}}]



Now we can answer the question how many data points are inside the quantile directions regions. Again, this is easy with the built-in region functionalities of Mathematica version 10.

In[656]:= inPoints = Select[data, # \[Element] qRegion &];
Length[inPoints]/Length[data] // N

Out[657]= 0.5974

In[700]:= inPoints = Select[data, # \[Element] qRegion2 &];
Length[inPoints]/Length[data] // N

Out[701]= 0.1705

(In these code lines I kept “[\Element]” instead of replacing it with “∈” in order the lines to be copy and pasted. )
As we can see the algorithm would produce a region Subscript[R, q] which contains a much smaller number of points than the requested quantile, q. This property of the algorithm is discussed in [2]. (In the two dimensional space this property is less pronounced.)

Obviously, though, we would expect the bias of the algorithm to be monotonic with respect to the quantiles requested. Given a set of points P and quantiles u and v, 0<u<v<=1, for the respectively produced by the algorithm regions Ru and Rv we have |P ∩ Ru| < |P ∩ Rv| .


[1] Anton Antonov, Directional quantile envelopes, “Mathematica for prediction algorithms” blog at, 11/3/2014 .

[2] Linglong Kong, Ivan Mizera, “Quantile tomography: using quantiles with multivariate data”, 2013, arXiv:0805.0056v2 [stat.ME] URL: .

Directional quantile envelopes


In this blog post I am going to discuss the computation of the so called directional quantile envelopes; see [5] for definitions, theorems, and concrete examples. The Mathematica package QuantileRegression.m, [1] has an experimental implementation, the function QuantileEnvelope.

This type of implementation investigation is a natural extension of the quantile regression implementations and utilization I have done before. (See the package [1] and the related guide [2]; blog posts [3,4].) I was looking for a 2D quantile regression technique that can be used as a tool for finding 2D outliers. From the exposition below it should be obvious that the technique can be generalized in higher dimensions.

The idea of directional quantile envelopes is conceptually simple (and straightforward to come up with). The calculation is also relatively simple: over a set of uniformly distributed directions we find the lines that separate the data according a quantile parameter q, 0<q<1, and with those lines we approximate the enveloping curve for data that corresponds to q.

The algorithm

Here is the full algorithm. (The figure below can be used as a visual aid).

Parameters: data as a matrix of two columns, a quantile parameter q, 0<q<1.
1. Standardize the data.
1.1. Choose a center of the data (e.g. means of the coordinates) and a scale (e.g. inter-quartile distances).
1.2. Transpose the data to the center and scale the obtained coordinates.
2. Create a set if uniform angles, A:={i*2*Pi/k}, 0<=i<=k-1 .
2.1. Denote with a[i]:=i*2*Pi/k, i∈[0,k-1].
3. Rotate the data for each angle, a[i]∈A.
4. Find the q-quantile of the y-coordinates of the data rotated at angle a. Denote that point with r[a,q].
5. Find the points y[a,q] corresponding to r[a,q] in the original coordinate system.
6. Find the quantile lines intersections
6.1. For each angle a∈A we denote with n[a] the vector corresponding to a.
6.2. For each a∈A we have a line l[a,q] with normal n[a] and passing through y[a,q].
6.3. Find the intersection points P[q] of the lines l[a,q].
6.3.1. Each point p[j]∈P[q], j∈[1,k] is an intersection of lines at consecutive angles.
6.3.2. In other words p[j]=l[a[j-1],q] ∩ l[a[j],q], j∈[1,k].
7. Apply the reverse of the transformation done at step 3 to P[q].
8. Connecting the consecutive points of P[q] gives the directional quantile envelope approximation.

Click this thumbnail for an image with better math notation:

Algorithm visual aid

Here is an example of an application of the described algorithm. The points in blue are the data points, the points in red are the points y[a,q], the red lines are the lines l[a,q], the points in black are the points P.


Remark 1: The function QuantileEnvelope of [1] follows the described algorithm. The rotations are done using matrix multiplication; the function Reduce is used to find the intersections between the lines.

Remark 2: A possible alternative algorithm is to use “the usual” quantile regression with polar coordinates. (The article [5] refers to and briefly discusses such an approach.)

Example data

In order to demonstrate the function QuantileEnvelope let us define a couple of data sets.

The first data set is generated with the function SkewNormalDistribution, so the data set name is sndData.

npoints = 12000;
data = {RandomReal[{0, 2 \[Pi]}, npoints],
RandomVariate[SkewNormalDistribution[0.6, 0.3, 4], npoints]};
data = MapThread[#2*{Cos[#1], Sin[#1]} &, data];
rmat = RotationMatrix[-\[Pi]/2.5].DiagonalMatrix[{2, 1}];
data = Transpose[rmat.Transpose[data]];
data = TranslationTransform[{-Norm[Abs[#]/3], 0}][#] & /@ data;
sndData = Standardize[data];

Here is a plot of sndData with grid lines derived from the quantiles of x and y coordinates of the data set.

ListPlot[sndData, PlotRange -> All, AspectRatio -> 1, PlotTheme -> "Detailed",
GridLines -> Map[Quantile[#, Range[0, 1, 1/19]] &, Transpose[sndData]],
ImageSize -> 700]


The second uses the function PoissonDistribution, so the data set name is pdData.

npoints = 12000;
data = RandomReal[NormalDistribution[12, 5.6], npoints];
data = Transpose[{data,
data + RandomVariate[PoissonDistribution[12], npoints] +
RandomReal[{-0.2, 0.2}, npoints]}];
data = Select[data, #[[2]] > 0 &];
data[[All, 2]] = Log[data[[All, 2]]];
pdData = data;

Here is a plot of pdData with grid lines derived from the quantiles of x and y coordinates of the data set.

ListPlot[pdData, PlotRange -> All, AspectRatio -> 1, PlotTheme -> "Detailed",
GridLines -> Map[Quantile[#, Range[0, 1, 1/19]] &, Transpose[pdData]],
ImageSize -> 700]


Examples of application (with QuantileEnvelope)

Let us demonstrate the function QuantileEnvelope. The function arguments are a data matrix, a quantile specification, and number of directions. The quantile specification can a number or a list of numbers. The function returns the intersection points of the quantile lines at different directions.

Here is an application to the dataset sndData over 8 different quantiles over 60 different uniformly spread directions.

In[630]:= qs = {0.7, 0.75, 0.8, 0.85, 0.90, 0.95, 0.98, .99} ;
AbsoluteTiming[qsPoints = QuantileEnvelope[sndData, qs, 60];]
Out[631]= {0.150104, Null}

The result is a list of 8 lists of 60 points.

In[632]:= Dimensions[qsPoints]
Out[632]= {8, 60, 2}

Let us plot the data and the direction quantile envelopes.

Block[{data = sndData},
Show[{ListPlot[data, AspectRatio -> 1, PlotTheme -> {"Detailed"},
GridLines ->
Map[Quantile[#, Range[0, 1, 1/(20 - 1)]] &, Transpose[data]],
PlotLegends ->
Blend[{Red, Orange},
Rescale[#1, {Min[qs], Max[qs]}, {0, 1}]] & /@ qs, qs]],
Graphics[{PointSize[0.005], Thickness[0.0025],
MapThread[{Blend[{Red, Orange},
Rescale[#1, {Min[qs], Max[qs]}, {0, 1}]],
Tooltip[Line[Append[#2, #2[[1]]]], #1], Point[#2]} &, {qs,
}, ImageSize -> 700]]


Let us similar calculations for the dataset pdData.

In[647]:= qs = {0.7, 0.75, 0.8, 0.85, 0.90, 0.95, 0.98, 0.99} ;
AbsoluteTiming[qsPoints = QuantileEnvelope[pdData, qs, 60];]
Out[648]= {0.154227, Null}

In[649]:= Dimensions[qsPoints]
Out[649]= {8, 60, 2}

Here is the plot of the dataset pdData and directional quantile envelopes.

Some more quantile curves

Here are couple of more plots of the quantile lines at different directions with a larger number of directions.

Lines for quantile 0.95 in 40 directions:

Lines for quantiles 0.95 and 0.8 in 40 directions:


[1] Anton Antonov, Quantile regression Mathematica package, source code at GitHub,, package QuantileRegression.m, (2013).

[2] Anton Antonov, Quantile regression through linear programming, usage guide at GitHub,, in the directory “Documentation”, (2013).

[3] Anton Antonov, Quantile regression through linear programming, “Mathematica for prediction algorithms” blog at, 12/16/2013.

[4] Anton Antonov, Quantile regression with B-splines, “Mathematica for prediction algorithms” blog at, 1/1/2014.

[5] Linglong Kong, Ivan Mizera, “Quantile tomography: using quantiles with multivariate data”, 2013, arXiv:0805.0056v2 [stat.ME] URL: .

Removal of sub-trees in tries

I recently updated the package TriesWithFrequencies.m with two new functions TrieRemoval and TrieNodeFrequencies. The function TrieNodeFrequencies reverses the outcomes of TrieNodeProbabilities; it was made to facilitate the implementation and usage of TrieRemoval.

The removal of a sub-tree from a trie is useful when we want to alternate the learning stored in the trie.

The new functions are illustrated with examples below. (These examples reuse and extend the ones in the blog post “Tries with frequencies for data mining”.)
Consider the trie made with the list of words

{"war", "ward", "warden", "work", "car", "cars", "carbs"}



The function TrieNodeFrequencies can be used to reverse the effect of TrieNodeProbabilities. Its first argument is a trie, the second one is a scaling parameter (which is 1 by default). Here is an example:

TrieForm[TrieNodeFrequencies[TrieNodeProbabilities[mytrie], 14]]

The function TrieRemove can be used to remove sub-parts of a trie that correspond to a specified “word”. Here is an example using a trie with frequencies:

TrieForm[TrieRemove[mytrie, "car"]]


Here is an example using a trie with probabilities:

TrieForm[TrieRemove[TrieNodeProbabilities[mytrie], "car"]]


TrieRemove has two options “HowManyTimes” and “FrequencyValues” with default values Automatic. The option “HowManyTimes” takes a number or Automatic and it is still under development (its functionality is partially correct). The option “FrequencyValues” takes True | False | Automatic and it is used for the interpretation of the sub-tree removal. If the values are frequencies the removal process is more straightforward. If the values are probabilities the removal process has to take into account that the values on the same level of the sub-tree are conditional probabilities that sum to 1. TrieRemove has two different implementations one for tries with frequencies and one for tries with probabilities. The option setting "FrequencyValues"->True|False is used to redirect to the correct implementation. When "FrequencyValues"->Automatic is given a simple heuristic is used to determine is the trie argument a trie with frequencies or a trie with probabilities.