This document discusses concrete algorithms for two different approaches to generating mandala images [1]: direct construction with graphics primitives, and the use of machine learning algorithms.
In the experiments described in this document better results were obtained with the direct algorithms, which were made for the Mathematica StackExchange question "Code that generates a mandala" [3].
The main goals of this document are:
to provide an illustrative example of comparing dimension reduction methods,
to give a set-up for further discussions and investigations on mandala creation with machine learning algorithms.
Two direct construction algorithms are given: one uses rotations of a "seed" segment, the other superimposes layers of different types. The following plots show the order in which the different mandala parts are created by each algorithm.
In this document we use several algorithms for dimension reduction applied to collections of images following the procedure described in [4,5]. We are going to show that with Non-Negative Matrix Factorization (NNMF) we can use mandalas made with the seed segment rotation algorithm to extract layer types and superimpose them to make colored mandalas. Using the same approach with Singular Value Decomposition (SVD) or Independent Component Analysis (ICA) does not produce good layers and the superimposition produces more "watered-down", less diverse mandalas.
From a more general perspective this document compares the statistical approach of "trying to see without looking" with the "direct simulation" approach. Another perspective is the creation of "design spaces"; see [6].
The idea of using machine learning algorithms is appealing because there is no need to make the mental effort of understanding, discerning, approximating, and programming the principles of mandala creation. We can "just" use a large collection of mandala images and generate new ones using the "internal knowledge" data of machine learning algorithms. For example, a neural network system like DeepDream, [2], might be made to dream of mandalas.
In this section we present two different algorithms for generating mandalas. The first sees a mandala as being generated by rotation of a "seed" segment. The second sees a mandala as being generated by different component layers. For other approaches see [3].
The request in [3] is for the generation of mandalas for coloring by hand; that is why the mandala generation algorithms work in grayscale. Coloring the generated mandala images is a secondary task.
One way to make a mandala is to generate a seed segment and then produce the full figure with an appropriate number of rotations of that segment.
Here is a function and an example of random segment (seed) generation:
Clear[MakeSeedSegment]
MakeSeedSegment[radius_, angle_, n_Integer: 10,
connectingFunc_: Polygon, keepGridPoints_: False] :=
Block[{t},
t = Table[
Line[{radius*r*{Cos[angle], Sin[angle]}, {radius*r, 0}}], {r, 0, 1, 1/n}];
Join[If[TrueQ[keepGridPoints], t, {}], {GrayLevel[0.25],
connectingFunc@RandomSample[Flatten[t /. Line[{x_, y_}] :> {x, y}, 1]]}]
];
seed = MakeSeedSegment[10, Pi/12, 10];
Graphics[seed, Frame -> True]
This function can make a seed segment symmetric:
Clear[MakeSymmetric]
MakeSymmetric[seed_] := {seed,
GeometricTransformation[seed, ReflectionTransform[{0, 1}]]};
seed = MakeSymmetric[seed];
Graphics[seed, Frame -> True]
Using a seed we can generate mandalas with different specification signatures:
Clear[MakeMandala]
MakeMandala[opts : OptionsPattern[]] :=
MakeMandala[
MakeSymmetric[
MakeSeedSegment[20, Pi/12, 12,
RandomChoice[{Line, Polygon, BezierCurve,
FilledCurve[BezierCurve[#]] &}], False]], Pi/6, opts];
MakeMandala[seed_, angle_?NumericQ, opts : OptionsPattern[]] :=
Graphics[GeometricTransformation[seed,
Table[RotationMatrix[a], {a, 0, 2 Pi - angle, angle}]], opts];
This code randomly selects the symmetry and the seed generation parameters (number of concentric circles, angles):
SeedRandom[6567]
n = 12;
Multicolumn@
MapThread[
Image@If[#1,
MakeMandala[MakeSeedSegment[10, #2, #3], #2],
MakeMandala[
MakeSymmetric[MakeSeedSegment[10, #2, #3, #4, False]], 2 #2]
] &, {RandomChoice[{False, True}, n],
RandomChoice[{Pi/7, Pi/8, Pi/6}, n],
RandomInteger[{8, 14}, n],
RandomChoice[{Line, Polygon, BezierCurve,
FilledCurve[BezierCurve[#]] &}, n]}]
Here is a more concise way to generate symmetric segment mandalas:
Multicolumn[Table[Image@MakeMandala[], {12}], 5]
Note that with this approach programming the mandala coloring is not that trivial; weighted blending of colorized mandalas is the easiest thing to do. (Shown below.)
This approach was given by Simon Woods in [3].
"For this one I’ve defined three types of layer, a flower, a simple circle and a ring of small circles. You could add more for greater variety."
The coloring approach with image blending given below did not work well for this algorithm, so I modified the original code in order to produce colored mandalas.
ClearAll[LayerFlower, LayerDisk, LayerSpots, MandalaByLayers]
LayerFlower[n_, a_, r_, colorSchemeInd_Integer] :=
Module[{b = RandomChoice[{-1/(2 n), 0}]}, {If[
colorSchemeInd == 0, White,
RandomChoice[ColorData[colorSchemeInd, "ColorList"]]],
Cases[ParametricPlot[
r (a + Cos[n t])/(a + 1) {Cos[t + b Sin[2 n t]], Sin[t + b Sin[2 n t]]}, {t, 0, 2 Pi}],
l_Line :> FilledCurve[l], -1]}];
LayerDisk[_, _, r_, colorSchemeInd_Integer] := {If[colorSchemeInd == 0, White,
RandomChoice[ColorData[colorSchemeInd, "ColorList"]]],
Disk[{0, 0}, r]};
LayerSpots[n_, a_, r_, colorSchemeInd_Integer] := {If[colorSchemeInd == 0, White,
RandomChoice[ColorData[colorSchemeInd, "ColorList"]]],
Translate[Disk[{0, 0}, r a/(4 n)], r CirclePoints[n]]};
MandalaByLayers[n_, m_, coloring : (False | True) : False, opts : OptionsPattern[]] :=
Graphics[{EdgeForm[Black], White,
Table[RandomChoice[{3, 2, 1} -> {LayerFlower, LayerDisk, LayerSpots}][n, RandomReal[{3, 5}], i,
If[coloring, RandomInteger[{1, 17}], 0]]~Rotate~(Pi i/n), {i, m, 1, -1}]}, opts];
Here are generated black-and-white mandalas.
SeedRandom[6567]
ImageCollage[Table[Image@MandalaByLayers[16, 20], {12}], Background -> White, ImagePadding -> 3, ImageSize -> 1200]
Here are some colored mandalas. (Which make me think more of Viking and Native American art than mandalas.)
ImageCollage[Table[Image@MandalaByLayers[16, 20, True], {12}], Background -> White, ImagePadding -> 3, ImageSize -> 1200]
iSize = 400;
SeedRandom[6567]
AbsoluteTiming[
mandalaImages =
Table[Image[
MakeMandala[
MakeSymmetric@
MakeSeedSegment[10, Pi/12, 12, RandomChoice[{Polygon, FilledCurve[BezierCurve[#]] &}]], Pi/6],
ImageSize -> {iSize, iSize}, ColorSpace -> "Grayscale"], {300}];
]
(* {39.31, Null} *)
ImageCollage[ColorNegate /@ RandomSample[mandalaImages, 12], Background -> White, ImagePadding -> 3, ImageSize -> 400]
See the section "Using World Wide Web images".
The most interesting results are obtained with the image blending procedure coded below over mandala images generated with the seed segment rotation algorithm.
SeedRandom[3488]
directBlendingImages = Table[
RemoveBackground@
ImageAdjust[
Blend[Colorize[#,
ColorFunction ->
RandomChoice[{"IslandColors", "FruitPunchColors",
"AvocadoColors", "Rainbow"}]] & /@
RandomChoice[mandalaImages, 4], RandomReal[1, 4]]], {36}];
ImageCollage[directBlendingImages, Background -> White, ImagePadding -> 3, ImageSize -> 1200]
In this section we are going to apply the dimension reduction algorithms Singular Value Decomposition (SVD), Independent Component Analysis (ICA), and Non-Negative Matrix Factorization (NNMF) to a linear vector space representation (a matrix) of an image dataset. In the next section we use the bases generated by those algorithms to make mandala images.
We are going to use the packages [7,8] for ICA and NNMF respectively.
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/IndependentComponentAnalysis.m"]
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/NonNegativeMatrixFactorization.m"]
The linear vector space representation of the images is simple: each image is flattened to a vector (row-wise), and the image vectors are put into a matrix.
mandalaMat = Flatten@*ImageData@*ColorNegate /@ mandalaImages;
Dimensions[mandalaMat]
(* {300, 160000} *)
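As a language-neutral aside (not part of the Mathematica code above), the same row-wise flattening can be sketched in a few lines of NumPy, with placeholder arrays standing in for the 400x400 mandala images:

```python
import numpy as np

# Three placeholder 400x400 grayscale "images" standing in for the mandalas
imgs = [np.zeros((400, 400)) for _ in range(3)]

# Flatten each image row-wise (C order) and stack the vectors into a matrix
mat = np.stack([im.reshape(-1) for im in imgs])
print(mat.shape)  # (3, 160000)
```

With 300 images the matrix has the dimensions {300, 160000} shown above.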
The following code factors the image matrix with SVD, ICA, and NNMF and extracts the basis images.
AbsoluteTiming[
svdRes = SingularValueDecomposition[mandalaMat, 20];
]
(* {5.1123, Null} *)
svdBasisImages = Map[ImageAdjust@Image@Partition[#, iSize] &, Transpose@svdRes[[3]]];
AbsoluteTiming[
icaRes =
IndependentComponentAnalysis[Transpose[mandalaMat], 20,
PrecisionGoal -> 4, "MaxSteps" -> 100];
]
(* {23.41, Null} *)
icaBasisImages = Map[ImageAdjust@Image@Partition[#, iSize] &, Transpose[icaRes[[1]]]];
SeedRandom[452992]
AbsoluteTiming[
nnmfRes =
GDCLS[mandalaMat, 20, PrecisionGoal -> 4,
"MaxSteps" -> 20, "RegularizationParameter" -> 0.1];
]
(* {233.209, Null} *)
nnmfBasisImages = Map[ImageAdjust@Image@Partition[#, iSize] &, nnmfRes[[2]]];
Let us visualize the bases derived with the matrix factorization methods.
Grid[{{"SVD", "ICA", "NNMF"},
Map[ImageCollage[#, Automatic, {400, 500},
Background -> LightBlue, ImagePadding -> 5, ImageSize -> 350] &,
{svdBasisImages, icaBasisImages, nnmfBasisImages}]
}, Dividers -> All]
Here are some observations about the bases.
The SVD and ICA bases are structured similarly. That is because ICA and SVD are both based on orthogonality: the ICA factorization uses an orthogonality criterion based on Gaussian noise properties, which is more relaxed than SVD's standard orthogonality criterion.
As expected, the NNMF basis images have a black background because of the enforced non-negativity. (Black corresponds to 0, white to 1.)
Compared to the SVD and ICA bases, the images of the NNMF basis are structured in a radial manner. This can be demonstrated using image binarization.
Grid[{{"SVD", "ICA", "NNMF"}, Map[ImageCollage[Binarize[#, 0.5] & /@ #, Automatic, {400, 500}, Background -> LightBlue, ImagePadding -> 5, ImageSize -> 350] &, {svdBasisImages, icaBasisImages, nnmfBasisImages}] }, Dividers -> All]
We can see that binarizing the NNMF basis images shows them as mandala layers. In other words, using NNMF we can convert the mandalas of the seed segment rotation algorithm into mandalas generated by an algorithm that superimposes layers of different types.
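The non-negativity that lets the NNMF basis decompose into layers can be illustrated with a minimal sketch of NMF via Lee-Seung multiplicative updates. This is a generic stand-in written in Python for illustration, not the GDCLS implementation used above; the matrix sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)
V = rng.random((30, 40))        # rows play the role of flattened images
k, eps = 5, 1e-9
W = rng.random((30, k))
H = rng.random((k, 40))

err0 = np.linalg.norm(V - W @ H)
for _ in range(200):
    # Lee-Seung multiplicative updates: factors stay non-negative because
    # each step multiplies by ratios of non-negative quantities
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

print((W >= 0).all() and (H >= 0).all())
print(np.linalg.norm(V - W @ H) < err0)
```

Because the rows of H never become negative, they behave like additive layers, which is exactly why the NNMF basis images above superimpose into mandalas.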
In this section we show different blended images made using the SVD, ICA, and NNMF bases.
ClearAll[MandalaImageBlending]
Options[MandalaImageBlending] = {"BaseImage" -> {}, "BaseImageWeight" -> Automatic, "PostBlendingFunction" -> (RemoveBackground@*ImageAdjust)};
MandalaImageBlending[basisImages_, nSample_Integer: 4, n_Integer: 12, opts : OptionsPattern[]] :=
Block[{baseImage, baseImageWeight, postBlendingFunc, sImgs, sImgWeights},
baseImage = OptionValue["BaseImage"];
baseImageWeight = OptionValue["BaseImageWeight"];
postBlendingFunc = OptionValue["PostBlendingFunction"];
Table[(
sImgs =
Flatten@Join[{baseImage}, RandomSample[basisImages, nSample]];
If[NumericQ[baseImageWeight] && ImageQ[baseImage],
sImgWeights =
Join[{baseImageWeight}, RandomReal[1, Length[sImgs] - 1]],
sImgWeights = RandomReal[1, Length[sImgs]]
];
postBlendingFunc@
Blend[Colorize[#,
DeleteCases[{opts}, ("BaseImage" -> _) | ("BaseImageWeight" -> _) | ("PostBlendingFunction" -> _)],
ColorFunction ->
RandomChoice[{"IslandColors", "FruitPunchColors",
"AvocadoColors", "Rainbow"}]] & /@ sImgs,
sImgWeights]), {n}]
];
SeedRandom[17643]
svdBlendedImages = MandalaImageBlending[Rest@svdBasisImages, 4, 24];
ImageCollage[svdBlendedImages, Background -> White, ImagePadding -> 3, ImageSize -> 1200]
SeedRandom[17643]
svdBlendedImages = MandalaImageBlending[Rest@svdBasisImages, 4, 24, "BaseImage" -> First[svdBasisImages], "BaseImageWeight" -> 0.5];
ImageCollage[svdBlendedImages, Background -> White, ImagePadding -> 3, ImageSize -> 1200]
SeedRandom[17643]
icaBlendedImages = MandalaImageBlending[Rest[icaBasisImages], 4, 36, "BaseImage" -> First[icaBasisImages], "BaseImageWeight" -> Automatic];
ImageCollage[icaBlendedImages, Background -> White, ImagePadding -> 3, ImageSize -> 1200]
SeedRandom[17643]
nnmfBlendedImages = MandalaImageBlending[nnmfBasisImages, 4, 36];
ImageCollage[nnmfBlendedImages, Background -> White, ImagePadding -> 3, ImageSize -> 1200]
A natural question to ask is:
What would be the outcomes of applying the above procedures to mandala images found on the World Wide Web (WWW)?
Those WWW images are most likely man-made or curated.
The short answer is that the results are not that good. Better results might be obtained using a larger set of WWW images (than the just 100 used in the experiments shown below).
Here is a sample from the WWW mandala images:
Here are the results obtained with NNMF basis:
My other motivation for writing this document is to set up a basis for further investigations and discussions on the following topics.
Utilization of neural network algorithms for mandala creation.
Utilization of cellular automata for mandala generation.
Investigation of mandala morphing and animations.
Making a domain-specific language of specifications for mandala creation and modification.
The idea of using machine learning algorithms for mandala image generation was further supported by an image classifier that recognizes fairly well (suitably normalized) mandala images obtained in different ways:
[1] Wikipedia entry: Mandala, https://en.wikipedia.org/wiki/Mandala .
[2] Wikipedia entry: DeepDream, https://en.wikipedia.org/wiki/DeepDream .
[3] "Code that generates a mandala", Mathematica StackExchange, http://mathematica.stackexchange.com/q/136974 .
[4] Anton Antonov, "Comparison of PCA and NNMF over image de-noising", (2016), MathematicaForPrediction at WordPress blog. URL: https://mathematicaforprediction.wordpress.com/2016/05/07/comparison-of-pca-and-nnmf-over-image-de-noising/ .
[5] Anton Antonov, "Handwritten digits recognition by matrix factorization", (2016), MathematicaForPrediction at WordPress blog. URL: https://mathematicaforprediction.wordpress.com/2016/11/12/handwritten-digits-recognition-by-matrix-factorization/ .
[6] Chris Carlson, "Social Exploration of Design Spaces: A Proposal", (2016), Wolfram Technology Conference 2016. URL: http://wac.36f4.edgecastcdn.net/0036F4/pub/www.wolfram.com/technology-conference/2016/SocialExplorationOfDesignSpaces.nb , YouTube: https://www.youtube.com/watch?v=YK2523nfcms .
[7] Anton Antonov, Independent Component Analysis Mathematica package, (2016), source code at MathematicaForPrediction at GitHub, package IndependentComponentAnalysis.m .
[8] Anton Antonov, Implementation of the Non-Negative Matrix Factorization algorithm in Mathematica, (2013), source code at MathematicaForPrediction at GitHub, package NonNegativeMatrixFactorization.m.
This blog post describes the installation and use in Mathematica of Tries with frequencies [1] implemented in Java [2] through a corresponding Mathematica package [3].
A prefix tree, or trie, [6], is a tree data structure that stores a set of "words" that consist of "characters"; each element can be seen as a key to itself. The article [1] and the packages [2,3,4] extend that data structure to have additional data (frequencies) associated with each key.
The packages [2,3] work with lists of strings only. The package [4] can work with more general data but it is much slower.
The main motivation for creating the package [3] was to bring the fast trie function implementations of [2] into Mathematica in order to prototype, implement, and experiment with different text processing algorithms. (For example, inductive generation of grammar parsers and entity name recognition.) The speed of combining [2] and [3] is evaluated in the section "Performance tests" below.
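The data structure itself is small. Here is a toy sketch of a trie with frequencies (prefix counts), written in Python for illustration and entirely independent of the Java implementation [2]:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.count = 0       # number of inserted words passing through this node

def trie_insert(root, word):
    node = root
    node.count += 1
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
        node.count += 1

def trie_count(root, prefix):
    """Frequency of the given prefix among the inserted words."""
    node = root
    for ch in prefix:
        if ch not in node.children:
            return 0
        node = node.children[ch]
    return node.count

root = TrieNode()
for w in ["bar", "bark", "bars", "car"]:
    trie_insert(root, w)
print(trie_count(root, "bar"))  # 3
print(trie_count(root, "ca"))   # 1
```

Each node carries the frequency of the key leading to it, which is the extension of the plain trie that [1] and the packages [2,3,4] provide.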
The following directory path has to contain the jar file "TriesWithFrequencies.jar".
$JavaTriesWithFrequenciesPath =
"/Users/antonov/MathFiles/MathematicaForPrediction/Java/TriesWithFrequencies";
FileExistsQ[
FileNameJoin[{$JavaTriesWithFrequenciesPath, "TriesWithFrequencies.jar"}]]
(* True *)
For more details see the explanations in the README file in the GitHub directory of [2].
The following directory is expected to have the Mathematica package [3].
dirName = "/Users/antonov/MathFiles/MathematicaForPrediction";
FileExistsQ[FileNameJoin[{dirName, "JavaTriesWithFrequencies.m"}]]
(* True *)
AppendTo[$Path, dirName];
Needs["JavaTriesWithFrequencies`"]
This command installs Java (via JLink`) and loads the necessary Java libraries.
JavaTrieInstall[$JavaTriesWithFrequenciesPath]
For brevity the basic examples are not included in this blog post. Here is an album of images that shows the "JavaTrie.*" commands with their effects:
More detailed explanations can be found in the Markdown document, [7]:
Next, we are going to look into performance evaluation examples (also given in [7]).
Assume we want to find the words of "Hamlet" that are not in the book "Origin of Species". This section shows that the Java trie creation and query times for this task are quite small.
The following code reads the words in the texts. We get 33000 words from "Hamlet" and 151000 words from "Origin of Species".
hWords =
Block[{words},
words =
StringSplit[
ExampleData[{"Text", "Hamlet"}], {Whitespace,
PunctuationCharacter}];
words = Select[ToLowerCase[words], StringLength[#] > 0 &]
];
Length[hWords]
(* 32832 *)
osWords =
Block[{words},
words =
StringSplit[
ExampleData[{"Text", "OriginOfSpecies"}], {Whitespace,
PunctuationCharacter}];
words = Select[ToLowerCase[words], StringLength[#] > 0 &]
];
Length[osWords]
(* 151205 *)
First we create a trie with the "Origin of Species" words:
AbsoluteTiming[
jOStr = JavaTrieCreateBySplit[osWords];
]
(* {0.682531, Null} *)
Sanity check: the "Origin of Species" words are in the trie:
AbsoluteTiming[
And @@ JavaObjectToExpression[
JavaTrieContains[jOStr, Characters /@ osWords]]
]
(* {1.32224, True} *)
Membership of "Hamlet" words into "Origin of Species":
AbsoluteTiming[
res = JavaObjectToExpression[
JavaTrieContains[jOStr, Characters /@ hWords]];
]
(* {0.265307, Null} *)
Tallies of membership:
Tally[res]
(* {{True, 24924}, {False, 7908}} *)
Sample of words from "Hamlet" that do not belong to "Origin of Species":
RandomSample[Pick[hWords, Not /@ res], 30]
(* {"rosencrantz", "your", "mar", "airy", "rub", "honesty", \
"ambassadors", "oph", "returns", "pale", "virtue", "laertes", \
"villain", "ham", "earnest", "trail", "unhand", "quit", "your", \
"your", "fishmonger", "groaning", "your", "wake", "thou", "liest", \
"polonius", "upshot", "drowned", "grosser"} *)
Common words sample:
RandomSample[Pick[hWords, res], 30]
(* {"as", "indeed", "it", "with", "wild", "will", "to", "good", "so", \
"dirt", "the", "come", "not", "or", "but", "the", "why", "my", "to", \
"he", "and", "you", "it", "to", "potent", "said", "the", "are", \
"question", "soft"} *)
The node counts statistics calculation is fast:
AbsoluteTiming[
JavaTrieNodeCounts[jOStr]
]
(* {0.002344, <|"total" -> 20723, "internal" -> 15484, "leaves" -> 5239|>} *)
The node counts statistics computation after shrinking is comparably fast:
AbsoluteTiming[
JavaTrieNodeCounts[JavaTrieShrink[jOStr]]
]
(* {0.00539, <|"total" -> 8918, "internal" -> 3679, "leaves" -> 5239|>} *)
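Shrinking is essentially path compression: chains of single-child nodes are merged into one node keyed by the concatenated characters, which is why the total node count drops while the leaf count stays the same. Here is a toy sketch over plain Python dictionaries (frequencies omitted for brevity; this is an illustration, not the algorithm of [2]):

```python
def shrink(node):
    """Merge single-child chains: {'b': {'a': {'r': {}}}} becomes {'bar': {}}."""
    out = {}
    for key, child in node.items():
        child = shrink(child)
        while len(child) == 1:
            (k2, c2), = child.items()
            key += k2      # absorb the lone child into the current key
            child = c2
        out[key] = child
    return out

trie = {"b": {"a": {"r": {"k": {}, "s": {}}}}, "c": {"a": {"r": {}}}}
print(shrink(trie))  # {'bar': {'k': {}, 's': {}}, 'car': {}}
```

Eight single-character nodes collapse into four multi-character ones, mirroring the drop from 20723 to 8918 total nodes seen above.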
The conversion of a large trie to JSON and computing statistics over the obtained tree is reasonably fast:
AbsoluteTiming[
res = JavaTrieToJSON[jOStr];
]
(* {0.557221, Null} *)
AbsoluteTiming[
Quantile[
Cases[res, ("value" -> v_) :> v, \[Infinity]],
Range[0, 1, 0.1]]
]
(* {0.019644, {1., 1., 1., 1., 2., 3., 5., 9., 17., 42., 151205.}} *)
Get all words from a dictionary:
allWords = DictionaryLookup["*"];
allWords // Length
(* 92518 *)
Trie creation and shrinking:
AbsoluteTiming[
jDTrie = JavaTrieCreateBySplit[allWords];
jDShTrie = JavaTrieShrink[jDTrie];
]
(* {0.30508, Null} *)
JSON form extraction:
AbsoluteTiming[
jsonRes = JavaTrieToJSON[jDShTrie];
]
(* {3.85955, Null} *)
Here are the node statistics of the original and shrunk tries:
Find the infixes that have more than three characters and appear more than 10 times:
Multicolumn[#, 4] &@
Select[SortBy[
Tally[Cases[
jsonRes, ("key" -> v_) :> v, Infinity]], -#[[-1]] &], StringLength[#[[1]]] > 3 && #[[2]] > 10 &]
Many of the examples shown in this document have corresponding tests in the file JavaTriesWithFrequencies-Unit-Tests.wlt hosted at GitHub.
tr = TestReport[
dirName <> "/UnitTests/JavaTriesWithFrequencies-Unit-Tests.wlt"]
[1] Anton Antonov, "Tries with frequencies for dataÂ mining", (2013), MathematicaForPrediction at WordPress blog. URL: https://mathematicaforprediction.wordpress.com/2013/12/06/tries-with-frequencies-for-data-mining/ .
[2] Anton Antonov, Tries with frequencies in Java, (2017), source code at MathematicaForPrediction at GitHub, project Java/TriesWithFrequencies.
[3] Anton Antonov, Java tries with frequencies Mathematica package, (2017), source code at MathematicaForPrediction at GitHub, package JavaTriesWithFrequencies.m .
[4] Anton Antonov, Tries with frequencies Mathematica package, (2013), source code at MathematicaForPrediction at GitHub, package TriesWithFrequencies.m .
[5] Anton Antonov, Java tries with frequencies Mathematica unit tests, (2017), source code at MathematicaForPrediction at GitHub, unit tests file JavaTriesWithFrequencies-Unit-Tests.wlt .
[6] Wikipedia, Trie, https://en.wikipedia.org/wiki/Trie .
[7] Anton Antonov, "Tries with frequencies in Java", (2017), MathematicaForPrediction at GitHub.
This post announces the MathematicaVsR at GitHub project "Text analysis of Trump tweets", in which we compare Mathematica and R over text analyses of Twitter messages made by Donald Trump (and his staff) before the 2016 USA presidential election.
The project follows and extends the exposition and analysis of the R-based blog post "Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half" by David Robinson at VarianceExplained.org; see [1].
The blog post [1] links to several sources that claim that during the election campaign Donald Trump tweeted from his Android phone and his campaign staff tweeted from an iPhone. The blog post [1] examines this hypothesis in a quantitative way (using various R packages.)
The hypothesis in question is well summarized with the tweet:
Every non-hyperbolic tweet is from iPhone (his staff).
Every hyperbolic tweet is from Android (from him). pic.twitter.com/GWr6D8h5ed
— Todd Vaziri (@tvaziri) August 6, 2016
This conjecture is fairly well supported by the following mosaic plots, [2]:
We can see that the Twitter messages from iPhone are much more likely to be neutral, and the ones from Android are much more polarized. As Christian Rudder (one of the founders of OkCupid, a dating website) explains in the chapter "Death by a Thousand Mehs" of the book "Dataclysm", [3], having a polarizing image (online persona) is a very good strategy to engage an online audience:
[…] And the effect isn’t small – being highly polarizing will in fact get you about 70 percent more messages. That means variance allows you to effectively jump several "leagues" up in the dating pecking order – […]
(The mosaic plots above were made for the Mathematica-part of this project. Mosaic plots and weekday tags are not used in [1].)
The R-part consists of the blog post [1] and a corresponding R notebook.
The Mathematica-part of this project does not follow closely the blog post [1]. After the ingestion of the data provided in [1], the Mathematica-part applies alternative algorithms to support and extend the analysis in [1].
The sections in the R-part notebook correspond to some, but not all, of the sections in the Mathematica-part.
The following list of steps is for the Mathematica-part.
The data ingestion can be done in Mathematica too, using the built-in function ServiceConnect, but that is not necessary since [1] provides a link to the ingested data used in [1]:
load(url("http://varianceexplained.org/files/trump_tweets_df.rda"))
This leads to ingesting an R data frame in the Mathematica-part using RLink.
Adding tags
Using the message time-stamps, each message is associated with time tags corresponding to the creation time month, hour, weekday, etc.
Here is a summary of the data at this stage:
Time series and time related distributions
Here is a Mathematica made plot for the same statistic computed in [1] that shows differences in tweet posting behavior:
Classification into sentiments and Facebook topics
In [1] the results of this step are derived in several stages.
Here is a mosaic plot for conditional probabilities of devices, topics, and sentiments:
Device-word association rules
In the Mathematica-part these associations rules are not needed for the sentiment analysis (because of the built-in classifiers.)
The association rule mining is done mostly to support and extend the text analysis in [1] and, of course, for comparison purposes.
Here is an example of derived association rules together with their most important measures:
In [1] the sentiments are derived from computed device-word associations, so in [1] the order of steps is 1-2-3-5-4. In Mathematica we do not need the steps 3 and 5 in order to get the sentiments in the 4th step.
Using Mathematica for sentiment analysis is much more direct because of the built-in classifiers.
The R-based blog post [1] uses heavily the "pipeline" operator %>%, which is kind of a recent addition to R (and it is both fashionable and convenient to use). In Mathematica the related operators are Postfix (//), Prefix (@), Infix (~~), Composition (@*), and RightComposition (/*).
Making the time series plots with the R package "ggplot2" requires making special data frames. I am inclined to think that the Mathematica plotting of time series is more direct, but for this task the data wrangling codes in Mathematica and R are fairly comparable.
Generally speaking, the R package "arules", used in this project for association rule learning, is somewhat awkward to use; for example, it requires the use of factors.
The Apriori implementation in “arules” is much faster than the one in “AprioriAlgorithm.m” — “arules” uses a more efficient algorithm implemented in C.
[1] David Robinson, "Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half", (2016), VarianceExplained.org.
[2] Anton Antonov, "Mosaic plots for data visualization", (2014), MathematicaForPrediction at WordPress.
[3] Christian Rudder, Dataclysm, Crown, 2014. ASIN: B00J1IQUX8 .
The Pareto principle is an interesting law that manifests in many contexts. It is also known as the "Pareto law", "the law of the significant few", and "the 80-20 rule".
For example:
"10% of all lakes contain 90% of all lake water."
For extensive discussion and studied examples see the Wikipedia entry "Pareto principle", [4].
It is a good idea to see for which parts of the analyzed data the Pareto principle manifests. Testing for the Pareto principle is usually simple. For example, assume that we have the GDP of all countries:
countries = CountryData["Countries"];
gdps = {CountryData[#, "Name"], CountryData[#, "GDP"]} & /@ countries;
gdps = DeleteCases[gdps, {_, _Missing}] /. Quantity[x_, _] :> x;
Grid[{RecordsSummary[gdps, {"country", "GDP"}]}, Alignment -> Top, Dividers -> All]
In order to test for the manifestation of the Pareto principle we have to (i) sort the GDP values in descending order, (ii) find the cumulative sums, (iii) normalize the obtained vector by the sum of all values, and (iv) plot the result. These steps are done with the following two commands:
t = Reverse@Sort@gdps[[All, 2]];
ListPlot[Accumulate[t]/Total[t], PlotRange -> All, GridLines -> {{0.2} Length[t], {0.8}}, Frame -> True]
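The four steps are language-agnostic. Here is a minimal Python rendition of the same procedure, with illustrative numbers rather than the GDP data:

```python
def pareto_fractions(values):
    """Steps (i)-(iv): sort descending, accumulate, normalize by the total."""
    s = sorted(values, reverse=True)
    total = sum(s)
    acc, fractions = 0.0, []
    for v in s:
        acc += v
        fractions.append(acc / total)
    return fractions

shares = pareto_fractions([50, 30, 10, 5, 3, 1, 1])
print(shares[0])   # the top item alone holds 0.5 of the total
print(shares[-1])  # 1.0
```

A Pareto-like distribution shows up as the curve reaching a large fraction (e.g. 0.8) within the first small fraction of items.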
In this document we are going to use the special function ParetoLawPlot defined in the next section and the package [1]. Most of the examples use data that is internally accessible within Mathematica. Several external data examples are considered.
See the package [1] for the function RecordsSummary. See the source file [2] for R functions that facilitate the plotting of Pareto principle graphs. See the package [3] for the outlier detection functions used below.
This simple function makes a list plot that helps assess the manifestation of the Pareto principle. It takes the same options as ListPlot.
Clear[ParetoLawPlot]
Options[ParetoLawPlot] = Options[ListPlot];
ParetoLawPlot[dataVec : {_?NumberQ ..}, opts : OptionsPattern[]] := ParetoLawPlot[{Tooltip[dataVec, 1]}, opts];
ParetoLawPlot[dataVecs : {{_?NumberQ ..} ..}, opts : OptionsPattern[]] := ParetoLawPlot[MapThread[Tooltip, {dataVecs, Range[Length[dataVecs]]}], opts];
ParetoLawPlot[dataVecs : {Tooltip[{_?NumberQ ..}, _] ..}, opts : OptionsPattern[]] :=
Block[{t, mc = 0.5},
t = Map[Tooltip[(Accumulate[#]/Total[#] &)[SortBy[#[[1]], -# &]], #[[2]]] &, dataVecs];
ListPlot[t, opts, PlotRange -> All, GridLines -> {Length[t[[1, 1]]] Range[0.1, mc, 0.1], {0.8}}, Frame -> True, FrameTicks -> {{Automatic, Automatic}, {Automatic, Table[{Length[t[[1, 1]]] c, ToString[Round[100 c]] <> "%"}, {c, Range[0.1, mc, 0.1]}]}}]
];
This function is useful for coloring the outliers in the list plots.
ClearAll[ColorPlotOutliers]
ColorPlotOutliers[] := # /. {Point[ps_] :> {Point[ps], Red, Point[ps[[OutlierPosition[ps[[All, 2]]]]]]}} &;
ColorPlotOutliers[oid_] := # /. {Point[ps_] :> {Point[ps], Red, Point[ps[[OutlierPosition[ps[[All, 2]], oid]]]]}} &;
These definitions can be also obtained by loading the packages MathematicaForPredictionUtilities.m and OutlierIdentifiers.m; see [1,3].
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/MathematicaForPredictionUtilities.m"]
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/OutlierIdentifiers.m"]
Below we are going to use the metric system of units. (If preferred we can easily switch to the imperial system.)
$UnitSystem = "Metric";(*"Imperial"*)
We are going to consider a typical Pareto principle example: wealth and income distribution.
This code finds the Gross Domestic Product (GDP) of different countries:
gdps = {CountryData[#, "Name"], CountryData[#, "GDP"]} & /@CountryData["Countries"];
gdps = DeleteCases[gdps, {_, _Missing}] /. Quantity[x_, _] :> x;
The corresponding Pareto plot (note the default grid lines) shows that 10% of countries have 90% of the wealth:
ParetoLawPlot[gdps[[All, 2]], ImageSize -> 400]
Here is the log histogram of the GDP values.
Histogram[Log10@gdps[[All, 2]], 20, PlotRange -> All]
The following code shows the log plot of countries GDP values and the found outliers.
Manipulate[
DynamicModule[{data = Transpose[{Range[Length[gdps]], Sort[gdps[[All, 2]]]}], pos},
pos = OutlierPosition[modFunc@data[[All, 2]], tb@*opar];
If[Length[pos] > 0,
ListLogPlot[{data, data[[pos]]}, PlotRange -> All, PlotTheme -> "Detailed", FrameLabel -> {"Index", "GDP"}, PlotLegends -> SwatchLegend[{"All", "Outliers"}]],
ListLogPlot[{data}, PlotRange -> All, PlotTheme -> "Detailed", FrameLabel -> {"Index", "GDP"}, PlotLegends -> SwatchLegend[{"All", "Outliers"}]]
]
],
{{opar, SPLUSQuartileIdentifierParameters, "outliers detector"}, {HampelIdentifierParameters, SPLUSQuartileIdentifierParameters}},
{{tb, TopOutliers, "bottom|top"}, {BottomOutliers, TopOutliers}},
{{modFunc, Identity, "data modifier function"}, {Identity, Log}}
]
This table gives the values for the countries with the highest GDP.
Block[{data = gdps[[OutlierPosition[gdps[[All, 2]], TopOutliers@*SPLUSQuartileIdentifierParameters]]]},
Row[Riffle[#, " "]] &@Map[Grid[#, Dividers -> All, Alignment -> {Left, "."}] &, Partition[SortBy[data, -#[[-1]] &], Floor[Length[data]/3]]]
]
Similar data retrieval and plots can be made for country populations.
pops = {CountryData[#, "Name"], CountryData[#, "Population"]} & /@CountryData["Countries"];
unit = QuantityUnit[pops[[All, 2]]][[1]];
pops = DeleteCases[pops, {_, _Missing}] /. Quantity[x_, _] :> x;
In the following Pareto plot we can see that 15% of countries have 80% of the total population:
ParetoLawPlot[pops[[All, 2]], PlotLabel -> Row[{"Population", ", ", unit}]]
Here are the countries with most people:
Block[{data = pops[[OutlierPosition[pops[[All, 2]], TopOutliers@*SPLUSQuartileIdentifierParameters]]]},
Row[Riffle[#, " "]] &@Map[Grid[#, Dividers -> All, Alignment -> {Left, "."}] &, Partition[SortBy[data, -#[[-1]] &], Floor[Length[data]/3]]]
]
We can also see that the Pareto principle holds for the countries' areas:
areas = {CountryData[#, "Name"], CountryData[#, "Area"]} & /@CountryData["Countries"];
areas = DeleteCases[areas, {_, _Missing}] /. Quantity[x_, _] :> x;
ParetoLawPlot[areas[[All, 2]]]
Block[{data = areas[[OutlierPosition[areas[[All, 2]], TopOutliers@*SPLUSQuartileIdentifierParameters]]]},
Row[Riffle[#, " "]] &@Map[Grid[#, Dividers -> All, Alignment -> {Left, "."}] &, Partition[SortBy[data, -#[[-1]] &], Floor[Length[data]/3]]]
]
An interesting diagram plots together the curves of GDP changes for different countries. We can see that China and Poland have had rapid growth.
res = Table[
(t = CountryData[countryName, {{"GDP"}, {1970, 2015}}];
t = Reverse@Sort[t["Path"][[All, 2]] /. Quantity[x_, _] :> x];
Tooltip[t, countryName])
, {countryName, {"USA", "China", "Poland", "Germany", "France", "Denmark"}}];
ParetoLawPlot[res, PlotRange -> All, Joined -> True, PlotLegends -> res[[All, 2]]]
This dynamic interface can be used for a given country to compare (i) the GDP evolution in time and (ii) the corresponding Pareto plot.
Manipulate[
DynamicModule[{ts, t},
ts = CountryData[countryName, {{"GDP"}, {1970, 2015}}];
t = Reverse@Sort[ts["Path"][[All, 2]] /. Quantity[x_, _] :> x];
Grid[{{"Date list plot of GDP values", "GDP Pareto plot"}, {DateListPlot[ts, ImageSize -> Medium],
ParetoLawPlot[t, ImageSize -> Medium]}}]
], {countryName, {"USA", "China", "Poland", "Germany", "France", "Denmark"}}]
The following code demonstrates that the colors of the pixels in country flags also adhere to the Pareto principle.
flags = CountryData[#, "Flag"] & /@ CountryData["Countries"];
flags[[1 ;; 12]]
ids = ImageData /@ flags;
pixels = Apply[Join, Flatten[ids, 1]];
Clear[ToBinFunc]
ToBinFunc[x_] := Evaluate[Piecewise[MapIndexed[{#2[[1]], #1[[1]] < x <= #1[[2]]} &, Partition[Range[0, 1, 0.1], 2, 1]]]];
pixelsInt = Transpose@Table[Map[ToBinFunc, pixels[[All, i]]], {i, 1, 3}];
pixelsIntTally = SortBy[Tally[pixelsInt], -#[[-1]] &];
ParetoLawPlot[pixelsIntTally[[All, 2]]]
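The binning and tallying done above with ToBinFunc and Tally can be mirrored in plain Python (a sketch with a made-up two-color "flag"; the interval convention matches the piecewise definition above):

```python
import math
from collections import Counter

def to_bin(x, n_bins=10):
    """Bin index i such that (i-1)/n_bins < x <= i/n_bins,
    with x = 0 clamped into the first bin."""
    return max(1, math.ceil(x * n_bins))

def binned_color_tally(pixels):
    """Quantize RGB pixels (channel values in [0, 1]) to a
    10x10x10 color grid and tally the colors, most frequent first."""
    counts = Counter(tuple(to_bin(c) for c in px) for px in pixels)
    return counts.most_common()

# Hypothetical flag: 70% red pixels, 30% white pixels.
pixels = [(1.0, 0.05, 0.05)] * 70 + [(1.0, 1.0, 1.0)] * 30
tally = binned_color_tally(pixels)
```

The tallied counts, sorted in decreasing order, are what the Pareto plot below is made from.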
Looking at the lengths in the tunnel data we can see the manifestation of an exaggerated Pareto principle.
tunnelLengths = TunnelData[All, {"Name", "Length"}];
tunnelLengths // Length
(* 1552 *)
t = Reverse[Sort[DeleteMissing[tunnelLengths[[All, -1]]] /. Quantity[x_, _] :> x]];
ParetoLawPlot[t]
Here is the logarithmic histogram of the lengths:
Histogram[Log10@t, PlotRange -> All, PlotTheme -> "Detailed"]
The following code gathers the data and makes the Pareto plots of surface areas, volumes, and commercial fish catch values for lakes. We can see that the lake volumes show an exaggerated Pareto principle.
lakeAreas = LakeData[All, "SurfaceArea"];
lakeVolumes = LakeData[All, "Volume"];
lakeFishCatch = LakeData[All, "CommercialFishCatch"];
data = {lakeAreas, lakeVolumes, lakeFishCatch};
t = N@Map[DeleteMissing, data] /. Quantity[x_, _] :> x;
opts = {PlotRange -> All, ImageSize -> Medium}; MapThread[ParetoLawPlot[#1, PlotLabel -> Row[{#2, ", ", #3}], opts] &, {t, {"Lake area", "Lake volume", "Commercial fish catch"}, DeleteMissing[#][[1, 2]] & /@ data}]
One of the examples given in [5] is that the city areas obey the Power Law. Since the Pareto principle is a kind of Power Law we can confirm that observation using Pareto principle plots.
The following grid of Pareto principle plots is for areas and population sizes of cities in selected states of USA.
res = Table[
(cities = CityData[{All, stateName, "USA"}];
t = Transpose@Outer[CityData, cities, {"Area", "Population"}];
t = Map[DeleteMissing[#] /. Quantity[x_, _] :> x &, t, {1}];
ParetoLawPlot[MapThread[Tooltip, {t, {"Area", "Population"}}], PlotLabel -> stateName, ImageSize -> 250])
, {stateName, {"Alabama", "California", "Florida", "Georgia", "Illinois", "Iowa", "Kentucky", "Ohio", "Tennessee"}}];
Legended[Grid[ArrayReshape[res, {3, 3}]], SwatchLegend[Cases[res[[1]], _RGBColor, Infinity], {"Area", "Population"}]]
Looking into the MovieLens 20M dataset, [6], we can see that the Pareto principle holds for (1) the most rated movies and (2) the most active users. We can also see the manifestation of an exaggerated Pareto law: 90% of all ratings are for 10% of the movies.
The following plot, taken from the blog post "PIN analysis", [7], shows that the four digit passwords people use adhere to the Pareto principle: the first 20% of the (unique) most frequently used passwords account for 70% of all password use.
ColorNegate[Import["http://www.datagenetics.com/blog/september32012/c.png"]]
[1] Anton Antonov, "MathematicaForPrediction utilities", (2014), source code MathematicaForPrediction at GitHub, https://github.com/antononcube/MathematicaForPrediction, package MathematicaForPredictionUtilities.m.
[2] Anton Antonov, Pareto principle functions in R, source code MathematicaForPrediction at GitHub, https://github.com/antononcube/MathematicaForPrediction, source code file ParetoLawFunctions.R.
[3] Anton Antonov, Implementation of one dimensional outlier identifying algorithms in Mathematica, (2013), MathematicaForPrediction at GitHub, URL: https://github.com/antononcube/MathematicaForPrediction/blob/master/OutlierIdentifiers.m.
[4] Wikipedia entry, "Pareto principle", URL: https://en.wikipedia.org/wiki/Pareto_principle.
[5] Wikipedia entry, "Power law", URL: https://en.wikipedia.org/wiki/Power_law.
[6] GroupLens Research, MovieLens 20M Dataset, (2015).
[7] "PIN analysis", (2012), DataGenetics.
This MathematicaVsR at GitHub project is for comparing Mathematica and R for the tasks of classifier creation, execution, and evaluation using the MNIST database of images of handwritten digits.
Here are the bases built with two different classifiers:
Here are the confusion matrices of the two classifiers:
The blog post "Classification of handwritten digits" (published 2013) has a related, more elaborate discussion over a much smaller database of handwritten digits.
The concrete steps taken in scripts and documents of this project follow.
For each digit find the corresponding representation matrix and factorize it.
Store the matrix factorization results in a suitable data structure. (These results comprise the classifier training.)
For a given test image (and its linear vector space representation) find the basis that approximates it best. The corresponding digit is the classifier prediction for the given test image.
Evaluate the classifier(s) over all test images and compute accuracy, F-Scores, and other measures.
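The steps above can be sketched outside Mathematica too. Here is a minimal NumPy version on synthetic data (illustrative function names; a per-class SVD basis with residual-based prediction, and a rank chosen for brevity):

```python
import numpy as np

def train_bases(images_by_digit, rank=2):
    """For each digit, factorize its stack of image vectors with SVD
    and keep the top right-singular vectors as that digit's basis."""
    bases = {}
    for digit, imgs in images_by_digit.items():
        X = np.asarray(imgs, dtype=float)      # rows are image vectors
        _, _, vt = np.linalg.svd(X, full_matrices=False)
        bases[digit] = vt[:rank]               # rank x n_pixels, orthonormal rows
    return bases

def classify(img, bases):
    """Predict the digit whose basis approximates the image best,
    i.e. with the smallest residual after projection onto the basis."""
    img = np.asarray(img, dtype=float)
    def residual(B):
        return np.linalg.norm(img - B.T @ (B @ img))
    return min(bases, key=lambda d: residual(bases[d]))
```

For NNMF the factorization step differs (and the basis rows are not orthonormal, so the projection becomes a least-squares fit), but the overall classify-by-best-approximation scheme is the same.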
There are scripts going through the steps listed above:
R : "./R/HandwrittenDigitsClassificationByMatrixFactorization.Rmd".
The following documents give expositions that are suitable for reading and for following the steps and their corresponding results.
R : "./R/HandwrittenDigitsClassificationByMatrixFactorization.pdf", "./R/HandwrittenDigitsClassificationByMatrixFactorization.html".
I first figured out in R how to ingest the data in the binary files of the MNIST database. There are several online resources (blog posts, GitHub repositories) that discuss the ingestion of the MNIST binary files.
After that making the corresponding code in Mathematica was easy.
The results are the same in Mathematica and R for SVD and NNMF. (As expected.)
NNMF classifiers use the MathematicaForPrediction at GitHub implementations: NonNegativeMatrixFactorization.m and NonNegativeMatrixFactorization.R.
Both Mathematica and R have relatively simple set-up of parallel computations.
It was not very straightforward to come up with visualizations of MNIST images in R. The Mathematica visualization is much more flexible when it comes to plot labeling.
Using Mathematica’s built-in classifiers it was easy to compare the SVD and NNMF classifiers with neural network ones and others. (The SVD and NNMF classifiers are much faster to build and they bring comparable precision.)
It would be nice to repeat that in R using one or several of the neural network classifiers provided by Google, Microsoft, H2O, Baidu, etc.
Another possible extension is to use classifier ensembles and Receiver Operating Characteristic (ROC) analysis to create better classifiers. (Both in Mathematica and R.)
Using a classifier-agnostic importance of variables procedure we can figure out:
which NNMF basis vectors (images) are most important for the classification precision,
which image rows or columns are most important for each digit, or similarly
which image squares of a, say, 4×4 image grid are most important.
This post is to announce the repository MathematicaVsR at GitHub that has example projects, code, and documents for comparing Mathematica with R.
My plan is to proclaim new completed Mathematica-vs-R projects here, in this blog post, and when appropriate make separate blog posts about them.
The development in the MathematicaVsR at GitHub repository aims to provide a collection of relatively simple but non-trivial example projects that illustrate the use of Mathematica and R in different statistical, machine learning, scientific, and software engineering programming activities.
Each of the projects has implementations and documents made with both Mathematica and R — hopefully that would allow comparison and knowledge transfer.
This presentation, "Mathematica vs. R" given at the Wolfram Technology Conference 2015 is probably a good start.
As a warm-up of how to do the comparison see this mind-map (which is made for Mathematica users):
The future projects are listed in order of their completion time proximity — the highest in the list would be committed the soonest.
Personal banking data obfuscation
Independent Component Analysis (ICA) programming and basic applications
High Performance Computing (HPC) projects — Spark, H2O, etc.
Informal verification of time series co-dependency
Recommendation engines
The main goals of this document are:
i) to demonstrate how to create versions and combinations of classifiers utilizing different perspectives,
ii) to apply the Receiver Operating Characteristic (ROC) technique to the evaluation of the created classifiers (see [2,3]), and
iii) to illustrate the use of the Mathematica packages [5,6].
The concrete steps taken are the following:
Create an ensemble of classifiers and compare its performance to the individual classifiers in the ensemble.
Produce classifier versions from changed data in order to explore the effect of record outliers.
Make a bootstrapping classifier ensemble and evaluate and compare its performance.
Systematically diminish the training data and evaluate the results with ROC.
Show how to do classifier interpolation utilizing ROC.
In the steps above we skip the necessary preliminary data analysis. For the datasets we use in this document that analysis has been done elsewhere. (See [,,,].) Nevertheless, since ROC is mostly used for binary classifiers we want to analyze the class labels distributions in the datasets in order to designate which class labels are "positive" and which are "negative."
Assume we are given a binary classifier with the class labels P and N (for "positive" and "negative" respectively).
Consider the following measures: the True Positive Rate (TPR), defined as TPR = TP / (TP + FN),
and the False Positive Rate (FPR), defined as FPR = FP / (FP + TN).
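In code these two measures are one-liners. Here is a hedged Python sketch (the counts TP, FN, FP, TN are the entries of a hypothetical confusion matrix):

```python
def tpr(tp, fn):
    """True Positive Rate: fraction of actual positives labeled positive."""
    return tp / (tp + fn)

def fpr(fp, tn):
    """False Positive Rate: fraction of actual negatives labeled positive."""
    return fp / (fp + tn)

# One confusion matrix gives one point in ROC space.
point = (fpr(fp=22, tn=170), tpr(tp=82, fn=40))
```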
Assume that we can change the classifier results with a parameter and produce a plot like this one:
For each parameter value a point {FPR, TPR} is plotted; points corresponding to consecutive parameter values are connected with a line. We call the obtained curve the ROC curve for the classifier in consideration. The ROC curve resides in the ROC space as defined by the functions FPR and TPR, corresponding respectively to the x-axis and the y-axis.
The ideal classifier would have its ROC curve comprised of a line connecting {0,0} to {0,1} and a line connecting {0,1} to {1,1}.
Given a classifier the ROC point closest to {0,1}, generally, would be considered to be the best point.
This document started as part of a conference presentation about illustrating the cultural differences between Statistics and Machine learning (for Wolfram Technology Conference 2016). Its exposition became both deeper and wider than expected. Here are the alternative, original goals of the document:
i) to demonstrate how using ROC a researcher can explore classifiers' performance without intimate knowledge of the classifiers' mechanisms, and
ii) to provide concrete examples of the typical investigation approaches employed by machine learning researchers.
To make those points clearer and more memorable we are going to assume that exposition is a result of the research actions of a certain protagonist with a suitably selected character.
A by-product of the exposition is that it illustrates the following lessons from machine learning practices. (See [1].)
The outcomes of the good machine learning algorithms might be fairly complex. I.e. there are no simple interpretations when really good results are obtained.
Having high dimensional data can be very useful.
In [1] these points are discussed under the names "Rashomon", "Occam", and "Bellman". To quote:
"Rashomon: the multiplicity of good models;
Occam: the conflict between simplicity and accuracy;
Bellman: dimensionality — curse or blessing."
Our protagonist is a "Simple Nuclear Physicist" (SNP) — someone who is accustomed to obtaining a lot of data that has to be analyzed and mined sometimes very deeply, rigorously, and from a lot of angles, for different hypotheses. SNP is fairly adept in programming and critical thinking, but he does not have or care about deep knowledge of statistics methods or machine learning algorithms. SNP is willing and capable to use software libraries that provide algorithms for statistical methods and machine learning.
SNP is capable of coming up with ROC even if he is not aware of it already. ROC is very similar to the so-called phase space diagrams physicists use.
These commands load the used Mathematica packages [4,5,6]:
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/MathematicaForPredictionUtilities.m"]
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/ROCFunctions.m"]
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/ClassifierEnsembles.m"]
These commands load the Titanic data (that is shipped with Mathematica).
data = ExampleData[{"MachineLearning", "Titanic"}, "TrainingData"];
columnNames = (Flatten@*List) @@ ExampleData[{"MachineLearning", "Titanic"}, "VariableDescriptions"];
data = ((Flatten@*List) @@@ data)[[All, {1, 2, 3, -1}]];
trainingData = DeleteCases[data, {___, _Missing, ___}];
Dimensions[trainingData]
(* {732, 4} *)
RecordsSummary[trainingData, columnNames]
data = ExampleData[{"MachineLearning", "Titanic"}, "TestData"];
data = ((Flatten@*List) @@@ data)[[All, {1, 2, 3, -1}]];
testData = DeleteCases[data, {___, _Missing, ___}];
Dimensions[testData]
(* {314, 4} *)
RecordsSummary[testData, columnNames]
This command makes a classifier ensemble of two built-in classifiers "NearestNeighbors" and "NeuralNetwork":
aCLs = EnsembleClassifier[{"NearestNeighbors", "NeuralNetwork"}, trainingData[[All, 1 ;; -2]] -> trainingData[[All, -1]]]
A classifier ensemble of the package [6] is simply an association mapping classifier IDs to classifier functions.
The first argument given to EnsembleClassifier can be Automatic:
SeedRandom[8989]
aCLs = EnsembleClassifier[Automatic, trainingData[[All, 1 ;; -2]] -> trainingData[[All, -1]]];
With Automatic the following built-in classifiers are used:
Keys[aCLs]
(* {"NearestNeighbors", "NeuralNetwork", "LogisticRegression", "RandomForest", "SupportVectorMachine", "NaiveBayes"} *)
Classification with the classifier ensemble can be done using the function EnsembleClassify. If the third argument of EnsembleClassify is "Votes" the result is the class label that appears the most in the ensemble results.
EnsembleClassify[aCLs, testData[[20, 1 ;; -2]], "Votes"]
(* "died" *)
The following commands clarify the voting done in the command above.
Map[#[testData[[20, 1 ;; -2]]] &, aCLs]
Tally[Values[%]]
(* <|"NearestNeighbors" -> "died", "NeuralNetwork" -> "survived", "LogisticRegression" -> "survived", "RandomForest" -> "died", "SupportVectorMachine" -> "died", "NaiveBayes" -> "died"|> *)
(* {{"died", 4}, {"survived", 2}} *)
If the third argument of EnsembleClassify is "ProbabilitiesMean" the result is the class label that has the highest mean probability in the ensemble results.
EnsembleClassify[aCLs, testData[[20, 1 ;; -2]], "ProbabilitiesMean"]
(* "died" *)
The following commands clarify the probability averaging utilized in the command above.
Map[#[testData[[20, 1 ;; -2]], "Probabilities"] &, aCLs]
Mean[Values[%]]
(* <|"NearestNeighbors" -> <|"died" -> 0.598464, "survived" -> 0.401536|>, "NeuralNetwork" -> <|"died" -> 0.469274, "survived" -> 0.530726|>, "LogisticRegression" -> <|"died" -> 0.445915, "survived" -> 0.554085|>,
"RandomForest" -> <|"died" -> 0.652414, "survived" -> 0.347586|>, "SupportVectorMachine" -> <|"died" -> 0.929831, "survived" -> 0.0701691|>, "NaiveBayes" -> <|"died" -> 0.622061, "survived" -> 0.377939|>|> *)
(* <|"died" -> 0.61966, "survived" -> 0.38034|> *)
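The two aggregation modes illustrated above ("Votes" and "ProbabilitiesMean") can be mirrored in plain Python (a sketch with illustrative function names; the member predictions and probabilities from the outputs above serve as example data):

```python
from collections import Counter
from statistics import mean

def ensemble_classify_votes(member_labels):
    """'Votes' aggregation: the label predicted by most ensemble members."""
    return Counter(member_labels).most_common(1)[0][0]

def ensemble_classify_prob_mean(member_probs):
    """'ProbabilitiesMean' aggregation: the label with the highest
    probability averaged over the ensemble members."""
    labels = member_probs[0].keys()
    means = {lbl: mean(p[lbl] for p in member_probs) for lbl in labels}
    return max(means, key=means.get)
```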
The third argument of EnsembleClassifyByThreshold takes a rule of the form label -> threshold; the fourth argument is either "Votes" or "ProbabilitiesMean".
The following code computes the ROC curve for a range of votes.
rocRange = Range[0, Length[aCLs] - 1, 1];
aROCs = Table[(
cres = EnsembleClassifyByThreshold[aCLs, testData[[All, 1 ;; -2]], "survived" -> i, "Votes"]; ToROCAssociation[{"survived", "died"}, testData[[All, -1]], cres]), {i, rocRange}];
ROCPlot[rocRange, aROCs, "PlotJoined" -> Automatic, GridLines -> Automatic]
If we want to compute the ROC over a range of probability thresholds we use EnsembleClassifyByThreshold with the fourth argument being "ProbabilitiesMean".
EnsembleClassifyByThreshold[aCLs, testData[[1 ;; 6, 1 ;; -2]], "survived" -> 0.2, "ProbabilitiesMean"]
(* {"survived", "survived", "survived", "survived", "survived", "survived"} *)
EnsembleClassifyByThreshold[aCLs, testData[[1 ;; 6, 1 ;; -2]], "survived" -> 0.6, "ProbabilitiesMean"]
(* {"survived", "died", "survived", "died", "died", "survived"} *)
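The thresholded variant has the same shape: the positive label wins as soon as its ensemble-mean probability reaches the threshold. A Python sketch (illustrative names and made-up member probabilities; lowering the threshold produces more positive labels, which is what traces out a ROC curve):

```python
from statistics import mean

def classify_by_threshold(member_probs, pos_label, threshold, neg_label):
    """Assign pos_label when its mean probability over the ensemble
    members reaches the threshold; otherwise assign neg_label."""
    p = mean(probs[pos_label] for probs in member_probs)
    return pos_label if p >= threshold else neg_label

# With a mean "survived" probability of 0.38, a 0.2 threshold yields
# "survived" while a 0.6 threshold yields "died".
member_probs = [{"survived": 0.40, "died": 0.60},
                {"survived": 0.36, "died": 0.64}]
```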
The implementation of EnsembleClassifyByThreshold with "ProbabilitiesMean" relies on the ClassifierFunction signature:
ClassifierFunction[__][record_, "Probabilities"]
Here is the corresponding ROC plot:
rocRange = Range[0, 1, 0.025];
aROCs = Table[(
cres = EnsembleClassifyByThreshold[aCLs, testData[[All, 1 ;; -2]], "survived" -> i, "ProbabilitiesMean"]; ToROCAssociation[{"survived", "died"}, testData[[All, -1]], cres]), {i, rocRange}];
rocEnGr = ROCPlot[rocRange, aROCs, "PlotJoined" -> Automatic, PlotLabel -> "Classifier ensemble", GridLines -> Automatic]
This plot compares the ROC curve of the ensemble classifier with the ROC curves of the classifiers that comprise the ensemble.
rocGRs = Table[
aROCs1 = Table[(
cres = ClassifyByThreshold[aCLs[[i]], testData[[All, 1 ;; -2]], "survived" -> th];
ToROCAssociation[{"survived", "died"}, testData[[All, -1]], cres]), {th, rocRange}];
ROCPlot[rocRange, aROCs1, PlotLabel -> Keys[aCLs][[i]], PlotRange -> {{0, 1.05}, {0.6, 1.01}}, "PlotJoined" -> Automatic, GridLines -> Automatic],
{i, 1, Length[aCLs]}];
GraphicsGrid[ArrayReshape[Append[Prepend[rocGRs, rocEnGr], rocEnGr], {2, 4}, ""], Dividers -> All, FrameStyle -> GrayLevel[0.8], ImageSize -> 1200]
Let us plot all ROC curves from the graphics grid above into one plot. For that the single classifier ROC curves are made gray, and their threshold callouts are removed. We can see that the classifier ensemble brings very good results and none of the single classifiers has a better point.
Show[Append[rocGRs /. {RGBColor[___] -> GrayLevel[0.8]} /. {Text[p_, ___] :> Null} /. ((PlotLabel -> _) :> (PlotLabel -> Null)), rocEnGr]]
There are several ways to produce ensemble classifiers using bootstrapping or jackknife resampling procedures.
First, we are going to make a bootstrapping classifier ensemble using one of the Classify methods. Then we are going to make a more complicated bootstrapping classifier with six methods of Classify.
First we select a classification method and make a classifier with it.
clMethod = "NearestNeighbors";
sCL = Classify[trainingData[[All, 1 ;; -2]] -> trainingData[[All, -1]], Method -> clMethod];
The following code makes a classifier ensemble of 12 classifier functions using resampled, slightly smaller (90% of the original size) versions of the training data (with RandomChoice).
SeedRandom[1262];
aBootStrapCLs = Association@Table[(
inds = RandomChoice[Range[Length[trainingData]], Floor[0.9*Length[trainingData]]];
ToString[i] -> Classify[trainingData[[inds, 1 ;; -2]] -> trainingData[[inds, -1]], Method -> clMethod]), {i, 12}];
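The resampling step can be sketched in Python: each ensemble member gets its own index sample, drawn with replacement and slightly smaller than the training set (an analogue of the RandomChoice call above; the function name and seed are illustrative):

```python
import random

def bootstrap_index_samples(n_records, n_members, fraction=0.9, seed=1262):
    """For each ensemble member, draw floor(fraction * n_records)
    record indices with replacement."""
    rng = random.Random(seed)
    size = int(fraction * n_records)
    return [[rng.randrange(n_records) for _ in range(size)]
            for _ in range(n_members)]

# 12 members, 732 training records (as in the Titanic training set).
samples = bootstrap_index_samples(n_records=732, n_members=12)
```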
Let us compare the ROC curves of the single classifier with the bootstrapping derived ensemble.
rocRange = Range[0.1, 0.9, 0.025];
AbsoluteTiming[
aSingleROCs = Table[(
cres = ClassifyByThreshold[sCL, testData[[All, 1 ;; -2]], "survived" -> i]; ToROCAssociation[{"survived", "died"}, testData[[All, -1]], cres]), {i, rocRange}];
aBootStrapROCs = Table[(
cres = EnsembleClassifyByThreshold[aBootStrapCLs, testData[[All, 1 ;; -2]], "survived" -> i]; ToROCAssociation[{"survived", "died"}, testData[[All, -1]], cres]), {i, rocRange}];
]
(* {6.81521, Null} *)
Legended[
Show[{
ROCPlot[rocRange, aSingleROCs, "ROCColor" -> Blue, "PlotJoined" -> Automatic, GridLines -> Automatic],
ROCPlot[rocRange, aBootStrapROCs, "ROCColor" -> Red, "PlotJoined" -> Automatic]}],
SwatchLegend @@ Transpose@{{Blue, Row[{"Single ", clMethod, " classifier"}]}, {Red, Row[{"Bootstrapping ensemble of\n", Length[aBootStrapCLs], " ", clMethod, " classifiers"}]}}]
We can see that we get much better results with the bootstrapped ensemble.
This code creates a classifier ensemble using the classifier methods corresponding to Automatic given as a first argument to EnsembleClassifier.
SeedRandom[2324]
AbsoluteTiming[
aBootStrapLargeCLs = Association@Table[(
inds = RandomChoice[Range[Length[trainingData]], Floor[0.9*Length[trainingData]]];
ecls = EnsembleClassifier[Automatic, trainingData[[inds, 1 ;; -2]] -> trainingData[[inds, -1]]];
AssociationThread[Map[# <> "-" <> ToString[i] &, Keys[ecls]] -> Values[ecls]]
), {i, 12}];
]
(* {27.7975, Null} *)
This code computes the ROC statistics with the obtained bootstrapping classifier ensemble:
AbsoluteTiming[
aBootStrapLargeROCs = Table[(
cres = EnsembleClassifyByThreshold[aBootStrapLargeCLs, testData[[All, 1 ;; -2]], "survived" -> i]; ToROCAssociation[{"survived", "died"}, testData[[All, -1]], cres]), {i, rocRange}];
]
(* {45.1995, Null} *)
Let us plot the ROC curve of the bootstrapping classifier ensemble (in blue) and the single classifier ROC curves (in gray):
aBootStrapLargeGr = ROCPlot[rocRange, aBootStrapLargeROCs, "PlotJoined" -> Automatic];
Show[Append[rocGRs /. {RGBColor[___] -> GrayLevel[0.8]} /. {Text[p_, ___] :> Null} /. ((PlotLabel -> _) :> (PlotLabel -> Null)), aBootStrapLargeGr]]
Again we can see that the bootstrapping ensemble produced better ROC points than the single classifiers.
This section tries to explain why the bootstrapping with resampling to smaller sizes produces good results.
In short, the training data has outliers; if we remove small fractions of the training data we might get better results.
The procedure described in this section can be used in conjunction with the procedures described in the guide for importance of variables investigation [7].
Let us replace the categorical values with numerical in the training data. There are several ways to do it, here is a fairly straightforward one:
nTrainingData = trainingData /. {"survived" -> 1, "died" -> 0, "1st" -> 0, "2nd" -> 1, "3rd" -> 2, "male" -> 0, "female" -> 1};
First, let us find all indices corresponding to records about females.
femaleInds = Flatten@Position[trainingData[[All, 3]], "female"];
The following code standardizes the training data corresponding to females, finds the mean record, computes distances from the mean record, and finally orders the female records indices according to their distances from the mean record.
t = Transpose@Map[Rescale@*Standardize, N@Transpose@nTrainingData[[femaleInds, 1 ;; 2]]];
m = Mean[t];
ds = Map[EuclideanDistance[#, m] &, t];
femaleInds = femaleInds[[Reverse@Ordering[ds]]];
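The standardize-then-rank procedure above can be mirrored in plain Python (a sketch with illustrative names; rescaling to [0, 1] stands in for the Rescale @* Standardize composition):

```python
import math

def rescale_standardize(column):
    """Standardize to zero mean and unit variance, then rescale to [0, 1]."""
    m = sum(column) / len(column)
    sd = math.sqrt(sum((x - m) ** 2 for x in column) / (len(column) - 1))
    z = [(x - m) / sd for x in column]
    lo, hi = min(z), max(z)
    return [(x - lo) / (hi - lo) for x in z]

def order_farthest_first(records):
    """Indices of the records ordered farthest-first from the mean record,
    so outlier records can be removed from the front of the list."""
    cols = [rescale_standardize(list(col)) for col in zip(*records)]
    rows = list(zip(*cols))
    mean_row = [sum(c) / len(c) for c in cols]
    dists = [math.dist(r, mean_row) for r in rows]
    return sorted(range(len(rows)), key=lambda i: dists[i], reverse=True)
```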
The following plot shows the distances calculated above.
ListPlot[Sort@ds, PlotRange -> All, PlotTheme -> "Detailed"]
The following code removes from the training data the records corresponding to females according to the order computed above. The female records farthest from the mean female record are removed first.
AbsoluteTiming[
femaleFrRes = Association@
Table[cl ->
Table[(
inds = Complement[Range[Length[trainingData]], Take[femaleInds, Ceiling[fr*Length[femaleInds]]]];
cf = Classify[trainingData[[inds, 1 ;; -2]] -> trainingData[[inds, -1]], Method -> cl]; cfPredictedLabels = cf /@ testData[[All, 1 ;; -2]];
{fr, ToROCAssociation[{"survived", "died"}, testData[[All, -1]], cfPredictedLabels]}),
{fr, 0, 0.8, 0.05}],
{cl, {"NearestNeighbors", "NeuralNetwork", "LogisticRegression", "RandomForest", "SupportVectorMachine", "NaiveBayes"}}];
]
(* {203.001, Null} *)
The following graphics grid shows how the classification results are affected by removing fractions of the female records from the training data. The results for no or small fractions of removed records are colored more blue.
GraphicsGrid[ArrayReshape[
Table[
femaleAROCs = femaleFrRes[cl][[All, 2]];
frRange = femaleFrRes[cl][[All, 1]]; ROCPlot[frRange, femaleAROCs, PlotRange -> {{0.0, 0.25}, {0.2, 0.8}}, PlotLabel -> cl, "ROCPointColorFunction" -> (Blend[{Blue, Red}, #3/Length[frRange]] &), ImageSize -> 300],
{cl, Keys[femaleFrRes]}],
{2, 3}], Dividers -> All]
We can see that removing the female record outliers has a dramatic effect on the results of the classifiers "NearestNeighbors" and "NeuralNetwork". Not so much on "LogisticRegression" and "NaiveBayes".
The code in this sub-section repeats the experiment described in the previous one for males (instead of females).
maleInds = Flatten@Position[trainingData[[All, 3]], "male"];
t = Transpose@Map[Rescale@*Standardize, N@Transpose@nTrainingData[[maleInds, 1 ;; 2]]];
m = Mean[t];
ds = Map[EuclideanDistance[#, m] &, t];
maleInds = maleInds[[Reverse@Ordering[ds]]];
ListPlot[Sort@ds, PlotRange -> All, PlotTheme -> "Detailed"]
AbsoluteTiming[
maleFrRes = Association@
Table[cl ->
Table[(
inds = Complement[Range[Length[trainingData]], Take[maleInds, Ceiling[fr*Length[maleInds]]]];
cf = Classify[trainingData[[inds, 1 ;; -2]] -> trainingData[[inds, -1]], Method -> cl]; cfPredictedLabels = cf /@ testData[[All, 1 ;; -2]];
{fr, ToROCAssociation[{"survived", "died"}, testData[[All, -1]], cfPredictedLabels]}),
{fr, 0, 0.8, 0.05}],
{cl, {"NearestNeighbors", "NeuralNetwork", "LogisticRegression", "RandomForest", "SupportVectorMachine", "NaiveBayes"}}];
]
(* {179.219, Null} *)
GraphicsGrid[ArrayReshape[
Table[
maleAROCs = maleFrRes[cl][[All, 2]];
frRange = maleFrRes[cl][[All, 1]]; ROCPlot[frRange, maleAROCs, PlotRange -> {{0.0, 0.35}, {0.55, 0.82}}, PlotLabel -> cl, "ROCPointColorFunction" -> (Blend[{Blue, Red}, #3/Length[frRange]] &), ImageSize -> 300],
{cl, Keys[maleFrRes]}],
{2, 3}], Dividers -> All]
Assume that we want a classifier that for a given representative set of items (records) assigns the positive label to a prescribed number of them. (Or very close to that number.)
If we have two classifiers, one returning more positive items than prescribed, the other fewer, then we can use geometric computations in the ROC space in order to obtain parameters for a classifier interpolation that will bring the number of positive items close to the prescribed one; see [3]. Below is given Mathematica code with explanations of how that classifier interpolation is done.
Assume that by prior observations we know that for a given dataset the fraction of positive items is p. Assume that for a given unknown dataset of n items we want a fraction r of the items to be classified as positive. Since the number of items classified as positive is n (FPR (1 - p) + TPR p), we can write the equation:
n (FPR (1 - p) + TPR p) == n r ,
which can be simplified to
FPR (1 - p) + TPR p == r .
Consider the following two classifiers.
cf1 = Classify[trainingData[[All, 1 ;; -2]] -> trainingData[[All, -1]], Method -> "RandomForest"];
cfROC1 = ToROCAssociation[{"survived", "died"}, testData[[All, -1]], cf1[testData[[All, 1 ;; -2]]]]
(* <|"TruePositive" -> 82, "FalsePositive" -> 22, "TrueNegative" -> 170, "FalseNegative" -> 40|> *)
cf2 = Classify[trainingData[[All, 1 ;; -2]] -> trainingData[[All, -1]], Method -> "LogisticRegression"];
cfROC2 = ToROCAssociation[{"survived", "died"}, testData[[All, -1]], cf2[testData[[All, 1 ;; -2]]]]
(* <|"TruePositive" -> 89, "FalsePositive" -> 37, "TrueNegative" -> 155, "FalseNegative" -> 33|> *)
Here are the ROC space points corresponding to the two classifiers, cf1 and cf2:
p1 = Through[ROCFunctions[{"FPR", "TPR"}][cfROC1]];
p2 = Through[ROCFunctions[{"FPR", "TPR"}][cfROC2]];
Here is the breakdown of frequencies of the class labels:
Tally[trainingData[[All, -1]]]
%[[All, 2]]/Length[trainingData] // N
(* {{"survived", 305}, {"died", 427}}
{0.416667, 0.583333} *)
We want our classifier to predict that 38% of the people survived. Here we find two points on the corresponding constraint line (on which the ROC points of the desired classifiers should reside):
sol1 = Solve[{{x, y} \[Element] ImplicitRegion[{x (1 - 0.42) + y 0.42 == 0.38}, {x, y}], x == 0.1}, {x, y}][[1]]
sol2 = Solve[{{x, y} \[Element] ImplicitRegion[{x (1 - 0.42) + y 0.42 == 0.38}, {x, y}], x == 0.25}, {x, y}][[1]]
(* {x -> 0.1, y -> 0.766667}
{x -> 0.25, y -> 0.559524} *)
Here using the points q1 and q2 of the constraint line we find the intersection point with the line connecting the ROC points of the classifiers:
{q1, q2} = {{x, y} /. sol1, {x, y} /. sol2};
sol = Solve[ {{x, y} \[Element] InfiniteLine[{q1, q2}] \[And] {x, y} \[Element] InfiniteLine[{p1, p2}]}, {x, y}];
q = {x, y} /. sol[[1]]
(* {0.149753, 0.69796} *)
Let us plot all geometric objects:
Graphics[{PointSize[0.015], Blue, Tooltip[Point[p1], "cf1"], Black,
Text["cf1", p1, {-1.5, 1}], Red, Tooltip[Point[p2], "cf2"], Black,
Text["cf2", p2, {1.5, -1}], Black, Point[q], Dashed,
InfiniteLine[{q1, q2}], Thin, InfiniteLine[{p1, p2}]},
PlotRange -> {{0., 0.3}, {0.6, 0.8}},
GridLines -> Automatic, Frame -> True]
Next we find the ratio of the distance from the intersection point q to the cf1 ROC point and the distance between the ROC points of cf1 and cf2.
k = Norm[p1 - q]/Norm[p1 - p2]
(* 0.450169 *)
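The same intersection and ratio can be computed directly. Here is a NumPy sketch of the geometry (the ROC points are rebuilt from the confusion counts of cfROC1 and cfROC2 above, and the fractions 0.42 and 0.38 are the ones used in the Solve calls):

```python
import numpy as np

def interpolation_weight(p1, p2, pos_fraction, target_fraction):
    """Intersect the line through two classifiers' ROC points with the
    constraint line FPR*(1 - p) + TPR*p == r; return the mixing weight
    t along p1 -> p2 and the intersection point."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    coeff = np.array([1.0 - pos_fraction, pos_fraction])
    # Solve coeff . (p1 + t*(p2 - p1)) == r for t.
    t = (target_fraction - coeff @ p1) / (coeff @ (p2 - p1))
    return t, p1 + t * (p2 - p1)

# ROC points (FPR, TPR) from the confusion counts of cf1 and cf2.
k, q = interpolation_weight((22 / 192, 82 / 122), (37 / 192, 89 / 122),
                            pos_fraction=0.42, target_fraction=0.38)
```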
The classifier interpolation is made by a weighted random selection based on that ratio (using RandomChoice):
SeedRandom[8989]
cres = MapThread[If, {RandomChoice[{1 - k, k} -> {True, False}, Length[testData]], cf1@testData[[All, 1 ;; -2]], cf2@testData[[All, 1 ;; -2]]}];
cfROC3 = ToROCAssociation[{"survived", "died"}, testData[[All, -1]], cres];
p3 = Through[ROCFunctions[{"FPR", "TPR"}][cfROC3]];
Graphics[{PointSize[0.015], Blue, Point[p1], Red, Point[p2], Black, Dashed, InfiniteLine[{q1, q2}], Green, Point[p3]},
PlotRange -> {{0., 0.3}, {0.6, 0.8}},
GridLines -> Automatic, Frame -> True]
We can run the process multiple times in order to convince ourselves that the interpolated classifier ROC point is very close to the constraint line most of the time.
p3s =
Table[(
cres =
MapThread[If, {RandomChoice[{1 - k, k} -> {True, False}, Length[testData]], cf1@testData[[All, 1 ;; -2]], cf2@testData[[All, 1 ;; -2]]}];
cfROC3 = ToROCAssociation[{"survived", "died"}, testData[[All, -1]], cres];
Through[ROCFunctions[{"FPR", "TPR"}][cfROC3]]), {1000}];
Show[{SmoothDensityHistogram[p3s, ColorFunction -> (Blend[{White, Green}, #] &), Mesh -> 3],
Graphics[{PointSize[0.015], Blue, Tooltip[Point[p1], "cf1"], Black, Text["cf1", p1, {-1.5, 1}],
Red, Tooltip[Point[p2], "cf2"], Black, Text["cf2", p2, {1.5, -1}],
Black, Dashed, InfiniteLine[{q1, q2}]}, GridLines -> Automatic]},
PlotRange -> {{0., 0.3}, {0.6, 0.8}},
GridLines -> Automatic, Axes -> True,
AspectRatio -> Automatic]
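The claim can also be checked outside Mathematica. Here is a minimal, self-contained Python sketch, with two synthetic classifiers (hypothetical accuracy levels standing in for cf1 and cf2), showing that per-instance random delegation lands, in expectation, on the convex combination of the two ROC points:

```python
import random

def roc_point(actual, predicted, positive=1):
    # (FPR, TPR) from lists of actual and predicted labels.
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    tn = sum(1 for a, p in zip(actual, predicted) if a != positive and p != positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    return fp / (fp + tn), tp / (tp + fn)

rng = random.Random(8989)
n = 100000
actual = [rng.choice([0, 1]) for _ in range(n)]
# Two synthetic classifiers: correct with probability 0.7 and 0.8 respectively.
pred1 = [a if rng.random() < 0.7 else 1 - a for a in actual]
pred2 = [a if rng.random() < 0.8 else 1 - a for a in actual]

k = 0.45
# Per-instance random delegation: pred2 with probability k, pred1 otherwise.
mixed = [p2 if rng.random() < k else p1 for p1, p2 in zip(pred1, pred2)]

f1, t1 = roc_point(actual, pred1)
f2, t2 = roc_point(actual, pred2)
fm, tm = roc_point(actual, mixed)
# Expectation: the convex combination of the two ROC points.
expected = ((1 - k) * f1 + k * f2, (1 - k) * t1 + k * t2)
```

The deviation of the mixed classifier's ROC point from the segment between the two ROC points is only sampling noise; it shrinks as the number of instances grows.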
[1] Leo Breiman, Statistical Modeling: The Two Cultures, (2001), Statistical Science, Vol. 16, No. 3, 199–231.
[2] Wikipedia entry, Receiver operating characteristic. URL: http://en.wikipedia.org/wiki/Receiver_operating_characteristic .
[3] Tom Fawcett, An introduction to ROC analysis, (2006), Pattern Recognition Letters, 27, 861–874. (Link to PDF.)
[4] Anton Antonov, MathematicaForPrediction utilities, (2014), source code MathematicaForPrediction at GitHub, package MathematicaForPredictionUtilities.m.
[5] Anton Antonov, Receiver operating characteristic functions Mathematica package, (2016), source code MathematicaForPrediction at GitHub, package ROCFunctions.m.
[6] Anton Antonov, Classifier ensembles functions Mathematica package, (2016), source code MathematicaForPrediction at GitHub, package ClassifierEnsembles.m.
[7] Anton Antonov, "Importance of variables investigation guide", (2016), MathematicaForPrediction at GitHub, folder Documentation.
This post demonstrates the use of the package [2], which provides Mathematica implementations of Receiver Operating Characteristic (ROC) calculation and plotting functions. The ROC framework is used for analysis and tuning of binary classifiers, [3]. (The classifiers are assumed to classify into a positive/true label or a negative/false label.)
The function ROCFunctions gives access to the individual ROC functions through string arguments. Those ROC functions are applied to special objects, called ROC Association objects.
Each ROC Association object is an Association that has the following four keys: "TruePositive", "FalsePositive", "TrueNegative", and "FalseNegative".
Given two lists of actual and predicted labels, a ROC Association object can be made with the function ToROCAssociation.
For more definitions and examples of ROC terminology and functions see [3].
I was asked in this discussion why Linear regression and not, say, Logistic regression.
Here is my answer:
1. I am trying to do a minimal and quick-to-execute example; the code of the post is included in the package ROCFunctions.m.
2. I am aware that there are better alternatives to LinearModelFit, but I plan to discuss those in the MathematicaVsR project "Regression with ROC". (As the name hints, it is not just about Linear regression.)
3. Although Linear regression is not a good method for this classification problem, with ROC tuning it can nevertheless be made to give better results than, say, the built-in "NeuralNetwork" method. See the last section of "Linear regression with ROC.md".
Note that here although we use both of the provided Titanic training and test data, the code is doing only training. The test data is used to find the best tuning parameter (threshold) through ROC analysis.
These commands load the packages [1,2]:
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/MathematicaForPredictionUtilities.m"]
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/ROCFunctions.m"]
Here is the summary of the Titanic data used below:
titanicData = (Flatten@*List) @@@ExampleData[{"MachineLearning", "Titanic"}, "Data"];
columnNames = (Flatten@*List) @@ExampleData[{"MachineLearning", "Titanic"}, "VariableDescriptions"];
RecordsSummary[titanicData, columnNames]
This variable dependence grid shows the relationships between the variables.
Magnify[#, 0.7] &@VariableDependenceGrid[titanicData, columnNames]
data = ExampleData[{"MachineLearning", "Titanic"}, "TrainingData"];
data = ((Flatten@*List) @@@ data)[[All, {1, 2, 3, -1}]];
trainingData = DeleteCases[data, {___, _Missing, ___}];
Dimensions[trainingData]
(* {732, 4} *)
data = ExampleData[{"MachineLearning", "Titanic"}, "TestData"];
data = ((Flatten@*List) @@@ data)[[All, {1, 2, 3, -1}]];
testData = DeleteCases[data, {___, _Missing, ___}];
Dimensions[testData]
(* {314, 4} *)
trainingData = trainingData /. {"survived" -> 1, "died" -> 0, "1st" -> 0, "2nd" -> 1, "3rd" -> 2, "male" -> 0, "female" -> 1};
testData = testData /. {"survived" -> 1, "died" -> 0, "1st" -> 0, "2nd" -> 1, "3rd" -> 2, "male" -> 0, "female" -> 1};
(Note that the test data has to use the same numerical encoding as the training data.)
lfm = LinearModelFit[{trainingData[[All, 1 ;; -2]], trainingData[[All, -1]]}]
modelValues = lfm @@@ testData[[All, 1 ;; -2]];
Histogram[modelValues, 20]
RecordsSummary[modelValues]
testLabels = testData[[All, -1]];
thRange = Range[0.1, 0.9, 0.025];
aROCs = Table[ToROCAssociation[{0, 1}, testLabels, Map[If[# > \[Theta], 1, 0] &, modelValues]], {\[Theta], thRange}];
Through[ROCFunctions[{"PPV", "NPV", "TPR", "ACC", "SPC"}][aROCs[[3]]]]
(* {34/43, 19/37, 17/32, 197/314, 95/122} *)
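For readers who want to follow along without the package, the essence of ToROCAssociation and ROCFunctions can be sketched in a few lines of Python. The function names mirror the package's, but this is an illustrative re-implementation, not the package itself:

```python
def to_roc_association(actual, predicted, positive=1):
    # The four counts behind a ROC Association object.
    pairs = list(zip(actual, predicted))
    return {
        "TruePositive":  sum(1 for a, p in pairs if a == positive and p == positive),
        "FalsePositive": sum(1 for a, p in pairs if a != positive and p == positive),
        "TrueNegative":  sum(1 for a, p in pairs if a != positive and p != positive),
        "FalseNegative": sum(1 for a, p in pairs if a == positive and p != positive),
    }

def roc_functions(r):
    tp, fp = r["TruePositive"], r["FalsePositive"]
    tn, fn = r["TrueNegative"], r["FalseNegative"]
    return {
        "PPV": tp / (tp + fp),                   # positive predictive value
        "NPV": tn / (tn + fn),                   # negative predictive value
        "TPR": tp / (tp + fn),                   # true positive rate
        "SPC": tn / (tn + fp),                   # specificity
        "FPR": fp / (fp + tn),                   # false positive rate
        "ACC": (tp + tn) / (tp + fp + tn + fn),  # accuracy
    }

actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
rates = roc_functions(to_roc_association(actual, predicted))
```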
ROCPlot[thRange, aROCs, "PlotJoined" -> Automatic, "ROCPointCallouts" -> True, "ROCPointTooltips" -> True, GridLines -> Automatic]
ListLinePlot[Map[Transpose[{thRange, #}] &, Transpose[Map[Through[ROCFunctions[{"PPV", "NPV", "TPR", "ACC", "SPC"}][#]] &, aROCs]]],
Frame -> True, FrameLabel -> Map[Style[#, Larger] &, {"threshold, \[Theta]", "rate"}], PlotLegends -> Map[# <> ", " <> (ROCFunctions["FunctionInterpretations"][#]) &, {"PPV", "NPV", "TPR", "ACC", "SPC"}], GridLines -> Automatic]
We want to find a threshold that provides balanced success rates for the positive and negative labels. One way to do this is to find the intersection point of the ROC functions PPV (positive predictive value) and TPR (true positive rate).
Examining the plot above we can come up with the initial condition for x in the root finding.
ppvFunc = Interpolation[Transpose@{thRange, ROCFunctions["PPV"] /@ aROCs}];
tprFunc = Interpolation[Transpose@{thRange, ROCFunctions["TPR"] /@ aROCs}];
FindRoot[ppvFunc[x] - tprFunc[x] == 0, {x, 0.2}]
(* {x -> 0.3} *)
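The interpolation-plus-root-finding step is generic; here is a Python sketch with piecewise-linear interpolation and bisection (the PPV and TPR samples below are synthetic illustrative values, not the Titanic ones):

```python
from bisect import bisect_left

def interp(xs, ys, x):
    # Piecewise-linear interpolation, a stand-in for Interpolation[...].
    i = max(1, min(len(xs) - 1, bisect_left(xs, x)))
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])

def find_root(g, a, b, tol=1e-9):
    # Simple bisection, a stand-in for FindRoot; assumes a sign change on [a, b].
    ga = g(a)
    for _ in range(200):
        m = (a + b) / 2
        gm = g(m)
        if abs(gm) < tol or (b - a) < tol:
            return m
        if (ga < 0) == (gm < 0):
            a, ga = m, gm
        else:
            b = m
    return (a + b) / 2

# Synthetic samples over a threshold range: PPV grows, TPR falls.
th = [0.1 + 0.025 * i for i in range(33)]
ppv = [0.5 + 0.5 * t for t in th]
tpr = [1.0 - 0.8 * t for t in th]
root = find_root(lambda x: interp(th, ppv, x) - interp(th, tpr, x), 0.1, 0.9)
```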
The Area Under the ROC curve (AUROC) tells, for a given range of the controlling parameter, "what is the probability of the classifier to rank a randomly chosen positive instance higher than a randomly chosen negative instance (assuming 'positive' ranks higher than 'negative')", [3,4].
Calculating AUROC is easy using the trapezoidal quadrature formula:
N@Total[Partition[Sort@Transpose[{ROCFunctions["FPR"] /@ aROCs, ROCFunctions["TPR"] /@ aROCs}], 2, 1]
/. {{x1_, y1_}, {x2_, y2_}} :> (x2 - x1) (y1 + (y2 - y1)/2)]
(* 0.698685 *)
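The same trapezoidal computation in a language-neutral Python sketch (the ROC points here are a made-up three-point curve, not the Titanic ones):

```python
def auroc(roc_points):
    # Sort the (FPR, TPR) points by FPR and apply the trapezoidal rule,
    # mirroring the Mathematica expression above.
    pts = sorted(roc_points)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# A three-point ROC curve; random guessing along the diagonal gives 0.5.
curve = [(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)]
```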
It is also implemented in [2]:
N@ROCFunctions["AUROC"][aROCs]
(* 0.698685 *)
[1] Anton Antonov, MathematicaForPrediction utilities, (2014), source code MathematicaForPrediction at GitHub, package MathematicaForPredictionUtilities.m.
[2] Anton Antonov, Receiver operating characteristic functions Mathematica package, (2016), source code MathematicaForPrediction at GitHub, package ROCFunctions.m .
[3] Wikipedia entry, Receiver operating characteristic. URL: http://en.wikipedia.org/wiki/Receiver_operating_characteristic .
[4] Tom Fawcett, An introduction to ROC analysis, (2006), Pattern Recognition Letters, 27, 861-874.
In statistics, contingency tables are matrices used to show the co-occurrence of variable values of multi-dimensional data. They are fundamental in many types of research. This document shows how to use several functions implemented in Mathematica for the construction of contingency tables.
In this document we are going to use the implementations in the package MathematicaForPredictionUtilities.m from MathematicaForPrediction at GitHub, [1].
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/MathematicaForPredictionUtilities.m"]
The implementation of CrossTabulate in [1] is based on Tally, GatherBy, and SparseArray. The implementation of xtabsViaRLink in [1] is based on R's function xtabs called via RLink.
Other packages used in this document are [2] and [4], imported with these commands:
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/MosaicPlot.m"]
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/Misc/RSparseMatrix.m"]
For a different approach to implementing cross-tabulation than those taken in [1] see the Stack Overflow answer http://stackoverflow.com/a/8101951 by Mr.Wizard.
titanicData =
Flatten@*List @@@ ExampleData[{"MachineLearning", "Titanic"}, "Data"];
titanicData = DeleteCases[titanicData, {___, _Missing, ___}];
titanicColumnNames =
Flatten@*List @@ ExampleData[{"MachineLearning", "Titanic"}, "VariableDescriptions"];
aTitanicColumnNames =
AssociationThread[titanicColumnNames -> Range[Length[titanicColumnNames]]];
Note that we have removed the records with missing data (for simpler exposition).
Dimensions[titanicData]
(* {1046, 4} *)
RecordsSummary[titanicData, titanicColumnNames]
Assume that we want to group the people according to their passenger class and survival, and we want to find the average age for each group.
We can do that by
1. finding the counts contingency table for the variables "passenger class" and "passenger survival",
2. finding the total-age contingency table for the same variables, and
3. doing the element-wise division of the second table by the first.
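The three steps can be sketched in plain Python (with a hypothetical miniature dataset standing in for the Titanic data):

```python
from collections import defaultdict

# Miniature (class, survival, age) records, hypothetical stand-ins for Titanic rows.
records = [
    ("1st", "survived", 30), ("1st", "survived", 50),
    ("1st", "died", 40), ("3rd", "died", 20), ("3rd", "died", 30),
]

counts = defaultdict(int)        # step 1: counts contingency table
total_age = defaultdict(float)   # step 2: total-age contingency table
for cls, surv, age in records:
    counts[(cls, surv)] += 1
    total_age[(cls, surv)] += age

# step 3: element-wise division gives the average age per group
avg_age = {key: total_age[key] / counts[key] for key in counts}
```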
ctCounts =
CrossTabulate[
titanicData[[All, aTitanicColumnNames /@ {"passenger class", "passenger survival"}]]];
MatrixForm[#1, TableHeadings -> {#2, #3}] & @@ ctCounts
ctTotalAge =
CrossTabulate[
titanicData[[All,
aTitanicColumnNames /@ {"passenger class", "passenger survival",
"passenger age"}]]];
MatrixForm[#1, TableHeadings -> {#2, #3}] & @@ ctTotalAge
MatrixForm[
ctTotalAge[[1]]/
Normal[ctCounts[[1]]],
TableHeadings -> Values[Rest[ctTotalAge]]]
(We have to make the sparse array ctCounts a regular array, because otherwise we get division-by-zero warnings caused by the sparse array's default value.)
Let us repeat the steps above after first dividing the passengers according to their sex.
Association@
Map[
(mCount =
CrossTabulate[#[[All, aTitanicColumnNames /@ {"passenger class", "passenger survival"}]]];
mAge = CrossTabulate[#[[All, aTitanicColumnNames /@ {"passenger class", "passenger survival", "passenger age"}]]];
#[[1, aTitanicColumnNames["passenger sex"]]] ->
MatrixForm[mAge[[1]]/Normal[mCount[[1]]], TableHeadings -> Values[Rest[mAge]]]) &,
GatherBy[titanicData, #[[aTitanicColumnNames["passenger sex"]]] &]]
An alternative to CrossTabulate is xtabsViaRLink, which uses R's function xtabs via RLink.
Needs["RLink`"]
RLinkResourcesInstall[]
InstallR[]
(* {Paclet[RLinkRuntime,9.0.0.0, <>]} *)
ctCounts =
FromRXTabsForm@
xtabsViaRLink[
titanicData[[All, aTitanicColumnNames /@ {"passenger class", "passenger survival"}]],
{"passenger.sex", "passenger.survival"},
" ~ passenger.sex + passenger.survival"];
MatrixForm[#1, TableHeadings -> {#2, #3}] & @@ ctCounts
The graphical visualization of a dataset with mosaic plots, [2,3], is similar in spirit to contingency tables. Compare the following mosaic plot with the contingency table in the last section.
MosaicPlot[
titanicData[[All, aTitanicColumnNames /@ {"passenger class", "passenger survival"}]] ]
At this point we might want to be able to call MatrixForm more directly on the output of CrossTabulate and FromRXTabsForm. One way to do this is to define an up-value for Association.
Unprotect[Association];
MatrixForm[x_Association /; (KeyExistsQ[x, "XTABMatrix"] || KeyExistsQ[x, "XTABTensor"])] ^:= (MatrixForm[#1, TableHeadings -> Rest[{##}]] & @@ x);
Protect[Association];
Now we can do this:
MatrixForm @
CrossTabulate[titanicData[[All, aTitanicColumnNames /@ {"passenger class", "passenger survival"}]]]
Remark: Because of this up-value definition for Association with MatrixForm, the associations returned by CrossTabulate and FromRXTabsForm carry the key "XTABMatrix" instead of "Matrix"; the former key name is assumed to occur much more rarely in other code than the latter.
Let us consider an example with larger data that has a larger number of unique values in its columns.
The following dataset is taken from [6].
data = Import[ "/Volumes/WhiteSlimSeagate/Datasets/UCI Online Retail Data Set/Online Retail.csv"];
columnNames = First[data];
data = Rest[data];
aColumnNames = AssociationThread[columnNames -> Range[Length[columnNames]]];
Here are the numbers of rows and columns:
Dimensions[data]
(* {65499, 8} *)
Here is a summary of the columns:
Magnify[#, 0.75] &@RecordsSummary[data, columnNames]
There is no one-to-one correspondence between the values of the column "Description" and the column "StockCode", which can be seen with this command:
MinMax@Map[Length@*Union,
GatherBy[data[[All, aColumnNames /@ {"Description", "StockCode"}]], First]]
(* {1,144} *)
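The same check can be sketched in Python: group the pairs by description, count the distinct stock codes per group, and take the min and max of those counts (the data here is hypothetical):

```python
from collections import defaultdict

# Hypothetical (Description, StockCode) pairs; "mug" maps to two codes,
# so the correspondence is not one-to-one.
pairs = [("mug", "A1"), ("mug", "A2"), ("plate", "B1"), ("bowl", "C1")]

codes_per_description = defaultdict(set)
for description, code in pairs:
    codes_per_description[description].add(code)

tallies = [len(codes) for codes in codes_per_description.values()]
min_max = (min(tallies), max(tallies))  # a one-to-one mapping would give (1, 1)
```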
The way in which the column "StockCode" was ingested made it have multiple types for its values:
Tally[NumberQ /@ data[[All, aColumnNames["StockCode"]]]]
(* {{False,9009},{True,56490}} *)
So let us convert it to all strings:
data[[All, aColumnNames["StockCode"]]] =
ToString /@ data[[All, aColumnNames["StockCode"]]];
Here we find the contingency table for "Country" and "StockCode" over "Quantity" using CrossTabulate:
AbsoluteTiming[
ctRes = CrossTabulate[
data[[All, aColumnNames /@ {"Country", "StockCode", "Quantity"}]]];
]
(* {0.256339,Null} *)
Here we find the contingency table for "Country" and "StockCode" over "Quantity" using xtabsViaRLink:
AbsoluteTiming[
rres = xtabsViaRLink[
data[[All, aColumnNames /@ {"Country", "StockCode", "Quantity"}]],
{"Country", "StockCode", "Quantity"},
"Quantity ~ Country + StockCode"];
ctRRes = FromRXTabsForm[rres];
]
(* {0.843621,Null} *)
Both functions produce the same result:
ctRRes["matrix"] == N@ctRes[[1]]
(* True *)
Note that xtabsViaRLink is slower, but still fairly quick.
Here we plot the contingency table using MatrixPlot :
MatrixPlot[ctRRes["matrix"], AspectRatio -> 1/1.5,
FrameTicks -> {{#, #} &@ Table[{i, ctRRes["rownames"][[i]]}, {i, Length[ctRRes["rownames"]]}],
{Automatic, Automatic}}, ImageSize -> 550]
Let us extend the data with columns that have months and quarters corresponding to the invoice dates.
The following commands, which compute date objects and extract month and quarter values from them, are too slow.
(*AbsoluteTiming[dobjs=DateObject[{#,{"Month","/","Day","/","Year"," \
","Hour",":","Minute"}}]&/@data[[All,aColumnNames["InvoiceDate"]]];
]*)
(* {30.2595,Null} *)
(*AbsoluteTiming[
dvals=DateValue[dobjs,{"MonthName","QuarterNameShort"}];
]*)
(* {91.1732,Null} *)
We can use the following ad hoc computation instead.
dvals = StringSplit[#, {"/", " ", ":"}] & /@
data[[All,
aColumnNames["InvoiceDate"]]];
This summary shows that the second value in the dates is the day of month, and the first value is most likely the month.
Magnify[#, 0.75] &@ RecordsSummary[dvals[[All, 1 ;; 3]], "MaxTallies" -> 16]
These commands extend the data and the corresponding column-name-to-index association.
ms = DateValue[Table[DateObject[{2016, i, 1}], {i, 12}], "MonthName"];
dvals = Map[{ms[[#]], "Q" <> ToString[Quotient[# - 1, 3] + 1]} &, ToExpression @ dvals[[All, 1]]];
dataM = MapThread[Join[#1, #2] &, {data, dvals}];
aColumnNamesM =
Join[aColumnNames, <|"MonthName" -> (Length[aColumnNames] + 1), "QuarterNameShort" -> (Length[aColumnNames] + 2)|>];
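The month-name and quarter assignment is easy to get wrong; the quarter of month m is Quotient[m - 1, 3] + 1. A small Python sketch of the computation:

```python
import calendar

def month_quarter(m):
    # Months 1-3 -> Q1, 4-6 -> Q2, 7-9 -> Q3, 10-12 -> Q4.
    return calendar.month_name[m], "Q" + str((m - 1) // 3 + 1)
```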
Here is the contingency table for "Country" vs "QuarterNameShort" over "Quantity".
ctRes = CrossTabulate[ dataM[[All, aColumnNamesM /@ {"Country", "QuarterNameShort", "Quantity"}]]];
Magnify[#, 0.75] &@ MatrixForm[#1, TableHeadings -> {#2, #3}] & @@ ctRes
Often when making contingency tables over subsets of the data we obtain contingency tables with different rows and columns. For various reasons (programming, aesthetics, comprehension) it is better to have tables with the same rows and columns.
Here is an example of non-uniform contingency tables derived from the online retail data of the previous section. We split the data over the countries and find contingency tables of "MonthName" vs "QuarterNameShort" over "Quantity".
tbs = Association @
Map[
(xtab = CrossTabulate[#[[All, aColumnNamesM /@ {"MonthName", "QuarterNameShort", "Quantity"}]]];
#[[1, aColumnNamesM["Country"]]] -> xtab) &,
GatherBy[ dataM, #[[aColumnNamesM[ "Country"]]] &]];
Magnify[#, 0.75] &@
Map[MatrixForm[#["Matrix"], TableHeadings -> (# /@ {"RowNames", "ColumnNames"})] &, tbs](*[[{1,12,-1}]]*)
Using the object RSparseMatrix, see [4,5], we can impose row and column names on each table.
First we convert the contingency tables into RSparseMatrix objects:
tbs2 = Map[ ToRSparseMatrix[#["Matrix"], "RowNames" -> #["RowNames"], "ColumnNames" -> #["ColumnNames"]] &, tbs];
And then we impose the desired row and column names:
tbs2 = Map[ ImposeColumnNames[ ImposeRowNames[#, {"January", "December"}], {"Q1", "Q2", "Q3", "Q4"}] &, tbs2];
Magnify[#, 0.75] &@(MatrixForm /@ tbs2)
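The effect of ImposeRowNames and ImposeColumnNames can be sketched in Python: each ragged table is re-indexed over a fixed set of row and column names, with zeros filled in for missing entries (impose_names is a hypothetical helper, not part of RSparseMatrix.m):

```python
def impose_names(table, row_names, column_names):
    # Re-index a {(row, col): value} table over fixed names; missing entries become 0.
    return {(r, c): table.get((r, c), 0) for r in row_names for c in column_names}

t1 = {("January", "Q1"): 5, ("December", "Q4"): 7}
t2 = {("December", "Q4"): 2}
rows, cols = ["January", "December"], ["Q1", "Q2", "Q3", "Q4"]
u1 = impose_names(t1, rows, cols)
u2 = impose_names(t2, rows, cols)
```

After the re-indexing, all tables share the same shape and can be compared or summed element-wise.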
A generalization of CrossTabulate is the function CrossTensorate implemented in [1], which takes a "formula" argument similar to that of R's xtabs.
This finds the number of people in different sub-groups of the Titanic data:
ctRes = CrossTensorate[Count == "passenger survival" + "passenger sex" + "passenger class", titanicData, aTitanicColumnNames];
MatrixForm[ctRes]
We can verify the result using Count:
Count[titanicData, {"1st", _, "female", "died"}]
(* 5 *)
Count[titanicData, {"2nd", _, "male", "survived"}]
(* 23 *)
To split the cross-tensor across its first variable we can use this command:
sctRes = Association@
MapThread[Rule[#1, Join[<|"XTABTensor" -> #2|>, Rest@Rest@ctRes]] &, {ctRes[[2]], # & /@ ctRes["XTABTensor"]}];
MatrixForm /@ sctRes
Or we can call the more general function CrossTensorateSplit implemented in [1]:
Map[MatrixForm /@ CrossTensorateSplit[ctRes, #] &, Rest@Keys[ctRes]]
CrossTensorateSplit can also be called with one argument that is a variable name. This will produce a splitting function. For example, the above command can be rewritten as:
Map[MatrixForm /@ CrossTensorateSplit[#] @ ctRes &, Rest@Keys[ctRes]]
[1] Anton Antonov, MathematicaForPrediction utilities, (2014), source code MathematicaForPrediction at GitHub, package MathematicaForPredictionUtilities.m.
[2] Anton Antonov, Mosaic plot for data visualization implementation in Mathematica, (2014), MathematicaForPrediction at GitHub, package MosaicPlot.m.
[3] Anton Antonov, "Mosaic plots for data visualization", (2014), MathematicaForPrediction at WordPress blog. URL: https://mathematicaforprediction.wordpress.com/2014/03/17/mosaic-plots-for-data-visualization/ .
[4] Anton Antonov, RSparseMatrix Mathematica package, (2015) MathematicaForPrediction at GitHub. URL: https://github.com/antononcube/MathematicaForPrediction/blob/master/Misc/RSparseMatrix.m .
[5] Anton Antonov, "RSparseMatrix for sparse matrices with named rows and columns", (2015), MathematicaForPrediction at WordPress blog. URL: https://mathematicaforprediction.wordpress.com/2015/10/08/rsparsematrix-for-sparse-matrices-with-named-rows-and-columns/ .
[6] Daqing Chen, Online Retail Data Set, (2015), UCI Machine Learning Repository. URL: https://archive.ics.uci.edu/ml/datasets/Online+Retail .
In this document are given outlines and examples of several related implementations of Lebesgue integration, [1], within the framework of NIntegrate, [7]. The focus is on implementations of Lebesgue integration algorithms that have multiple options and can be easily extended (in order to do further research, optimization, etc.). In terms of NIntegrate's framework terminology, it is shown how to implement an integration strategy or integration rule based on the theory of the Lebesgue integral. The full implementations of those strategy and rules — LebesgueIntegration, LebesgueIntegrationRule, and GridLebesgueIntegrationRule — are given in the Mathematica package [5].
The advantage of using NIntegrate's framework is that a host of supporting algorithms can be employed for preprocessing, execution, experimentation, and testing (correctness, comparison, and profiling).
Here is a brief description of the integration strategy LebesgueIntegration in [5]:
1. prepare a function that calculates measure estimates based on random points or low-discrepancy sequences of points in the integration domain;
2. use NIntegrate for the computation of one-dimensional integrals of that measure estimate function over the range of the integrand function values.
The strategy is adaptive because of the second step: NIntegrate uses adaptive integration algorithms.
Instead of using an integration strategy we can "tuck in" the whole Lebesgue integration process into an integration rule, and then use that integration rule with the adaptive integration algorithms NIntegrate already has. This is done with the implementations of the integration rules LebesgueIntegrationRule and GridLebesgueIntegrationRule.
Lebesgue integration extends the definition of integral to a much larger class of functions than the class of Riemann integrable functions. The Riemann integral is constructed by partitioning the integrand's domain (on the x axis). The Lebesgue integral is constructed by partitioning the integrand's co-domain (on the y axis). For each value y in the co-domain we find the measure μ(y) of the corresponding set of points in the domain. Roughly speaking, the Lebesgue integral is then the sum of all the products μ(y) y; see [1]. For our implementation purposes μ is defined differently, and in the rest of this section we follow [3].
Consider the non-negative, bounded, measurable function f over the domain X, with values y = f(x) in the range [y_min, y_max].
We denote by μ(y) the measure of the points x in X for which f(x) >= y, i.e.
(1)  μ(y) := measure{ x ∈ X : f(x) >= y },  y ∈ [y_min, y_max].
The Lebesgue integral of f over X can be defined as:
(2)  ∫_X f(x) dx = y_min μ(y_min) + lim_{n -> ∞} Σ_{i=0}^{n-1} μ(y_i) (y_{i+1} - y_i),
where y_0 = y_min < y_1 < ... < y_n = y_max is a partitioning of [y_min, y_max]. Further, we can write the last formula as
(3)  ∫_X f(x) dx = y_min μ(y_min) + ∫_{y_min}^{y_max} μ(y) dy.
The restriction f(x) >= 0 can be handled by defining the following functions f_1 and f_2:
f_1(x) := (1/2) (|f(x)| + f(x)),   f_2(x) := (1/2) (|f(x)| - f(x)),
and using the formula
(4)  ∫_X f(x) dx = ∫_X f_1(x) dx - ∫_X f_2(x) dx.
Since finding analytical expressions for μ is hard, we are going to look into ways of approximating μ.
For more details see [1,2,3,4].
(Note that within this theoretical outline the algorithms considered can be seen as algorithms that reduce multidimensional integration to one-dimensional integration.)
We can see that, because of the formulas above, we mostly have to focus on estimating the measure function μ. This section provides a walk-through, with visual examples, of a couple of stochastic ways to do that.
Consider the integral
∫_0^1 ∫_0^1 Sqrt[2 + x1 + x2] dx1 dx2 .
In order to estimate μ over [0,1]^2 for f(x1, x2) = Sqrt[2 + x1 + x2], we are going to generate a set of low-discrepancy points in [0,1]^2, [2]. Here this is done with n = 100 points of the so-called "Sobol" sequence:
n = 100;
SeedRandom[0,Method -> {"MKL",Method -> {"Sobol", "Dimension" -> 2}}];
points = RandomReal[{0, 1}, {n, 2}];
ListPlot[points, AspectRatio -> Automatic,
PlotTheme -> "Detailed",
FrameLabel -> {"\!\(\*SubscriptBox[\(x\), \(1\)]\)","\!\(\*SubscriptBox[\(x\), \(2\)]\)"}]
To each point let us assign a corresponding "volume" that can be used to approximate the measure μ. We can of course easily assign each point the uniform volume 1/n, but as can be seen on the plot this would be a good approximation only for a much larger number of points. Here is an example of a different volume assignment using a Voronoi diagram, [10]:
vmesh = VoronoiMesh[points, {{0, 1}, {0, 1}}, Frame -> True];
Show[{vmesh, Graphics[{Red, Point[points]}]},
FrameLabel -> {"\!\(\*SubscriptBox[\(x\), \(1\)]\)", "\!\(\*SubscriptBox[\(x\), \(2\)]\)"}]
(The Voronoi diagram finds, for each point of the generated set, the set of domain points that are closer to that point than to any other point of the set.)
Here is a breakdown of the Voronoi diagram volumes corresponding to the generated points (compare with the uniform volume 1/n):
volumes =PropertyValue[{vmesh, Dimensions[points][[2]]}, MeshCellMeasure];
Histogram[volumes, PlotRange -> All, PlotTheme -> "Detailed",
FrameLabel -> {"volume", "count"}]
Let us define a function that approximates the measure μ(y) of definition (1) with the generated points and assigned volumes:
EstimateMeasure[fval_?NumericQ, pointVals_, pointVolumes_] :=
Block[{pinds},
pinds = Clip[Sign[pointVals - fval], {0, 1}, {0, 1}];
pointVolumes.pinds
];
Here is an example call of that function using the Voronoi diagram volumes:
EstimateMeasure[1.6, Sqrt[2 + Total[#]] & /@ points, volumes]
(* 0.845833 *)
And here is another call using uniform volumes:
EstimateMeasure[1.6, Sqrt[2 + Total[#]] & /@ points,
Table[1/Length[points], {Length[points]}]] // N
(* 0.85 *)
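EstimateMeasure translates almost verbatim into other languages. Here is a Python sketch with uniform volumes and pseudo-random points instead of a Sobol sequence; the exact measure for the threshold 1.6 is 0.8432, as computed below:

```python
import random

def estimate_measure(fval, point_vals, point_volumes):
    # Sum of the volumes of the points whose function value is at least fval.
    return sum(v for f, v in zip(point_vals, point_volumes) if f >= fval)

rng = random.Random(0)
n = 10000
points = [(rng.random(), rng.random()) for _ in range(n)]
fvals = [(2 + x1 + x2) ** 0.5 for x1, x2 in points]
volumes = [1.0 / n] * n  # uniform volumes; a Voronoi cell area per point is the alternative
mu = estimate_measure(1.6, fvals, volumes)
```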
The results can be verified using ImplicitRegion
:
RegionMeasure[ImplicitRegion[ Sqrt[2 + x1 + x2] >= 1.6, {{x1, 0, 1}, {x2, 0, 1}}]]
(* 0.8432 *)
Or using Integrate
:
Integrate[Piecewise[{{1, Sqrt[2 + x1 + x2] >= 1.6}}, 0], {x1, 0, 1}, {x2, 0, 1}]
(* 0.8432 *)
At this point we are ready to compute the integral estimate with y_min μ(y_min) + ∫_{y_min}^{y_max} μ(y) dy :
fvals = Sqrt[2 + Total[#]] & /@ points;
Min[fvals]*EstimateMeasure[0, fvals, volumes] +
NIntegrate[EstimateMeasure[y, fvals, volumes], {y, Min[fvals], Max[fvals]}]
(* 1.72724 *)
To simplify the code we use the symbol fvals to hold the integrand values at the generated points. Note that instead of the true min and max values of f we use their sample estimates Min[fvals] and Max[fvals].
Here is the verification of the result:
Integrate[Sqrt[2 + x1 + x2], {x1, 0, 1}, {x2, 0, 1}]
% // N
(* 8/15 (16 + 2 Sqrt[2] - 9 Sqrt[3]) *)
(* 1.72798 *)
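The whole estimation pipeline fits in a dozen lines of Python: sample the integrand, estimate μ with uniform volumes, and apply y_min μ(y_min) + ∫ μ(y) dy with a trapezoidal rule in y. (A sketch with pseudo-random points rather than a Sobol sequence.)

```python
import random

def estimate_measure(fval, point_vals, point_volumes):
    # Sum of the volumes of the points whose function value is at least fval.
    return sum(v for f, v in zip(point_vals, point_volumes) if f >= fval)

rng = random.Random(0)
n = 5000
fvals = [(2 + rng.random() + rng.random()) ** 0.5 for _ in range(n)]
volumes = [1.0 / n] * n
y_min, y_max = min(fvals), max(fvals)

# y_min * mu(y_min) plus the integral of mu over [y_min, y_max],
# the latter computed with the trapezoidal rule on a regular y grid.
m = 500
ys = [y_min + (y_max - y_min) * i / m for i in range(m + 1)]
mus = [estimate_measure(y, fvals, volumes) for y in ys]
estimate = y_min * mus[0] + sum(
    (ys[i + 1] - ys[i]) * (mus[i] + mus[i + 1]) / 2 for i in range(m))
# exact value: (8/15) (16 + 2 Sqrt[2] - 9 Sqrt[3]) ~ 1.72798
```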
In order to implement the outlined algorithm in a more universal fashion, we have to consider volume rescaling, function positivity, and Voronoi diagram implementation(s). For details of how these considerations are resolved see the code of the strategy LebesgueIntegration in [5].
The article [3] and book [4] suggest that the measure estimation be done through membership in a regular grid of cells. For example, the points generated in the previous section can be grouped by a regular grid:
The following steps describe in detail an algorithm based on the measure estimation method proposed in [3,4]. The algorithm is implemented in [5] as the integration rule GridLebesgueIntegrationRule.
1. Generate points filling the hypercube [0,1]^d, where d is the dimension of the integration region.
2. Partition [0,1]^d with a regular grid according to specifications.
3. For each point find the cell of the regular grid to which it belongs.
4. For each cell keep a list of indices corresponding to the points that belong to it.
5. For a given sub-region of integration rescale the points to lie within that sub-region; denote the rescaled points with p_i.
6. For a given integrand function f evaluate f over all points p_i.
7. For each cell find the min and max values of f over the points in it.
8. For a given value y_j, where j is some integer enumerating the 1D integration rule sampling points, find for each cell a coefficient between 0 and 1 that estimates what fraction of the cell belongs to the set { x : f(x) >= y_j }; the coefficient is derived from the cell's min and max values of f.
9. Find the measure estimate μ(y_j) as the sum over the cells of the cell volumes multiplied by the corresponding coefficients.
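Steps 1-4 and 7-9 can be sketched as follows (a simplified 2D Python illustration of the grid-based measure estimate with a linear within-cell coefficient; the actual coefficient formula of GridLebesgueIntegrationRule may differ):

```python
import random

def grid_measure_estimator(points, fvals, grid_size):
    # Bin the 2D points from [0,1]^2 into grid_size x grid_size cells and
    # record the min and max integrand value seen in each cell.
    cells = {}
    for (x1, x2), f in zip(points, fvals):
        key = (min(int(x1 * grid_size), grid_size - 1),
               min(int(x2 * grid_size), grid_size - 1))
        lo, hi = cells.get(key, (f, f))
        cells[key] = (min(lo, f), max(hi, f))
    cell_volume = 1.0 / grid_size ** 2

    def mu(y):
        # Cells entirely above the threshold contribute fully; cells whose
        # [min, max] range straddles y contribute a linear fraction.
        total = 0.0
        for lo, hi in cells.values():
            if y <= lo:
                total += cell_volume
            elif y < hi:
                total += cell_volume * (hi - y) / (hi - lo)
        return total
    return mu

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(20000)]
fvals = [(2 + x1 + x2) ** 0.5 for x1, x2 in points]
mu = grid_measure_estimator(points, fvals, 10)
estimate = mu(1.6)  # the exact measure is 0.8432, as in the earlier section
```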
The implementations of Lebesgue integration rules are required to provide a splitting axis for use by the adaptive algorithms. Of course we can assign a random splitting axis, but that often might lead to slower computations. One way to provide a splitting axis is to choose the axis that minimizes the sum of the variances of the sub-divided regions, estimated by samples of the rule points. This is the same approach taken in NIntegrate's rule "MonteCarloRule"; for theoretical details see the chapter "7.8 Adaptive and Recursive Monte Carlo Methods" of [11].
In [5] this splitting axis choice is implemented based on integrand function values. Another approach, more in the spirit of Lebesgue integration, is to select the splitting axis based on variances of the measure function estimates.
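The "MinVariance" choice can be sketched as follows (an illustration of the idea, not the NIntegrate implementation): for each axis, split the sampling points at the midpoint of the region and sum the sample variances of the integrand values in the two halves; pick the axis with the smallest sum.

```python
import random

def sample_variance(vals):
    if len(vals) < 2:
        return 0.0
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)

def min_variance_axis(points, fvals, region):
    # region: list of (lo, hi) per axis. Split each axis at its midpoint and
    # choose the axis minimizing the summed variance of the two halves.
    best_axis, best_score = 0, float("inf")
    for axis, (lo, hi) in enumerate(region):
        mid = (lo + hi) / 2
        left = [f for p, f in zip(points, fvals) if p[axis] < mid]
        right = [f for p, f in zip(points, fvals) if p[axis] >= mid]
        score = sample_variance(left) + sample_variance(right)
        if score < best_score:
            best_axis, best_score = axis, score
    return best_axis

rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(2000)]
# The integrand varies only along the first axis, so splitting there
# reduces the per-half variance the most.
fvals = [x1 ** 2 for x1, _ in pts]
axis = min_variance_axis(pts, fvals, [(0.0, 1.0), (0.0, 1.0)])
```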
Consider the function:
DensityPlot[Exp[-3 (x - 1)^2 - 4 (y - 1)^2], {x, 0, 3}, {y, 0, 3},
PlotRange -> All, ColorFunction -> "TemperatureMap"]
Here is an example of the sampling point traces using "minimum variance" axis splitting in LebesgueIntegrationRule:
res = Reap@
NIntegrate[Exp[-3 (x - 1)^2 - 4 (y - 1)^2], {x, 0, 3}, {y, 0, 3},
Method -> {"GlobalAdaptive",
Method -> {LebesgueIntegrationRule, "Points" -> 600,
"PointGenerator" -> "Sobol",
"AxisSelector" -> "MinVariance"},
"SingularityHandler" -> None},
EvaluationMonitor :> Sow[{x, y}],
PrecisionGoal -> 2.5, MaxRecursion -> 3];
res[[1]]
ListPlot[res[[2, 1]], AspectRatio -> Automatic]
(* 0.890916 *)
And here are the sampling points with a random selection of the splitting axis:
res = Reap@
NIntegrate[Exp[-3 (x - 1)^2 - 4 (y - 1)^2], {x, 0, 3}, {y, 0, 3},
Method -> {"GlobalAdaptive",
Method -> {LebesgueIntegrationRule, "Points" -> 600,
"PointGenerator" -> "Sobol",
"AxisSelector" -> "Random"},
"SingularityHandler" -> None},
EvaluationMonitor :> Sow[{x, y}],
PrecisionGoal -> 2.5, MaxRecursion -> 3];
res[[1]]
ListPlot[res[[2, 1]], AspectRatio -> Automatic]
(* 0.892499 *)
Here is a more precise estimate of that integral:
NIntegrate[Exp[-3 (x - 1)^2 - 4 (y - 1)^2], {x, 0, 3}, {y, 0, 3}]
(* 0.898306 *)
The strategy and rule implementations in [5] can be used in the following ways.
NIntegrate[Sqrt[x], {x, 0, 2}, Method -> LebesgueIntegration]
(* 1.88589 *)
NIntegrate[Sqrt[x], {x, 0, 2},
Method -> {LebesgueIntegration, Method -> "LocalAdaptive",
"Points" -> 2000, "PointGenerator" -> "Sobol"}, PrecisionGoal -> 3]
(* 1.88597 *)
NIntegrate[Sqrt[x], {x, 0, 2},
Method -> {LebesgueIntegrationRule, "Points" -> 2000,
"PointGenerator" -> "Sobol",
"PointwiseMeasure" -> "VoronoiMesh"}, PrecisionGoal -> 3]
(* 1.88597 *)
NIntegrate[Sqrt[x + y + x], {x, 0, 2}, {y, 0, 3}, {z, 0, 4},
Method -> {GridLebesgueIntegrationRule,
Method -> "GaussKronrodRule", "Points" -> 2000,
"GridSizes" -> 5, "PointGenerator" -> "Sobol"}, PrecisionGoal -> 3]
(* 43.6364 *)
Here are the options for the implemented strategy and rules in [5]:
Grid[Transpose[{#, ColumnForm[Options[#]]} & /@
{LebesgueIntegration,LebesgueIntegrationRule,
GridLebesgueIntegrationRule}],
Alignment -> Left, Dividers -> All]
Integration with variable ranges works "out of the box."
NIntegrate[Sqrt[x + y], {x, 0, 2}, {y, 0, x}]
(* 2.75817 *)
NIntegrate[Sqrt[x + y], {x, 0, 2}, {y, 0, x},
Method -> LebesgueIntegration, PrecisionGoal -> 2]
(* 2.75709 *)
NIntegrate[Sqrt[x + y], {x, 0, 2}, {y, 0, x},
Method -> LebesgueIntegrationRule, PrecisionGoal -> 2]
(* 2.75663 *)
In order to get correct results with infinite ranges, the wrapper Method -> {"UnitCubeRescaling", "FunctionalRangesOnly" -> False, _} has to be used. Here is an example:
NIntegrate[1/(x^2 + 12), {x, 0, Infinity}]
(* 0.45345 *)
NIntegrate[1/(x^2 + 12), {x, 0, Infinity},
Method -> {"UnitCubeRescaling", "FunctionalRangesOnly" -> False,
Method -> {LebesgueIntegrationRule, "Points" -> 2000}}, PrecisionGoal -> 3]
(* 0.453366 *)
For some integrands we have to specify inter-range points or a larger MinRecursion.
NIntegrate[1/(x^2), {x, 1, Infinity}]
(* 1. *)
NIntegrate[1/(x^2), {x, 1, 12, Infinity},
Method -> {"UnitCubeRescaling", "FunctionalRangesOnly" -> False,
Method -> {LebesgueIntegrationRule, "Points" -> 1000}}]
(* 0.999466 *)
NIntegrate[1/(x^2), {x, 1, Infinity},
Method -> {"UnitCubeRescaling", "FunctionalRangesOnly" -> False,
Method -> {LebesgueIntegrationRule, "Points" -> 1000}}]
(* 0. *)
With the option "EvaluationMonitor" we can see the sampling points for the strategy and the rules.
This is straightforward for the rules:
res = Reap@
NIntegrate[Exp[-3 (x - 1)^2], {x, 0, 3},
Method -> LebesgueIntegrationRule,
EvaluationMonitor :> Sow[x],
PrecisionGoal -> 3];
ListPlot[res[[2, 1]]]
The strategy LebesgueIntegration uses an internal variable for the calculation of the Lebesgue integral. In "EvaluationMonitor" either that variable has to be used, or a symbol name has to be passed through the option "LebesgueIntegralVariableSymbol". Here is an example:
res = Reap@
NIntegrate[Sqrt[x + y + z], {x, -1, 2}, {y, 0, 1}, {z, 1, 12},
Method -> {LebesgueIntegration, "Points" -> 600,
"PointGenerator" -> "Sobol",
"LebesgueIntegralVariableSymbol" -> fval},
EvaluationMonitor :> {Sow[fval]},
PrecisionGoal -> 3];
res = DeleteCases[res, fval, \[Infinity]];
ListPlot[res[[2, 1]]]
We can use NIntegrate's utility functions for visualization and profiling in order to compare the implemented algorithms with related ones (like "AdaptiveMonteCarlo") that NIntegrate provides or that are plugged in.
Needs["Integration`NIntegrateUtilities`"]
Common function and domain:
fexpr = 1/(375/100 - Cos[x] - Cos[y]);
ranges = {{x, 0, \[Pi]}, {y, 0, \[Pi]}};
Common parameters:
pgen = "Random";
npoints = 1000;
"AdaptiveMonteCarlo" call:
NIntegrateProfile[
NIntegrate[fexpr, Evaluate[Sequence @@ ranges],
Method -> {"AdaptiveMonteCarlo",
Method -> {"MonteCarloRule", "PointGenerator" -> pgen,
"Points" -> npoints,
"AxisSelector" -> "MinVariance"}},
PrecisionGoal -> 3]
]
(* {"IntegralEstimate" -> InputForm[2.8527356472097902`], "Evaluations" -> 17000, "Timing" -> 0.0192079} *)
LebesgueIntegrationRule call:
NIntegrateProfile[
NIntegrate[fexpr, Evaluate[Sequence @@ ranges],
Method -> {"GlobalAdaptive",
Method -> {LebesgueIntegrationRule,
"PointGenerator" -> pgen, "Points" -> npoints,
"AxisSelector" -> "MinVariance",
Method -> "ClenshawCurtisRule"},
"SingularityHandler" -> None}, PrecisionGoal -> 3]
]
(* {"IntegralEstimate" -> InputForm[2.836659588960318], "Evaluations" -> 13000, "Timing" -> 0.384246} *)
The flow charts below show that the plug-in designs have common elements. In order to make the computations more effective, the rule initialization prepares the data that is used in all rule invocations. For the strategy the initialization can be much lighter, since the strategy algorithm is executed only once.
In the flow charts the double-line boxes designate sub-routines. We can see that the so-called Hollywood principle of object-oriented programming, "don't call us, we'll call you", is manifested.
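The inversion of control behind the Hollywood principle can be illustrated with a minimal, language-neutral sketch (here in Python; the class names are made up and this is not the framework's actual code). The strategy owns the adaptive loop and repeatedly calls the rule; the rule's one-time initialization prepares data that is reused by every invocation:

```python
import math

class MidpointRule:
    """Hypothetical rule: initialization prepares data reused by every invocation."""
    def __init__(self, n_points):
        # done once per integration, before the strategy starts calling apply()
        self.abscissas = [(k + 0.5) / n_points for k in range(n_points)]
        self.weight = 1.0 / n_points

    def apply(self, f, a, b):
        # invoked repeatedly by the strategy, never by user code directly
        h = b - a
        return h * self.weight * sum(f(a + h * t) for t in self.abscissas)

class GlobalAdaptiveStrategy:
    """Hypothetical strategy: owns the loop, bisects regions, calls the rule."""
    def __init__(self, rule, tol=1e-6, max_splits=1000):
        self.rule, self.tol, self.max_splits = rule, tol, max_splits

    def integrate(self, f, a, b):
        regions = [(a, b, self.rule.apply(f, a, b))]
        for _ in range(self.max_splits):
            # bisect the region with the largest estimated contribution
            i = max(range(len(regions)), key=lambda j: abs(regions[j][2]))
            lo, hi, est = regions.pop(i)
            mid = (lo + hi) / 2
            left = self.rule.apply(f, lo, mid)   # "we'll call you"
            right = self.rule.apply(f, mid, hi)
            regions += [(lo, mid, left), (mid, hi, right)]
            if abs(left + right - est) < self.tol:
                break
        return sum(est for _, _, est in regions)

print(GlobalAdaptiveStrategy(MidpointRule(16)).integrate(math.sin, 0.0, math.pi))
```

The strategy never needs to know which rule it drives, which is why a new rule like LebesgueIntegrationRule can be plugged into an existing strategy like "GlobalAdaptive".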
The following flow chart shows the steps of NIntegrate's execution when the integration strategy LebesgueIntegration is used.
The following flow chart shows the steps of NIntegrate's execution when the integration rule LebesgueIntegrationRule is used.
There are multiple architectural, coding, and interface decisions to make while doing implementations like the ones in [5] and described in this document. The following mind map provides an overview of alternatives and interactions between components and parameters.
In many ways using the Lebesgue integration rule with the adaptive algorithms is similar to using NIntegrate's "AdaptiveMonteCarlo" and its rule "MonteCarloRule". Although it is natural to consider plugging the Lebesgue integration rules into "AdaptiveMonteCarlo", at this point NIntegrate's framework does not allow "AdaptiveMonteCarlo" to use a rule other than "MonteCarloRule".
We can consider using Monte Carlo algorithms for estimating the measures corresponding to a vector of values (that come from a 1D integration rule). This can easily be done, but it is not that effective because of the way NIntegrate handles vector integrands and because of stopping-criteria issues when the measures are close to 0.
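The measure-estimation idea underlying these algorithms can be sketched in a language-neutral way. The following Python sketch (for illustration only; it is not the package's implementation, and the function names are made up) estimates the measure function mu(y) = measure{x in domain : f(x) >= y} by Monte Carlo sampling and then integrates mu over the range of the integrand with a trapezoidal rule, using the same test function as the profiling examples above:

```python
import bisect, math, random

def lebesgue_mc(f, lows, highs, n_points=20000, n_levels=200, rng=None):
    """Lebesgue-style Monte Carlo integration of a positive integrand f
    over the box [lows[0], highs[0]] x ... (hypothetical helper)."""
    rng = rng or random.Random(0)
    volume = math.prod(h - l for l, h in zip(lows, highs))
    # sample the integrand at uniformly distributed points; sort the values
    vals = sorted(
        f(*[rng.uniform(l, h) for l, h in zip(lows, highs)])
        for _ in range(n_points)
    )
    ymin, ymax = vals[0], vals[-1]
    # mu(y): fraction of sampled values >= y, scaled by the domain volume
    def mu(y):
        return volume * (n_points - bisect.bisect_left(vals, y)) / n_points
    # trapezoidal rule for the integral of mu(y) over [ymin, ymax]
    h = (ymax - ymin) / n_levels
    levels = [ymin + k * h for k in range(n_levels + 1)]
    trap = h * (sum(mu(y) for y in levels) - 0.5 * (mu(ymin) + mu(ymax)))
    # below ymin the (estimated) measure is the full domain volume
    return ymin * volume + trap

f = lambda x, y: 1 / (3.75 - math.cos(x) - math.cos(y))
print(lebesgue_mc(f, [0, 0], [math.pi, math.pi]))  # ≈ 2.85
```

In the actual implementations the 1D integration over the measure function is done by a rule such as "ClenshawCurtisRule", and the sampling points can come from a low-discrepancy sequence like "Sobol".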
One of the most interesting extensions of the described Lebesgue integration algorithms and implementation designs is to combine them with Mathematica's more advanced features for geometric computation (like the functions VoronoiMesh, RegionMeasure, and ImplicitRegion used above).
Another interesting direction is the derivation and use of symbolic expressions for the measure functions. (Hybrid symbolic-numerical algorithms can be obtained, as in NIntegrate's handling of piecewise functions or the strategy combining symbolic and numerical integration described in [9].)
[1] Wikipedia entry, Lebesgue integration, URL: https://en.wikipedia.org/wiki/Lebesgue_integration .
[2] Wikipedia entry, Low-discrepancy sequence, URL: https://en.wikipedia.org/wiki/Low-discrepancy_sequence .
[3] B. L. Burrows, "A new approach to numerical integration", J. Inst. Math. Applics., 26 (1980), 151-173.
[4] T. He, Dimensionality Reducing Expansion of Multivariate Integration, 2001, Birkhäuser Boston. ISBN-13: 978-1-4612-7414-8.
[5] A. Antonov, Adaptive Numerical Lebesgue Integration Mathematica Package, 2016, MathematicaForPrediction project at GitHub.
[6] A. Antonov, Lebesgue integration, Wolfram Demonstrations Project, 2007. URL: http://demonstrations.wolfram.com/LebesgueIntegration .
[7] "Advanced Numerical Integration in the Wolfram Language", Wolfram Research Inc. URL: https://reference.wolfram.com/language/tutorial/NIntegrateOverview.html .
[8] A. Antonov, "How to implement custom integration rules for use by NIntegrate?", (2016) Mathematica StackExchange answer, URL: http://mathematica.stackexchange.com/a/118326/34008 .
[9] A. Antonov, "How to implement custom NIntegrate integration strategies?", (2016) Mathematica StackExchange answer, URL: http://mathematica.stackexchange.com/a/118922/34008 .
[10] Wikipedia entry, Voronoi diagram, URL: https://en.wikipedia.org/wiki/Voronoi_diagram .
[11] Press, W.H. et al., Numerical Recipes in C (2nd Ed.): The Art of Scientific Computing, 1992, Cambridge University Press, New York, NY, USA.