Comparison of dimension reduction algorithms over mandala images generation

Introduction

This document discusses concrete algorithms for two different approaches of generation of mandala images, [1]: direct construction with graphics primitives, and use of machine learning algorithms.

In the experiments described in this document better results were obtained with the direct algorithms. The direct algorithms were made for the Mathematica StackExchange question "Code that generates a mandala", [3].

The main goals of this document are:

  1. to show some pretty images exploiting symmetry and multiplicity (see this album),

  2. to provide an illustrative example of comparing dimension reduction methods,

  3. to give a set-up for further discussions and investigations on mandala creation with machine learning algorithms.

Two direct construction algorithms are given: one uses "seed" segment rotations, the other superimposing of layers of different types. The following plots show the order in which different mandala parts are created with each of the algorithms.

"Direct-Mandala-creation-algorithms-steps"

In this document we use several algorithms for dimension reduction applied to collections of images following the procedure described in [4,5]. We are going to show that with Non-Negative Matrix Factorization (NNMF) we can use mandalas made with the seed segment rotation algorithm to extract layer types and superimpose them to make colored mandalas. Using the same approach with Singular Value Decomposition (SVD) or Independent Component Analysis (ICA) does not produce good layers and the superimposition produces more "watered-down", less diverse mandalas.

From a more general perspective this document compares the statistical approach of "trying to see without looking" with the "direct simulation" approach. Another perspective is the creation of "design spaces"; see [6].

The idea of using machine learning algorithms is appealing because there is no need to make the mental effort of understanding, discerning, approximating, and programming the principles of mandala creation. We can "just" use a large collection of mandala images and generate new ones using the "internal knowledge" data of machine learning algorithms. For example, a Neural network system like Deep Dream, [2], might be made to dream of mandalas.

Direct algorithms for mandala generation

In this section we present two different algorithms for generating mandalas. The first sees a mandala as being generated by rotation of a "seed" segment. The second sees a mandala as being generated by different component layers. For other approaches see [3].

The request of [3] is for generation of mandalas for coloring by hand. That is why the mandala generation algorithms are in the grayscale space. Coloring the generated mandala images is a secondary task.

By seed segment rotations

One way to come up with mandalas is to generate a segment and then by appropriate number of rotations to produce a mandala.

Here is a function and an example of random segment (seed) generation:

Clear[MakeSeedSegment]
MakeSeedSegment[radius_, angle_, n_Integer: 10, 
   connectingFunc_: Polygon, keepGridPoints_: False] :=
  Block[{t},
   t = Table[
     Line[{radius*r*{Cos[angle], Sin[angle]}, {radius*r, 0}}], {r, 0, 1, 1/n}];
   Join[If[TrueQ[keepGridPoints], t, {}], {GrayLevel[0.25], 
     connectingFunc@RandomSample[Flatten[t /. Line[{x_, y_}] :> {x, y}, 1]]}]
   ];

seed = MakeSeedSegment[10, Pi/12, 10];
Graphics[seed, Frame -> True]
"Mandala-seed-segment"

This function can make a seed segment symmetric:

Clear[MakeSymmetric]
MakeSymmetric[seed_] := {seed, 
   GeometricTransformation[seed, ReflectionTransform[{0, 1}]]};

seed = MakeSymmetric[seed];
Graphics[seed, Frame -> True]
"Mandala-seed-segment-symmetric"

Using a seed we can generate mandalas with different specification signatures:

Clear[MakeMandala]
MakeMandala[opts : OptionsPattern[]] :=      
  MakeMandala[
   MakeSymmetric[
    MakeSeedSegment[20, Pi/12, 12, 
     RandomChoice[{Line, Polygon, BezierCurve, 
       FilledCurve[BezierCurve[#]] &}], False]], Pi/6, opts];

MakeMandala[seed_, angle_?NumericQ, opts : OptionsPattern[]] :=      
  Graphics[GeometricTransformation[seed, 
    Table[RotationMatrix[a], {a, 0, 2 Pi - angle, angle}]], opts];

This code randomly selects symmetricity and seed generation parameters (number of concentric circles, angles):

SeedRandom[6567]
n = 12;
Multicolumn@
 MapThread[
  Image@If[#1,
     MakeMandala[MakeSeedSegment[10, #2, #3], #2],
     MakeMandala[
      MakeSymmetric[MakeSeedSegment[10, #2, #3, #4, False]], 2 #2]
     ] &, {RandomChoice[{False, True}, n], 
   RandomChoice[{Pi/7, Pi/8, Pi/6}, n], 
   RandomInteger[{8, 14}, n], 
   RandomChoice[{Line, Polygon, BezierCurve, 
     FilledCurve[BezierCurve[#]] &}, n]}]
"Seed-segment-rotation-mandalas-complex-settings"

Here is a more concise way to generate symmetric segment mandalas:

Multicolumn[Table[Image@MakeMandala[], {12}], 5]
"Seed-segment-rotation-mandalas-simple-settings"

Note that with this approach the programming of the mandala coloring is not that trivial — weighted blending of colorized mandalas is the easiest thing to do. (Shown below.)

By layer types

This approach was given by Simon Woods in [3].

"For this one I’ve defined three types of layer, a flower, a simple circle and a ring of small circles. You could add more for greater variety."

The coloring approach with image blending given below did not work well for this algorithm, so I modified the original code in order to produce colored mandalas.

ClearAll[LayerFlower, LayerDisk, LayerSpots, MandalaByLayers]

LayerFlower[n_, a_, r_, colorSchemeInd_Integer] := 
  Module[{b = RandomChoice[{-1/(2 n), 0}]}, {If[
     colorSchemeInd == 0, White, 
     RandomChoice[ColorData[colorSchemeInd, "ColorList"]]], 
    Cases[ParametricPlot[
      r (a + Cos[n t])/(a + 1) {Cos[t + b Sin[2 n t]], Sin[t + b Sin[2 n t]]}, {t, 0, 2 Pi}], 
     l_Line :> FilledCurve[l], -1]}];

LayerDisk[_, _, r_, colorSchemeInd_Integer] := {If[colorSchemeInd == 0, White, 
    RandomChoice[ColorData[colorSchemeInd, "ColorList"]]], 
   Disk[{0, 0}, r]};

LayerSpots[n_, a_, r_, colorSchemeInd_Integer] := {If[colorSchemeInd == 0, White, 
    RandomChoice[ColorData[colorSchemeInd, "ColorList"]]], 
   Translate[Disk[{0, 0}, r a/(4 n)], r CirclePoints[n]]};

MandalaByLayers[n_, m_, coloring : (False | True) : False, opts : OptionsPattern[]] := 
  Graphics[{EdgeForm[Black], White, 
    Table[RandomChoice[{3, 2, 1} -> {LayerFlower, LayerDisk, LayerSpots}][n, RandomReal[{3, 5}], i, 
       If[coloring, RandomInteger[{1, 17}], 0]]~Rotate~(Pi i/n), {i, m, 1, -1}]}, opts];

Here are generated black-and-white mandalas.

SeedRandom[6567]
ImageCollage[Table[Image@MandalaByLayers[16, 20], {12}], Background -> White, ImagePadding -> 3, ImageSize -> 1200]
"Layer-types-superimposing-BW"

Here are some colored mandalas. (Which make me think more of Viking and Native American art than mandalas.)

ImageCollage[Table[Image@MandalaByLayers[16, 20, True], {12}], Background -> White, ImagePadding -> 3, ImageSize -> 1200]
"Layer-types-superimposing-colored"

Training data

Images by direct generation

iSize = 400;

SeedRandom[6567]
AbsoluteTiming[
 mandalaImages = 
   Table[Image[
     MakeMandala[
      MakeSymmetric@
       MakeSeedSegment[10, Pi/12, 12, RandomChoice[{Polygon, FilledCurve[BezierCurve[#]] &}]], Pi/6], 
     ImageSize -> {iSize, iSize}, ColorSpace -> "Grayscale"], {300}];
 ]

(* {39.31, Null} *)

ImageCollage[ColorNegate /@ RandomSample[mandalaImages, 12], Background -> White, ImagePadding -> 3, ImageSize -> 400]
"mandalaImages-sample"

External image data

See the section "Using World Wide Web images".

Direct blending

The most interesting results are obtained with the image blending procedure coded below over mandala images generated with the seed segment rotation algorithm.

SeedRandom[3488]
directBlendingImages = Table[
   RemoveBackground@
    ImageAdjust[
     Blend[Colorize[#, 
         ColorFunction -> 
          RandomChoice[{"IslandColors", "FruitPunchColors", 
            "AvocadoColors", "Rainbow"}]] & /@ 
       RandomChoice[mandalaImages, 4], RandomReal[1, 4]]], {36}];

ImageCollage[directBlendingImages, Background -> White, ImagePadding -> 3, ImageSize -> 1200]

"directBlendingImages-3488-36"

Dimension reduction algorithms application

In this section we are going to apply the dimension reduction algorithms Singular Value Decomposition (SVD), Independent Component Analysis (ICA), and Non-Negative Matrix Factorization (NNMF) to a linear vector space representation (a matrix) of an image dataset. In the next section we use the bases generated by those algorithms to make mandala images.
We are going to use the packages [7,8] for ICA and NNMF respectively.


Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/IndependentComponentAnalysis.m"]
Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/NonNegativeMatrixFactorization.m"]

Linear vector space representation

The linear vector space representation of the images is simple — each image is flattened to a vector (row-wise), and the image vectors are put into a matrix.

mandalaMat = Flatten@*ImageData@*ColorNegate /@ mandalaImages;
Dimensions[mandalaMat]

(* {300, 160000} *)

Re-factoring and basis images

The following code re-factors the images matrix with SVD, ICA, and NNMF and extracts the basis images.

AbsoluteTiming[
 svdRes = SingularValueDecomposition[mandalaMat, 20];
]
(* {5.1123, Null} *)

svdBasisImages = Map[ImageAdjust@Image@Partition[#, iSize] &, Transpose@svdRes[[3]]];

AbsoluteTiming[
 icaRes = 
   IndependentComponentAnalysis[Transpose[mandalaMat], 20, 
    PrecisionGoal -> 4, "MaxSteps" -> 100];
]
(* {23.41, Null} *)

icaBasisImages = Map[ImageAdjust@Image@Partition[#, iSize] &, Transpose[icaRes[[1]]]];

SeedRandom[452992]
AbsoluteTiming[
 nnmfRes = 
   GDCLS[mandalaMat, 20, PrecisionGoal -> 4, 
    "MaxSteps" -> 20, "RegularizationParameter" -> 0.1];
 ]
(* {233.209, Null} *)

nnmfBasisImages = Map[ImageAdjust@Image@Partition[#, iSize] &, nnmfRes[[2]]];

Bases

Let us visualize the bases derived with the matrix factorization methods.

Grid[{{"SVD", "ICA", "NNMF"},
      Map[ImageCollage[#, Automatic, {400, 500}, 
        Background -> LightBlue, ImagePadding -> 5, ImageSize -> 350] &, 
      {svdBasisImages, icaBasisImages, nnmfBasisImages}]
     }, Dividers -> All]
"Mandala-SVD-ICA-NNMF-bases-20"

"Mandala-SVD-ICA-NNMF-bases-20"

Here are some observations for the bases.

  1. The SVD basis has an average mandala image as its first vector and the other vectors are "differences" to be added to that first vector.

  2. The SVD and ICA bases are structured similarly. That is because ICA and SVD are both based on orthogonality — ICA factorization uses an orthogonality criteria based on Gaussian noise properties (which is more relaxed than SVD’s standard orthogonality criteria.)

  3. As expected, the NNMF basis images have black background because of the enforced non-negativity. (Black corresponds to 0, white to 1.)

  4. Compared to the SVD and ICA bases the images of the NNMF basis are structured in a radial manner. This can be demonstrated using image binarization.

Grid[{{"SVD", "ICA", "NNMF"}, Map[ImageCollage[Binarize[#, 0.5] & /@ #, Automatic, {400, 500}, Background -> LightBlue, ImagePadding -> 5, ImageSize -> 350] &, {svdBasisImages, icaBasisImages, nnmfBasisImages}] }, Dividers -> All]
"Mandala-SVD-ICA-NNMF-bases-binarized-0.5-20"

We can see that binarizing of the NNMF basis images shows them as mandala layers. In other words, using NNMF we can convert the mandalas of the seed segment rotation algorithm into mandalas generated by an algorithm that superimposes layers of different types.

Blending with image bases samples

In this section we just show different blending images using the SVD, ICA, and NNMF bases.

Blending function definition

ClearAll[MandalaImageBlending]
Options[MandalaImageBlending] = {"BaseImage" -> {}, "BaseImageWeight" -> Automatic, "PostBlendingFunction" -> (RemoveBackground@*ImageAdjust)};
MandalaImageBlending[basisImages_, nSample_Integer: 4, n_Integer: 12, opts : OptionsPattern[]] :=      
  Block[{baseImage, baseImageWeight, postBlendingFunc, sImgs, sImgWeights},
   baseImage = OptionValue["BaseImage"];
   baseImageWeight = OptionValue["BaseImageWeight"];
   postBlendingFunc = OptionValue["PostBlendingFunction"];
   Table[(
     sImgs = 
      Flatten@Join[{baseImage}, RandomSample[basisImages, nSample]];
     If[NumericQ[baseImageWeight] && ImageQ[baseImage],
      sImgWeights = 
       Join[{baseImageWeight}, RandomReal[1, Length[sImgs] - 1]],
      sImgWeights = RandomReal[1, Length[sImgs]]
      ];
     postBlendingFunc@
      Blend[Colorize[#, 
          DeleteCases[{opts}, ("BaseImage" -> _) | ("BaseImageWeight" -> _) | ("PostBlendingFunction" -> _)],               
          ColorFunction -> 
           RandomChoice[{"IslandColors", "FruitPunchColors", 
             "AvocadoColors", "Rainbow"}]] & /@ sImgs, 
       sImgWeights]), {n}]
   ];

SVD image basis blending

SeedRandom[17643]
svdBlendedImages = MandalaImageBlending[Rest@svdBasisImages, 4, 24];
ImageCollage[svdBlendedImages, Background -> White, ImagePadding -> 3, ImageSize -> 1200]

"svdBlendedImages-17643-24"

SeedRandom[17643]
svdBlendedImages = MandalaImageBlending[Rest@svdBasisImages, 4, 24, "BaseImage" -> First[svdBasisImages], "BaseImageWeight" -> 0.5];
ImageCollage[svdBlendedImages, Background -> White, ImagePadding -> 3, ImageSize -> 1200]

"svdBlendedImages-baseImage-17643-24"

ICA image basis blending

SeedRandom[17643]
icaBlendedImages = MandalaImageBlending[Rest[icaBasisImages], 4, 36, "BaseImage" -> First[icaBasisImages], "BaseImageWeight" -> Automatic];
ImageCollage[icaBlendedImages, Background -> White, ImagePadding -> 3, ImageSize -> 1200]

"icaBlendedImages-17643-36"

NNMF image basis blending

SeedRandom[17643]
nnmfBlendedImages = MandalaImageBlending[nnmfBasisImages, 4, 36];
ImageCollage[nnmfBlendedImages, Background -> White, ImagePadding -> 3, ImageSize -> 1200]

"nnmfBlendedImages-17643-36"

Using World Wide Web images

A natural question to ask is:

What would be the outcomes of the above procedures to mandala images found in the World Wide Web (WWW) ?

Those WWW images are most likely man made or curated.

The short answer is that the results are not that good. Better results might be obtained using a larger set of WWW images (than just 100 in the experiment results shown below.)

Here is a sample from the WWW mandala images:

"wwwMandalaImages-sample-6

Here are the results obtained with NNMF basis:

"www-nnmfBlendedImages-12"

Future plans

My other motivation for writing this document is to set up a basis for further investigations and discussions on the following topics.

  1. Having a large image database of "real world", human made mandalas.

  2. Utilization of Neural Network algorithms to mandala creation.

  3. Utilization of Cellular Automata to mandala generation.

  4. Investigate mandala morphing and animations.

  5. Making a domain specific language of specifications for mandala creation and modification.

The idea of using machine learning algorithms for mandala image generation was further supported by an image classifier that recognizes fairly well (suitably normalized) mandala images obtained in different ways:

"Mandalas-classifer-measurements-matrix"

References

[1] Wikipedia entry: Mandala, https://en.wikipedia.org/wiki/Mandala .

[2] Wikipedia entry: DeepDream, https://en.wikipedia.org/wiki/DeepDream .

[3] "Code that generates a mandala", Mathematica StackExchange, http://mathematica.stackexchange.com/q/136974 .

[4] Anton Antonov, "Comparison of PCA and NNMF over image de-noising", (2016), MathematicaForPrediction at WordPress blog. URL: https://mathematicaforprediction.wordpress.com/2016/05/07/comparison-of-pca-and-nnmf-over-image-de-noising/ .

[5] Anton Antonov, "Handwritten digits recognition by matrix factorization", (2016), MathematicaForPrediction at WordPress blog. URL: https://mathematicaforprediction.wordpress.com/2016/11/12/handwritten-digits-recognition-by-matrix-factorization/ .

[6] Chris Carlson, "Social Exploration of Design Spaces: A Proposal", (2016), Wolfram Technology Conference 2016. URL: http://wac .36f4.edgecastcdn.net/0036F4/pub/www.wolfram.com/technology-conference/2016/SocialExplorationOfDesignSpaces.nb , YouTube: https://www.youtube.com/watch?v=YK2523nfcms .

[7] Anton Antonov, Independent Component Analysis Mathematica package, (2016), source code at MathematicaForPrediction at GitHub, package IndependentComponentAnalysis.m .

[8] Anton Antonov, Implementation of the Non-Negative Matrix Factorization algorithm in Mathematica, (2013), source code at MathematicaForPrediction at GitHub, package NonNegativeMatrixFactorization.m.

Tries with frequencies in Java

Introduction

This blog post describes the installation and use in Mathematica of Tries with frequencies [1] implemented in Java [2] through a corresponding Mathematica package [3].

Prefix tree or Trie, [6], is a tree data structure that stores a set of "words" that consist of "characters" — each element can be seen as a key to itself. The article [1] and packages [2,3,4] extend that data structure to have additional data (frequencies) associated with each key.

The packages [2,3] work with lists of strings only. The package [4] can work with more general data but it is much slower.

The main motivation to create the package [3] was to bring the fast Trie functions implementations of [2] into Mathematica in order to prototype, implement, and experiment with different text processing algorithms. (Like, inductive grammar parsers generation and entity name recognition.) The speed of combining [2] and [3] is evaluated in the section "Performance tests" below.

Set-up

This following directory path has to have the jar file "TriesWithFrequencies.jar".

$JavaTriesWithFrequenciesPath = 
  "/Users/antonov/MathFiles/MathematicaForPrediction/Java/TriesWithFrequencies";
FileExistsQ[
 FileNameJoin[{$JavaTriesWithFrequenciesPath, "TriesWithFrequencies.jar"}]]

(* True *)

For more details see the explanations in the README file in the GitHub directory of [2].

The following directory is expected to have the Mathematica package [3].

dirName = "/Users/antonov/MathFiles/MathematicaForPrediction";
FileExistsQ[FileNameJoin[{dirName, "JavaTriesWithFrequencies.m"}]]

(* True *)

AppendTo[$Path, dirName];
Needs["JavaTriesWithFrequencies`"]

This commands installs Java (via JLink`) and loads the necessary Java libraries.

JavaTrieInstall[$JavaTriesWithFrequenciesPath]

Basic examples

For brevity the basic examples are not included in this blog post. Here is album of images that shows the "JavaTrie.*" commands with their effects:

"JavaTrieExample" .

More detailed explanations can be found in the Markdown document, [7]:

Next, we are going to look into performance evaluation examples (also given in [7].)

Membership of words

Assume we want find the words of "Hamlet" that are not in the book "Origin of Species". This section shows that the Java trie creation and query times for this task are quite small.

Read words

The following code reads the words in the texts. We get 33000 words from "Hamlet" and 151000 words from "Origin of Species".

hWords =
  Block[{words},
   words = 
    StringSplit[
     ExampleData[{"Text", "Hamlet"}], {Whitespace, 
      PunctuationCharacter}];
   words = Select[ToLowerCase[words], StringLength[#] > 0 &]
   ];
Length[hWords]

(* 32832 *)

osWords =
  Block[{words},
   words = 
    StringSplit[
     ExampleData[{"Text", "OriginOfSpecies"}], {Whitespace, 
      PunctuationCharacter}];
   words = Select[ToLowerCase[words], StringLength[#] > 0 &]
   ];
Length[osWords]

(* 151205 *)

Membership

First we create trie with "Origin of species" words:

AbsoluteTiming[
 jOStr = JavaTrieCreateBySplit[osWords];
]

(* {0.682531, Null} *)

Sanity check — the "Origin of species" words are in the trie:

AbsoluteTiming[
 And @@ JavaObjectToExpression[
   JavaTrieContains[jOStr, Characters /@ osWords]]
]

(* {1.32224, True} *)

Membership of "Hamlet" words into "Origin of Species":

AbsoluteTiming[
 res = JavaObjectToExpression[
    JavaTrieContains[jOStr, Characters /@ hWords]];
]

(* {0.265307, Null} *)

Tallies of belonging:

Tally[res]

(* {{True, 24924}, {False, 7908}} *)

Sample of words from "Hamlet" that do not belong to "Origin of Species":

RandomSample[Pick[hWords, Not /@ res], 30]

(* {"rosencrantz", "your", "mar", "airy", "rub", "honesty", \
"ambassadors", "oph", "returns", "pale", "virtue", "laertes", \
"villain", "ham", "earnest", "trail", "unhand", "quit", "your", \
"your", "fishmonger", "groaning", "your", "wake", "thou", "liest", \
"polonius", "upshot", "drowned", "grosser"} *)

Common words sample:

RandomSample[Pick[hWords, res], 30]

(* {"as", "indeed", "it", "with", "wild", "will", "to", "good", "so", \
"dirt", "the", "come", "not", "or", "but", "the", "why", "my", "to", \
"he", "and", "you", "it", "to", "potent", "said", "the", "are", \
"question", "soft"} *)

Statistics

The node counts statistics calculation is fast:

AbsoluteTiming[
 JavaTrieNodeCounts[jOStr]
]

(* {0.002344, <|"total" -> 20723, "internal" -> 15484, "leaves" -> 5239|>} *)

The node counts statistics computation after shrinking is comparably fast :

AbsoluteTiming[
 JavaTrieNodeCounts[JavaTrieShrink[jOStr]]
]

(* {0.00539, <|"total" -> 8918,  "internal" -> 3679, "leaves" -> 5239|>} *)

The conversion of a large trie to JSON and computing statistics over the obtained tree is reasonably fast:

AbsoluteTiming[
 res = JavaTrieToJSON[jOStr];
]

(* {0.557221, Null} *)

AbsoluteTiming[
 Quantile[
  Cases[res, ("value" -> v_) :> v, \[Infinity]], 
  Range[0, 1, 0.1]]
]

(* {0.019644, {1., 1., 1., 1., 2., 3., 5., 9., 17., 42., 151205.}} *)

Dictionary infixes

Get all words from a dictionary:

allWords =  DictionaryLookup["*"];
allWords // Length

(* 92518 *)

Trie creation and shrinking:

AbsoluteTiming[
 jDTrie = JavaTrieCreateBySplit[allWords];
 jDShTrie = JavaTrieShrink[jDTrie];
]

(* {0.30508, Null} *)

JSON form extraction:

AbsoluteTiming[
 jsonRes = JavaTrieToJSON[jDShTrie];
]

(* {3.85955, Null} *)

Here are the node statistics of the original and shrunk tries:

"Orginal-trie-vs-Shrunk-trie-Node-Counts"

Find the infixes that have more than three characters and appear more than 10 times:

Multicolumn[#, 4] &@
 Select[SortBy[
   Tally[Cases[
     jsonRes, ("key" -> v_) :> v, Infinity]], -#[[-1]] &], StringLength[#[[1]]] > 3 && #[[2]] > 10 &]
"Long-infixes-in-shrunk-dictionary-trie"

Unit tests

Many of example shown in this document have corresponding tests in the file JavaTriesWithFrequencies-Unit-Tests.wlt hosted at GitHub.

tr = TestReport[
  dirName <> "/UnitTests/JavaTriesWithFrequencies-Unit-Tests.wlt"]
"TestReport"

References

[1] Anton Antonov, "Tries with frequencies for data mining", (2013), MathematicaForPrediction at WordPress blog. URL: https://mathematicaforprediction.wordpress.com/2013/12/06/tries-with-frequencies-for-data-mining/ .

[2] Anton Antonov, Tries with frequencies in Java, (2017), source code at MathematicaForPrediction at GitHub, project Java/TriesWithFrequencies.

[3] Anton Antonov, Java tries with frequencies Mathematica package, (2017), source code at MathematicaForPrediction at GitHub, package JavaTriesWithFrequencies.m .

[4] Anton Antonov, Tries with frequencies Mathematica package, (2013), source code at MathematicaForPrediction at GitHub, package TriesWithFrequencies.m .

[5] Anton Antonov, Java tries with frequencies Mathematica unit tests, (2017), source code at MathematicaForPrediction at GitHub, unit tests file JavaTriesWithFrequencies-Unit-Tests.wlt .

[6] Wikipedia, Trie, https://en.wikipedia.org/wiki/Trie .

[7] Anton Antonov, "Tries with frequencies in Java", (2017), MathematicaForPrediction at GitHub.

Handwritten digits recognition by matrix factorization

Introduction

This MathematicaVsR at GitHub project is for comparing Mathematica and R for the tasks of classifier creation, execution, and evaluation using the MNIST database of images of handwritten digits.

Here are the bases built with two different classifiers:

  • Singular Value Decomposition (SVD)

SVD-basis-for-5

  • Non-Negative Matrix Factorization (NNMF)

NNMF-basis-for-5

Here are the confusion matrices of the two classifiers:

  • SVD

SVD-confusion-matrix

  • NNMF

NNMF-confusion-matrix

The blog post "Classification of handwritten digits" (published 2013) has a related more elaborated discussion over a much smaller database of handwritten digits.

Concrete steps

The concrete steps taken in scripts and documents of this project follow.

  1. Ingest the binary data files into arrays that can be visualized as digit images.
  • We have two sets: 60,000 training images and 10,000 testing images.
  1. Make a linear vector space representation of the images by simple unfolding.

  2. For each digit find the corresponding representation matrix and factorize it.

  3. Store the matrix factorization results in a suitable data structure. (These results comprise the classifier training.)

  • One of the matrix factors is seen as a new basis.
  1. For a given test image (and its linear vector space representation) find the basis that approximates it best. The corresponding digit is the classifier prediction for the given test image.

  2. Evaluate the classifier(s) over all test images and compute accuracy, F-Scores, and other measures.

Scripts

There are scripts going through the steps listed above:

Documents

The following documents give expositions that are suitable for reading and following of steps and corresponding results.

Observations

Ingestion

I figured out first in R how to ingest the data in the binary files of the MNIST database. There were at least several online resources (blog posts, GitHub repositories) that discuss the MNIST binary files ingestion.

After that making the corresponding code in Mathematica was easy.

Classification results

Same in Mathematica and R for for SVD and NNMF. (As expected.)

NNMF

NNMF classifiers use the MathematicaForPrediction at GitHub implementations: NonNegativeMatrixFactorization.m and NonNegativeMatrixFactorization.R.

Parallel computations

Both Mathematica and R have relatively simple set-up of parallel computations.

Graphics

It was not very straightforward to come up in R with visualizations for MNIST images. The Mathematica visualization is much more flexible when it comes to plot labeling.

Going further

Comparison with other classifiers

Using Mathematica’s built-in classifiers it was easy to compare the SVD and NNMF classifiers with neural network ones and others. (The SVD and NNMF are much faster to built and they bring comparable precision.)

It would be nice to repeat that in R using one or several of the neural network classifiers provided by Google, Microsoft, H2O, Baidu, etc.

Classifier ensembles

Another possible extension is to use classifier ensembles and Receiver Operation Characteristic (ROC) to create better classifiers. (Both in Mathematica and R.)

Importance of variables

Using classifier agnostic importance of variables procedure we can figure out :

  • which NNMF basis vectors (images) are most important for the classification precision,

  • which image rows or columns are most important for each digit, or similarly

  • which image squares of a, say, 4×4 image grid are most important.