Comparison of dimension reduction algorithms over mandala images generation


This document discusses concrete algorithms for two different approaches of generation of mandala images, [1]: direct construction with graphics primitives, and use of machine learning algorithms.

In the experiments described in this document better results were obtained with the direct algorithms. The direct algorithms were made for the Mathematica StackExchange question "Code that generates a mandala", [3].

The main goals of this document are:

  1. to show some pretty images exploiting symmetry and multiplicity (see this album),

  2. to provide an illustrative example of comparing dimension reduction methods,

  3. to give a set-up for further discussions and investigations on mandala creation with machine learning algorithms.

Two direct construction algorithms are given: one uses "seed" segment rotations, the other superimposing of layers of different types. The following plots show the order in which different mandala parts are created with each of the algorithms.


In this document we use several algorithms for dimension reduction applied to collections of images following the procedure described in [4,5]. We are going to show that with Non-Negative Matrix Factorization (NNMF) we can use mandalas made with the seed segment rotation algorithm to extract layer types and superimpose them to make colored mandalas. Using the same approach with Singular Value Decomposition (SVD) or Independent Component Analysis (ICA) does not produce good layers and the superimposition produces more "watered-down", less diverse mandalas.

From a more general perspective this document compares the statistical approach of "trying to see without looking" with the "direct simulation" approach. Another perspective is the creation of "design spaces"; see [6].

The idea of using machine learning algorithms is appealing because there is no need to make the mental effort of understanding, discerning, approximating, and programming the principles of mandala creation. We can "just" use a large collection of mandala images and generate new ones using the "internal knowledge" data of machine learning algorithms. For example, a Neural network system like Deep Dream, [2], might be made to dream of mandalas.

Direct algorithms for mandala generation

In this section we present two different algorithms for generating mandalas. The first sees a mandala as being generated by rotation of a "seed" segment. The second sees a mandala as being generated by different component layers. For other approaches see [3].

The request of [3] is for generation of mandalas for coloring by hand. That is why the mandala generation algorithms are in the grayscale space. Coloring the generated mandala images is a secondary task.

By seed segment rotations

One way to come up with mandalas is to generate a segment and then by appropriate number of rotations to produce a mandala.

Here is a function and an example of random segment (seed) generation:

MakeSeedSegment[radius_, angle_, n_Integer: 10, 
   connectingFunc_: Polygon, keepGridPoints_: False] :=
   t = Table[
     Line[{radius*r*{Cos[angle], Sin[angle]}, {radius*r, 0}}], {r, 0, 1, 1/n}];
   Join[If[TrueQ[keepGridPoints], t, {}], {GrayLevel[0.25], 
     connectingFunc@RandomSample[Flatten[t /. Line[{x_, y_}] :> {x, y}, 1]]}]

seed = MakeSeedSegment[10, Pi/12, 10];
Graphics[seed, Frame -> True]

This function can make a seed segment symmetric:

MakeSymmetric[seed_] := {seed, 
   GeometricTransformation[seed, ReflectionTransform[{0, 1}]]};

seed = MakeSymmetric[seed];
Graphics[seed, Frame -> True]

Using a seed we can generate mandalas with different specification signatures:

MakeMandala[opts : OptionsPattern[]] :=      
    MakeSeedSegment[20, Pi/12, 12, 
     RandomChoice[{Line, Polygon, BezierCurve, 
       FilledCurve[BezierCurve[#]] &}], False]], Pi/6, opts];

MakeMandala[seed_, angle_?NumericQ, opts : OptionsPattern[]] :=      
    Table[RotationMatrix[a], {a, 0, 2 Pi - angle, angle}]], opts];

This code randomly selects symmetricity and seed generation parameters (number of concentric circles, angles):

n = 12;
     MakeMandala[MakeSeedSegment[10, #2, #3], #2],
      MakeSymmetric[MakeSeedSegment[10, #2, #3, #4, False]], 2 #2]
     ] &, {RandomChoice[{False, True}, n], 
   RandomChoice[{Pi/7, Pi/8, Pi/6}, n], 
   RandomInteger[{8, 14}, n], 
   RandomChoice[{Line, Polygon, BezierCurve, 
     FilledCurve[BezierCurve[#]] &}, n]}]

Here is a more concise way to generate symmetric segment mandalas:

Multicolumn[Table[Image@MakeMandala[], {12}], 5]

Note that with this approach the programming of the mandala coloring is not that trivial — weighted blending of colorized mandalas is the easiest thing to do. (Shown below.)

By layer types

This approach was given by Simon Woods in [3].

"For this one I’ve defined three types of layer, a flower, a simple circle and a ring of small circles. You could add more for greater variety."

The coloring approach with image blending given below did not work well for this algorithm, so I modified the original code in order to produce colored mandalas.

ClearAll[LayerFlower, LayerDisk, LayerSpots, MandalaByLayers]

LayerFlower[n_, a_, r_, colorSchemeInd_Integer] := 
  Module[{b = RandomChoice[{-1/(2 n), 0}]}, {If[
     colorSchemeInd == 0, White, 
     RandomChoice[ColorData[colorSchemeInd, "ColorList"]]], 
      r (a + Cos[n t])/(a + 1) {Cos[t + b Sin[2 n t]], Sin[t + b Sin[2 n t]]}, {t, 0, 2 Pi}], 
     l_Line :> FilledCurve[l], -1]}];

LayerDisk[_, _, r_, colorSchemeInd_Integer] := {If[colorSchemeInd == 0, White, 
    RandomChoice[ColorData[colorSchemeInd, "ColorList"]]], 
   Disk[{0, 0}, r]};

LayerSpots[n_, a_, r_, colorSchemeInd_Integer] := {If[colorSchemeInd == 0, White, 
    RandomChoice[ColorData[colorSchemeInd, "ColorList"]]], 
   Translate[Disk[{0, 0}, r a/(4 n)], r CirclePoints[n]]};

MandalaByLayers[n_, m_, coloring : (False | True) : False, opts : OptionsPattern[]] := 
  Graphics[{EdgeForm[Black], White, 
    Table[RandomChoice[{3, 2, 1} -> {LayerFlower, LayerDisk, LayerSpots}][n, RandomReal[{3, 5}], i, 
       If[coloring, RandomInteger[{1, 17}], 0]]~Rotate~(Pi i/n), {i, m, 1, -1}]}, opts];

Here are generated black-and-white mandalas.

ImageCollage[Table[Image@MandalaByLayers[16, 20], {12}], Background -> White, ImagePadding -> 3, ImageSize -> 1200]

Here are some colored mandalas. (Which make me think more of Viking and Native American art than mandalas.)

ImageCollage[Table[Image@MandalaByLayers[16, 20, True], {12}], Background -> White, ImagePadding -> 3, ImageSize -> 1200]

Training data

Images by direct generation

iSize = 400;

 mandalaImages = 
       MakeSeedSegment[10, Pi/12, 12, RandomChoice[{Polygon, FilledCurve[BezierCurve[#]] &}]], Pi/6], 
     ImageSize -> {iSize, iSize}, ColorSpace -> "Grayscale"], {300}];

(* {39.31, Null} *)

ImageCollage[ColorNegate /@ RandomSample[mandalaImages, 12], Background -> White, ImagePadding -> 3, ImageSize -> 400]

External image data

See the section "Using World Wide Web images".

Direct blending

The most interesting results are obtained with the image blending procedure coded below over mandala images generated with the seed segment rotation algorithm.

directBlendingImages = Table[
         ColorFunction -> 
          RandomChoice[{"IslandColors", "FruitPunchColors", 
            "AvocadoColors", "Rainbow"}]] & /@ 
       RandomChoice[mandalaImages, 4], RandomReal[1, 4]]], {36}];

ImageCollage[directBlendingImages, Background -> White, ImagePadding -> 3, ImageSize -> 1200]


Dimension reduction algorithms application

In this section we are going to apply the dimension reduction algorithms Singular Value Decomposition (SVD), Independent Component Analysis (ICA), and Non-Negative Matrix Factorization (NNMF) to a linear vector space representation (a matrix) of an image dataset. In the next section we use the bases generated by those algorithms to make mandala images.
We are going to use the packages [7,8] for ICA and NNMF respectively.


Linear vector space representation

The linear vector space representation of the images is simple — each image is flattened to a vector (row-wise), and the image vectors are put into a matrix.

mandalaMat = Flatten@*ImageData@*ColorNegate /@ mandalaImages;

(* {300, 160000} *)

Re-factoring and basis images

The following code re-factors the images matrix with SVD, ICA, and NNMF and extracts the basis images.

 svdRes = SingularValueDecomposition[mandalaMat, 20];
(* {5.1123, Null} *)

svdBasisImages = Map[ImageAdjust@Image@Partition[#, iSize] &, Transpose@svdRes[[3]]];

 icaRes = 
   IndependentComponentAnalysis[Transpose[mandalaMat], 20, 
    PrecisionGoal -> 4, "MaxSteps" -> 100];
(* {23.41, Null} *)

icaBasisImages = Map[ImageAdjust@Image@Partition[#, iSize] &, Transpose[icaRes[[1]]]];

 nnmfRes = 
   GDCLS[mandalaMat, 20, PrecisionGoal -> 4, 
    "MaxSteps" -> 20, "RegularizationParameter" -> 0.1];
(* {233.209, Null} *)

nnmfBasisImages = Map[ImageAdjust@Image@Partition[#, iSize] &, nnmfRes[[2]]];


Let us visualize the bases derived with the matrix factorization methods.

Grid[{{"SVD", "ICA", "NNMF"},
      Map[ImageCollage[#, Automatic, {400, 500}, 
        Background -> LightBlue, ImagePadding -> 5, ImageSize -> 350] &, 
      {svdBasisImages, icaBasisImages, nnmfBasisImages}]
     }, Dividers -> All]


Here are some observations for the bases.

  1. The SVD basis has an average mandala image as its first vector and the other vectors are "differences" to be added to that first vector.

  2. The SVD and ICA bases are structured similarly. That is because ICA and SVD are both based on orthogonality — ICA factorization uses an orthogonality criteria based on Gaussian noise properties (which is more relaxed than SVD’s standard orthogonality criteria.)

  3. As expected, the NNMF basis images have black background because of the enforced non-negativity. (Black corresponds to 0, white to 1.)

  4. Compared to the SVD and ICA bases the images of the NNMF basis are structured in a radial manner. This can be demonstrated using image binarization.

Grid[{{"SVD", "ICA", "NNMF"}, Map[ImageCollage[Binarize[#, 0.5] & /@ #, Automatic, {400, 500}, Background -> LightBlue, ImagePadding -> 5, ImageSize -> 350] &, {svdBasisImages, icaBasisImages, nnmfBasisImages}] }, Dividers -> All]

We can see that binarizing of the NNMF basis images shows them as mandala layers. In other words, using NNMF we can convert the mandalas of the seed segment rotation algorithm into mandalas generated by an algorithm that superimposes layers of different types.

Blending with image bases samples

In this section we just show different blending images using the SVD, ICA, and NNMF bases.

Blending function definition

Options[MandalaImageBlending] = {"BaseImage" -> {}, "BaseImageWeight" -> Automatic, "PostBlendingFunction" -> (RemoveBackground@*ImageAdjust)};
MandalaImageBlending[basisImages_, nSample_Integer: 4, n_Integer: 12, opts : OptionsPattern[]] :=      
  Block[{baseImage, baseImageWeight, postBlendingFunc, sImgs, sImgWeights},
   baseImage = OptionValue["BaseImage"];
   baseImageWeight = OptionValue["BaseImageWeight"];
   postBlendingFunc = OptionValue["PostBlendingFunction"];
     sImgs = 
      Flatten@Join[{baseImage}, RandomSample[basisImages, nSample]];
     If[NumericQ[baseImageWeight] && ImageQ[baseImage],
      sImgWeights = 
       Join[{baseImageWeight}, RandomReal[1, Length[sImgs] - 1]],
      sImgWeights = RandomReal[1, Length[sImgs]]
          DeleteCases[{opts}, ("BaseImage" -> _) | ("BaseImageWeight" -> _) | ("PostBlendingFunction" -> _)],               
          ColorFunction -> 
           RandomChoice[{"IslandColors", "FruitPunchColors", 
             "AvocadoColors", "Rainbow"}]] & /@ sImgs, 
       sImgWeights]), {n}]

SVD image basis blending

svdBlendedImages = MandalaImageBlending[Rest@svdBasisImages, 4, 24];
ImageCollage[svdBlendedImages, Background -> White, ImagePadding -> 3, ImageSize -> 1200]


svdBlendedImages = MandalaImageBlending[Rest@svdBasisImages, 4, 24, "BaseImage" -> First[svdBasisImages], "BaseImageWeight" -> 0.5];
ImageCollage[svdBlendedImages, Background -> White, ImagePadding -> 3, ImageSize -> 1200]


ICA image basis blending

icaBlendedImages = MandalaImageBlending[Rest[icaBasisImages], 4, 36, "BaseImage" -> First[icaBasisImages], "BaseImageWeight" -> Automatic];
ImageCollage[icaBlendedImages, Background -> White, ImagePadding -> 3, ImageSize -> 1200]


NNMF image basis blending

nnmfBlendedImages = MandalaImageBlending[nnmfBasisImages, 4, 36];
ImageCollage[nnmfBlendedImages, Background -> White, ImagePadding -> 3, ImageSize -> 1200]


Using World Wide Web images

A natural question to ask is:

What would be the outcomes of the above procedures to mandala images found in the World Wide Web (WWW) ?

Those WWW images are most likely man made or curated.

The short answer is that the results are not that good. Better results might be obtained using a larger set of WWW images (than just 100 in the experiment results shown below.)

Here is a sample from the WWW mandala images:


Here are the results obtained with NNMF basis:


Future plans

My other motivation for writing this document is to set up a basis for further investigations and discussions on the following topics.

  1. Having a large image database of "real world", human made mandalas.

  2. Utilization of Neural Network algorithms to mandala creation.

  3. Utilization of Cellular Automata to mandala generation.

  4. Investigate mandala morphing and animations.

  5. Making a domain specific language of specifications for mandala creation and modification.

The idea of using machine learning algorithms for mandala image generation was further supported by an image classifier that recognizes fairly well (suitably normalized) mandala images obtained in different ways:



[1] Wikipedia entry: Mandala, .

[2] Wikipedia entry: DeepDream, .

[3] "Code that generates a mandala", Mathematica StackExchange, .

[4] Anton Antonov, "Comparison of PCA and NNMF over image de-noising", (2016), MathematicaForPrediction at WordPress blog. URL: .

[5] Anton Antonov, "Handwritten digits recognition by matrix factorization", (2016), MathematicaForPrediction at WordPress blog. URL: .

[6] Chris Carlson, "Social Exploration of Design Spaces: A Proposal", (2016), Wolfram Technology Conference 2016. URL: http://wac , YouTube: .

[7] Anton Antonov, Independent Component Analysis Mathematica package, (2016), source code at MathematicaForPrediction at GitHub, package IndependentComponentAnalysis.m .

[8] Anton Antonov, Implementation of the Non-Negative Matrix Factorization algorithm in Mathematica, (2013), source code at MathematicaForPrediction at GitHub, package NonNegativeMatrixFactorization.m.

Adaptive numerical Lebesgue integration by set measure estimates


In this document are given outlines and examples of several related implementations of Lebesgue integration, [1], within the framework of NIntegrate, [7]. The focus is on the implementations of Lebesgue integration algorithms that have multiple options and can be easily extended (in order to do further research, optimization, etc.) In terms of NIntegrate‘s framework terminology it is shown how to implement an integration strategy or integration rule based on the theory of the Lebesgue integral. The full implementation of those strategy and rules — LebesgueIntegration, LebesgueIntegrationRule, and GridLebesgueIntegrationRule — are given in the Mathematica package [5].

The advantage of using NIntegrate‘s framework is that a host of supporting algorithms can be employed for preprocessing, execution, experimentation, and testing (correctness, comparison, and profiling.)

Here is a brief description of the integration strategy LebesgueIntegration in [5]:

  1. prepare a function that calculates measure estimates based on random points or low discrepancy sequences of points in the integration domain;

  2. use NIntegrate for the computation of one dimensional integrals for that measure estimate function over the range of the integrand function values.

The strategy is adaptive because of the second step — NIntegrate uses adaptive integration algorithms.

Instead of using an integration strategy we can "tuck in" the whole Lebesgue integration process into an integration rule, and then use that integration rule with the adaptive integration algorithms NIntegrate already has. This is done with the implementations of the integration rules LebesgueIntegrationRule and GridLebesgueIntegrationRule.

Brief theory

Lebesgue integration extends the definition of integral to a much larger class of functions than the class of Riemann integrable functions. The Riemann integral is constructed by partitioning the integrand’s domain (on the x axis). The Lebesgue integral is constructed by partitioning the integrand’s co-domain (on the y axis). For each value y in the co-domain, find the measure \mu(y) of the corresponding set of points \frac{y}{f} in the domain. Roughly speaking, the Lebesgue integral is then the sum of all the products \mu(y) y; see [1]. For our implementation purposes \mu is defined differently, and in the rest of this section we follow [3].

Consider the non-negative bound-able measurable function f:

 y=f(x), f(x) \geq 0, x \in \Omega .

We denote by \mu(y) the measure for the points in \Omega for which f(x)>=y, i.e.

  1.  \mu(y) := \left| \{ x: x\in \Omega \land f(x) \geq y\} \right| .

The Lebesgue integral of f(x) over \Omega can be be defined as:

 \int_{\Omega } f (x) dx = y_0 \mu (y_0) + \lim_{n \to \infty ,\max
         \left(y_i-y_{i-1}\right)\to 0} \sum_{i=1}^n \mu(y_i)(y_i-y_{i-1}) .

Further, we can write the last formula as

  1.  \int_{\Omega } f(x) dx = y_0 \mu(y_0) + \int_{y_0}^{y_n}\mu(y) dy.

The restriction f(x)>=0 can be handled by defining the following functions f_1 and f_2 :

 f_1(x) := \frac{1}{2} (\left| f(x) \right| + f(x) ),

 f_2(x) := \frac{1}{2} (\left| f(x) \right| - f(x) ),

and using the formula

 \int_{\Omega}f(x) dx= \int_{\Omega} f_1(x) dx - \int_{\Omega}f_2(x) dx.

Since finding analytical expressions of \mu is hard we are going to look into ways of approximating \mu.

For more details see [1,2,3,4].

(Note, that the theoretical outline the algorithms considered can be seen as algorithms that reduce multidimensional integration to one dimensional integration.)

Algorithm walk through

We can see that because of Equation (4) we mostly have to focus on estimating the measure function \mu. This section provides a walk through with visual examples of a couple of stochastic ways to do that.

Consider the integral

 \int_{0}^{1} \int_{0}^{1} \sqrt{x_1 + x_2 + 1} dx_1 dx_2 .

In order to estimate \mu in \Omega := [0,1] \times [0,1] for f(x_1, x_2) := \sqrt( 1 + x_1 + x_2 ) we are going to generate in \Omega a set P of low discrepancy sequence of points, [2]. Here this is done with 100 points of the so called "Sobol" sequence:

n = 100;
SeedRandom[0,Method -> {"MKL",Method -> {"Sobol", "Dimension" -> 2}}];
points = RandomReal[{0, 1}, {n, 2}];
ListPlot[points, AspectRatio -> Automatic,
 PlotTheme -> "Detailed",
 FrameLabel -> {"\!\(\*SubscriptBox[\(x\), \(1\)]\)","\!\(\*SubscriptBox[\(x\), \(2\)]\)"}]


To each point p_i let us assign a corresponding "volume" v_i that can be used to approximate \mu with Equation (2). We can of course easily assign such volumes to be \frac{1}{\left| P\right| }, but as it can be seen on the plot this would be a good approximation for a larger number of points. Here is an example of a different volume assignment using a Voronoi diagram, [10]:

vmesh = VoronoiMesh[points, {{0, 1}, {0, 1}}, Frame -> True];
Show[{vmesh, Graphics[{Red, Point[points]}]},
 FrameLabel -> {"\!\(\*SubscriptBox[\(x\), \(1\)]\)", "\!\(\*SubscriptBox[\(x\), \(2\)]\)"}]


(The Voronoi diagram for P finds for each point p_i the set of domain points closest to p_i than any other point of P.)

Here is a breakdown of the Voronoi diagram volumes corresponding to the generated points (compare with \frac{1}{\left| P\right| }) :

volumes =PropertyValue[{vmesh, Dimensions[points][[2]]}, MeshCellMeasure]; 
Histogram[volumes, PlotRange -> All, PlotTheme -> "Detailed",
 FrameLabel -> {"volume", "count"}]


Let us define a function that computes \mu according to Equation (1) with the generated points and assigned volumes:

EstimateMeasure[fval_?NumericQ, pointVals_, pointVolumes_] :=
   pinds = Clip[Sign[pointVals - fval], {0, 1}, {0, 1}];

Here is an example call of that function using the Voronoi diagram volumes:

EstimateMeasure[1.6, Sqrt[2 + Total[#]] & /@ points, volumes]
(* 0.845833 *)

And here is another call using uniform volumes:

EstimateMeasure[1.6, Sqrt[2 + Total[#]] & /@ points,
  Table[1/Length[points], {Length[points]}]] // N
(* 0.85 *)

The results can be verified using ImplicitRegion:

RegionMeasure[ImplicitRegion[ Sqrt[2 + x1 + x2] >= 1.6, {{x1, 0, 1}, {x2, 0, 1}}]]
(* 0.8432 *)

Or using Integrate:

Integrate[Piecewise[{{1, Sqrt[2 + x1 + x2] >= 1.6}}, 0], {x1, 0, 1}, {x2, 0, 1}]
(* 0.8432 *)

At this point we are ready to compute the integral estimate using Formula (4) :

fvals = Sqrt[2 + Total[#]] & /@ points;
Min[fvals]*EstimateMeasure[0, fvals, volumes] +
 NIntegrate[EstimateMeasure[y, fvals, volumes], {y, Min[fvals], Max[fvals]}]
(* 1.72724 *)

To simplify the code we use the symbol \text{fvals} to hold the values f P. Note that instead of the true min and max values of f we use estimates of them with \text{fvals}.

Here is the verification of the result:

Integrate[Sqrt[2 + x1 + x2], {x1, 0, 1}, {x2, 0, 1}]
% // N
(* 8/15 (16 + 2 Sqrt[2] - 9 Sqrt[3]) *)
(* 1.72798 *)

In order to implement the outlined algorithm so it will be more universal we have to consider volumes rescaling, function positivity, and Voronoi diagram implementation(s). For details how these considerations are resolved see the code of the strategy LebesgueIntegration in [5].

Further algorithm elaborations

Measure estimate with regular grid cells

The article 3 and book 4 suggest the measure estimation to be done through membership of regular grid of cells. For example, the 100 points generated in the previous section can be grouped by a 5 \times 5 grid:


The following steps describe in detail an algorithm based on the proposed in [3,4] measure estimation method. The algorithm is implemented in [5] for the symbol, GridLebesgueIntegrationRule.

1. Generate points filling the [0,1]^{d} where d is the dimension of \Omega.

2. Partition [0,1]^{d} with a regular grid according to specifications.

  • 2.1. Assume the cells are indexed with the integers I_c \subset \mathbb{N}, |I_c|=n.
  • 2.2. Assume that all cells have the same volume v_c below.

3. For each point find to which cell of the regular grid it belongs to.

4. For each cell have a list of indices corresponding to the points that belong to it.

5. For a given sub-region of integration r rescale to the points to lie within r; denote those points with P_r.

  • 5.1. Compute the rescaling factor for the integration rule; denote with \gamma.

6. For a given integrand function f evaluate f over all points P_r.

7. For each cell i \in I_c find the min and max values of f.

  • 7.1. Denote with f_i^{\min} and f_i^{\max} correspondingly.

8. For a given value f x_k, where k is some integer enumerating the 1D integration rule sampling points, find the coefficients \xi _i, i \in i_c using the following formula:

\xi_i= Piecewise[\{\{1, y_k \leq f_i^{\min}\},\{0, f_i^{\max }<y_k\}\},
 \{ \frac{Abs[f_i^{\max }-y_k] }{Abs[ f_i^{\max }-f_i^{\min} ]}, f_i^{\min }<y_k<f_i^{\max} \}]

9. Find the measure estimate of \mu((f(x)>f(x_k)) with

 \gamma \sum_{i=1}^{n} v_c \xi_i

Axis splitting in Lebesgue integration rules

The implementations of Lebesgue integration rules are required to provide a splitting axis for use of the adaptive algorithms. Of course we can assign a random splitting axis, but that might lead often to slower computations. One way to provide splitting axis is to choose the axis that minimizes the sum of the variances of sub-divided regions estimated by samples of the rule points. This is the same approach taken in NIntegrate`s rule "MonteCarloRule"; for theoretical details see the chapter "7.8 Adaptive and Recursive Monte Carlo Methods" of [11].

In [5] this splitting axis choice is implemented based on integrand function values. Another approach, more in the spirit of the Lebesgue integration, is to select the splitting axis based on variances of the measure function estimates.

Consider the function:

DensityPlot[Exp[-3 (x - 1)^2 - 4 (y - 1)^2], {x, 0, 3}, {y, 0, 3},
 PlotRange -> All, ColorFunction -> "TemperatureMap"]


Here is an example of sampling points traces using "minimum variance" axis splitting in LebesgueIntegrationRule:

res = Reap@
   NIntegrate[Exp[-3 (x - 1)^2 - 4 (y - 1)^2], {x, 0, 3}, {y, 0, 3},
    Method -> {"GlobalAdaptive",
      Method -> {LebesgueIntegrationRule, "Points" -> 600,
        "PointGenerator" -> "Sobol",
        "AxisSelector" -> "MinVariance"},
      "SingularityHandler" -> None},
    EvaluationMonitor :> Sow[{x, y}],
    PrecisionGoal -> 2.5, MaxRecursion -> 3];
ListPlot[res[[2, 1]], AspectRatio -> Automatic]
(* 0.890916 *)

MinVariance axis split

And here are the sampling points with random selection of a splitting axis:

res = Reap@
   NIntegrate[Exp[-3 (x - 1)^2 - 4 (y - 1)^2], {x, 0, 3}, {y, 0, 3},
    Method -> {"GlobalAdaptive",
      Method -> {LebesgueIntegrationRule, "Points" -> 600,
        "PointGenerator" -> "Sobol",
        "AxisSelector" -> Random},
      "SingularityHandler" -> None},
    EvaluationMonitor :> Sow[{x, y}],
    PrecisionGoal -> 2.5, MaxRecursion -> 3];
ListPlot[res[[2, 1]], AspectRatio -> Automatic]
(* 0.892499 *)

Random axis split

Here is a more precise estimate of that integral:

NIntegrate[Exp[-3 (x - 1)^2 - 4 (y - 1)^2], {x, 0, 3}, {y, 0, 3}]
(* 0.898306 *)

Implementation design within NIntegrate’s framework

Basic usages

The strategy and rule implementations in [5] can be used in the following ways.

NIntegrate[Sqrt[x], {x, 0, 2}, Method -> LebesgueIntegration]
(* 1.88589 *)

NIntegrate[Sqrt[x], {x, 0, 2},
 Method -> {LebesgueIntegration, Method -> "LocalAdaptive",
   "Points" -> 2000, "PointGenerator" -> "Sobol"}, PrecisionGoal -> 3]
(* 1.88597 *)

NIntegrate[Sqrt[x], {x, 0, 2},
 Method -> {LebesgueIntegrationRule, "Points" -> 2000,
   "PointGenerator" -> "Sobol",
   "PointwiseMeasure" -> "VoronoiMesh"}, PrecisionGoal -> 3]
(* 1.88597 *)

NIntegrate[Sqrt[x + y + x], {x, 0, 2}, {y, 0, 3}, {z, 0, 4},
 Method -> {GridLebesgueIntegrationRule,
   Method -> "GaussKronrodRule", "Points" -> 2000,
   "GridSizes" -> 5, "PointGenerator" -> "Sobol"}, PrecisionGoal -> 3]
(* 43.6364 *)


Here are the options for the implemented strategy and rules in [5]:

Grid[Transpose[{#, ColumnForm[Options[#]]} & /@
   Alignment -> Left, Dividers -> All]

Lebesgue methods options

Using variable ranges

Integration with variable ranges works "out of the box."

NIntegrate[Sqrt[x + y], {x, 0, 2}, {y, 0, x}]
(* 2.75817 *)

NIntegrate[Sqrt[x + y], {x, 0, 2}, {y, 0, x},
 Method -> LebesgueIntegration, PrecisionGoal -> 2]
(* 2.75709 *)

NIntegrate[Sqrt[x + y], {x, 0, 2}, {y, 0, x},
 Method -> LebesgueIntegrationRule, PrecisionGoal -> 2]
(* 2.75663 *)

Infinite ranges

In order to get correct results with infinite ranges the wrapper

Method->{"UnitCubeRescaling","FunctionalRangesOnly"->False, _}

has to be used. Here is an example:

NIntegrate[1/(x^2 + 12), {x, 0, Infinity}]
(* 0.45345 *)

NIntegrate[1/(x^2 + 12), {x, 0, Infinity},  
  Method -> {"UnitCubeRescaling", "FunctionalRangesOnly" -> False, 
    Method -> {LebesgueIntegrationRule, "Points" -> 2000}}, PrecisionGoal -> 3]
(* 0.453366 *)

For some integrands we have to specify inter-range points or larger MinRecursion.

NIntegrate[1/(x^2), {x, 1, Infinity}]
(* 1. *)

NIntegrate[1/(x^2), {x, 1, 12, Infinity}, 
  Method -> {"UnitCubeRescaling", "FunctionalRangesOnly" -> False, 
    Method -> {LebesgueIntegrationRule, "Points" -> 1000}}]
(* 0.999466 *)

NIntegrate[1/(x^2), {x, 1, Infinity}, 
  Method -> {"UnitCubeRescaling", "FunctionalRangesOnly" -> False, 
    Method -> {LebesgueIntegrationRule, "Points" -> 1000}}]
(* 0. *)

Evaluation monitoring

With the option "EvaluationMonitor" we can see the sampling points for the strategy and the rules.

This is straightforward for the rules:

res = Reap@
   NIntegrate[Exp[-3 (x - 1)^2], {x, 0, 3},
    Method -> LebesgueIntegrationRule,
    EvaluationMonitor :> Sow[x],
    PrecisionGoal -> 3];
ListPlot[res[[2, 1]]]

Lebesgue rule sampling points

The strategy LebesgueIntegration uses an internal variable for the calculation of the Lebesgue integral. In "EvaluationMonitor" either that variable has to be used, or a symbol name has to be passed though the option "LebesgueIntegralVariableSymbol". Here is an example:

res = Reap@
   NIntegrate[Sqrt[x + y + z], {x, -1, 2}, {y, 0, 1}, {z, 1, 12},
    Method -> {LebesgueIntegration, "Points" -> 600,
      "PointGenerator" -> "Sobol",
      "LebesgueIntegralVariableSymbol" -> fval},
    EvaluationMonitor :> {Sow[fval]},
    PrecisionGoal -> 3];
res = DeleteCases[res, fval, \[Infinity]];
ListPlot[res[[2, 1]]]

Lebesgue strategy sampling points


We can use NIntegrate‘s utility functions for visualization and profiling in order to do comparison of the implemented algorithms with related ones (like "AdaptiveMonteCarlo") which NIntegrate has (or are plugged-in).


Common function and domain:

fexpr = 1/(375/100 - Cos[x] - Cos[y]);
ranges = {{x, 0, \[Pi]}, {y, 0, \[Pi]}};

Common parameters:

pgen = "Random";
npoints = 1000;

"AdaptiveMonteCarlo" call:

 NIntegrate[fexpr, Evaluate[Sequence @@ ranges],
  Method -> {"AdaptiveMonteCarlo",
    Method -> {"MonteCarloRule", "PointGenerator" -> pgen,
      "Points" -> npoints,
      "AxisSelector" -> "MinVariance"}},
  PrecisionGoal -> 3]
 (* {"IntegralEstimate" -> InputForm[2.8527356472097902`], "Evaluations" -> 17000, "Timing" -> 0.0192079} *)

LebesgueIntegrationRule call:

 NIntegrate[fexpr, Evaluate[Sequence @@ ranges],
  Method -> {"GlobalAdaptive",
    Method -> {LebesgueIntegrationRule,
      "PointGenerator" -> pgen, "Points" -> npoints,
      "AxisSelector" -> "MinVariance",
      Method -> "ClenshawCurtisRule"},
    "SingularityHandler" -> None}, PrecisionGoal -> 3]
 (* {"IntegralEstimate" -> InputForm[2.836659588960318], "Evaluations" -> 13000, "Timing" -> 0.384246} *)

Design diagrams

The flow charts below show that the plug-in designs have common elements. In order to make the computations more effective the rule initialization prepares the data that is used in all rule invocations. For the strategy the initialization can be much lighter since the strategy algorithm is executed only once.

In the flow charts the double line boxes designate sub-routines. We can see that so called Hollywood principle "don’t call us, we’ll call you" in Object-oriented programming is manifested.

The following flow chart shows the steps of NIntegrate‘s execution when the integration strategy LebesgueIntegration is used.

Lebesgue integration rule flow strategy

The following flow chart shows the steps of NIntegrate‘s execution when the integration rule LebesgueIntegrationRule is used.

Lebesgue integration rule flow chart

Algorithms versions and options

There are multiple architectural, coding, and interface decisions to make while doing implementations like the ones in [5] and described in this document. The following mind map provides an overview of alternatives and interactions between components and parameters.

"Mind map"

Alternative implementations

In many ways using the Lebesgue integration rule with the adaptive algorithms is similar to using NIntegrate‘s "AdaptiveMonteCarlo" and its rule "MonteCarloRule". Although it is natural to consider plugging-in the Lebesgue integration rules into "AdaptiveMonteCarlo" at this point NIntegrate‘s framework does not allow "AdaptiveMonteCarlo" the use of a rule different than "MonteCarloRule".

We can consider using Monte Carlo algorithms for estimating the measures corresponding to a vector of values (that come from a 1D integration rule). This can be easily done, but it is not that effective because of the way NIntegrate handles vector integrands and because of stopping criteria issues when the measures are close to 0.

Future plans

One of the most interesting extensions of the described Lebesgue integration algorithms and implementation designs is their extension with more advanced features of Mathematica for geometric computation. (Like the functions VoronoiMesh, RegionMeasure, and ImplicitRegion used above.)

Another interesting direction is the derivation and use of symbolic expressions for the measure functions. (Hybrid symbolic and numerical algorithms can be obtained as NIntegrate‘s handling of piecewise functions or the strategy combining symbolic and numerical integration described in [9].)


[1] Wikipedia entry, Lebesgue integration, URL: .

[2] Wikipedia entry, Low-discrepancy sequence, URL: .

[3] B. L. Burrows, A new approach to numerical integration, 1. Inst. Math. Applics., 26(1980), 151-173.

[4] T. He, Dimensionality Reducing Expansion of Multivariate Integration, 2001, Birkhauser Boston. ISBN-13:978-1-4612-7414-8 .

[5] A. Antonov, Adaptive Numerical Lebesgue Integration Mathematica Package, 2016, MathematicaForPrediction project at GitHub.

[6] A. Antonov, Lebesgue integration, Wolfram Demonstrations Project, 2007. URL: .

[7] "Advanced Numerical Integration in the Wolfram Language", Wolfram Research Inc. URL: .

[8] A. Antonov, "How to implement custom integration rules for use by NIntegrate?", (2016) Mathematica StackExchange answer, URL: .

[9] A. Antonov, "How to implement custom NIntegrate integration strategies?", (2016) Mathematica StackExchange answer, URL: .

[10] Wikipedia entry, Voronoi diagram, URL: .

[11] Press, W.H. et al., Numerical Recipes in C (2nd Ed.): The Art of Scientific Computing, 1992, Cambridge University Press, New York, NY, USA.

Comparison of PCA, NNMF, and ICA over image de-noising


In a previous blog post, [1], I compared Principal Component Analysis (PCA) / Singular Value Decomposition (SVD) and Non-Negative Matrix Factorization (NNMF) over a collection of noised images of digit handwriting from the MNIST data set, [3], which is available in Mathematica.

This blog post adds to that comparison the use of Independent Component Analysis (ICA) proclaimed in my previous blog post, [1].


The ICA related additional computations to those in [1] follow.

First we load the package IndependentComponentAnalysis.m :


From that package we can use the function IndependentComponentAnalysis to find components:

{S, A} = IndependentComponentAnalysis[Transpose[noisyTestImagesMat], 9, PrecisionGoal -> 4.5];
Norm[noisyTestImagesMat - Transpose[S.A]]/Norm[noisyTestImagesMat]
(* 0.592739 *)

Let us visualize the norms of the mixing matrix A :

norms = Norm /@ A;
ListPlot[norms, PlotRange -> All, PlotLabel -> "Norms of A rows", 
    PlotTheme -> "Detailed"] // 
pos = OutlierPosition[norms, TopOutliers@*HampelIdentifierParameters]

Norms of the mixing matrix

Next we can visualize the found "source" images:

ncol = 2;
 MapIndexed[{#2[[1]], Norm[#], 
   ImageAdjust@Image[Partition[#, dims[[1]]]]} &, 
 Transpose[S]] /. (# -> Style[#, Red] & /@ pos),
Table["", {ncol - QuotientRemainder[Dimensions[S][[2]], ncol][[2]]}]
], ncol], Dividers -> All]

ICA found source images of 6 and 7 images matrix

After selecting several of source images we zero the rest by modifying the matrix A:

pos = {6, 7, 9};
norms = Norm /@ A;
dA = DiagonalMatrix[
 ReplacePart[ConstantArray[0, Length[norms]], Map[List, pos] -> 1]];
newMatICA = 
 Transpose[Map[# + Mean[Transpose[noisyTestImagesMat]] &, S.dA.A]];
  denoisedImagesICA = Map[Image[Partition[#, dims[[2]]]] &, newMatICA];

Visual comparison of de-noised images

Next we visualize the ICA de-noised images together with the original images and the SVD and NNMF de-noised ones.

There are several ways to present that combination of images.

Grayscale 6 and 7 images, orginal, noised, PCA, NNMF, ICA de-noised

Binarized 6 and 7 images, orginal, noised, PCA, NNMF, ICA de-noised

Image collage of orginal, noised, PCA, NNMF, ICA de-noised 6 and 7 images

Comparison using classifiers

We can further compare the de-noising results by building digit classifiers and running them over the de-noised images.

icaCM = ClassifierMeasurements[digitCF, 
    Thread[(Binarize[#, 0.55] &@*ImageAdjust@*ColorNegate /@ 
        denoisedImagesICA) -> testImageLabels]]

We can see that with ICA we get better results than with PCA/SVD, probably not as good as NNMF, but very close.

Classifier comparison over PCA, NNMF, ICA de-noised images of 6 and 7

All images of this blog post

Computations results for ICA application of noised handwriting images of 6 an 7


[1] A. Antonov, "Comparison of PCA and NNMF over image de-noising", (2016), MathematicaForPrediction at WordPress.

[2] A. Antonov, "Independent Component Analysis for multidimensional signals", (2016), MathematicaForPrediction at WordPress.

[3] Wikipedia entry, MNIST database.

Independent component analysis for multidimensional signals


Independent Component Analysis (ICA) is a (matrix factorization) method for separation of a multi-dimensional signal (represented with a matrix) into a weighted sum of sub-components that have less entropy than the original variables of the signal. See [1,2] for introduction to ICA and more details.

This blog post is to proclaim the implementation of the "FastICA" algorithm in the package IndependentComponentAnalysis.m and show a basic application with it. (I programmed that package last weekend. It has been in my ToDo list to start ICA algorithms implementations for several months… An interesting offshoot was the procedure I derived for the StackExchange question "Extracting signal from Gaussian noise".)

In this blog post ICA is going to be demonstrated with both generated data and "real life" weather data (temperatures of three cities within one month).

Generated data

In order to demonstrate ICA let us make up some data in the spirit of the "cocktail party problem".

(*Signal functions*)
Clear[s1, s2, s3]
s1[t_] := Sin[600 \[Pi] t/10000 + 6*Cos[120 \[Pi] t/10000]] + 1.2
s2[t_] := Sin[\[Pi] t/10] + 1.2
s3[t_?NumericQ] := (((QuotientRemainder[t, 23][[2]] - 11)/9)^5 + 2.8)/2 + 0.2

(*Mixing matrix*)
A = {{0.44, 0.2, 0.31}, {0.45, 0.8, 0.23}, {0.12, 0.32, 0.71}};

(*Signals matrix*)
nSize = 600;
S = Table[{s1[t], s2[t], s3[t]}, {t, 0, nSize, 0.5}];

(*Mixed signals matrix*)
M = A.Transpose[S];

   Plot[#, {t, 0, nSize}, PerformanceGoal -> "Quality", 
     ImageSize -> 250] &, {s1[t], s2[t], s3[t]}]}]

Original signals

(*Mixed signals*)
Grid[{Map[ListLinePlot[#, ImageSize -> 250] &, M]}]

Mixed signals

I took the data generation formulas from [6].

ICA application

Load the package:


It is important to note that the usual ICA model interpretation for the factorized matrix X is that each column is a variable (audio signal) and each row is an observation (recordings of the microphones at a given time). The matrix 3×1201 M was constructed with the interpretation that each row is a signal, hence we have to transpose M in order to apply the ICA algorithms, X=M^T.

X = Transpose[M];

{S, A} = IndependentComponentAnalysis[X, 3];

Check the approximation of the obtained factorization:

Norm[X - S.A]    
(* 3.10715*10^-14 *)

Plot the found source signals:

Grid[{Map[ListLinePlot[#, PlotRange -> All, ImageSize -> 250] &, 

Found source signals

Because of the random initialization of the inverting matrix in the algorithm the result my vary. Here is the plot from another run:

Found source signals 2

The package also provides the function FastICA that returns an association with elements that correspond to the result of the function fastICA provided by the R package "fastICA". See [4].

Here is an example usage:

res = FastICA[X, 3];

(* {"X", "K", "W", "A", "S"} *)

   ListLinePlot[#, PlotRange -> All, ImageSize -> Medium] &, 

FastICA found source signals

Note that (in adherence to [4]) the function FastICA returns the matrices S and A for the centralized matrix X. This means, for example, that in order to check the approximation proper mean has to be supplied:

Norm[X - Map[# + Mean[X] &, res["S"].res["A"]]]
(* 2.56719*10^-14 *)

Signatures and results

The result of the function IndependentComponentAnalysis is a list of two matrices. The result of FastICA is an association of the matrices obtained by ICA. The function IndependentComponentAnalysis takes a method option and options for precision goal and maximum number of steps:

In[657]:= Options[IndependentComponentAnalysis]

Out[657]= {Method -> "FastICA", MaxSteps -> 200, PrecisionGoal -> 6}

The intent is IndependentComponentAnalysis to be the front interface to different ICA algorithms. (Hence, it has a Method option.) The function FastICA takes as options the named arguments of the R function fastICA described in [4].

In[658]:= Options[FastICA]

Out[658]= {"NonGaussianityFunction" -> Automatic, 
 "NegEntropyFactor" -> 1, "InitialUnmixingMartix" -> Automatic, 
 "RowNorm" -> False, MaxSteps -> 200, PrecisionGoal -> 6, 
 "RFastICAResult" -> True}

At this point FastICA has only the deflation algorithm described in [1]. ([4] has also the so called "symmetric" ICA sub-algorithm.) The R function fastICA in [4] can use only two neg-Entropy functions log(cosh(x)) and exp(-u^2/2). Because of the symbolic capabilities of Mathematica FastICA of [3] can take any listable function through the option "NonGaussianityFunction", and it will find and use the corresponding first and second derivatives.

Using NNMF for ICA

It seems that in some cases, like the generated data used in this blog post, Non-Negative Matrix Factorization (NNMF) can be applied for doing ICA.

To be clear, NNMF does dimension reduction, but its norm minimization process does not enforce variable independence. (It enforces non-negativity.) There are at least several articles discussing modification of NNMF to do ICA. For example [6].

Load NNMF package [5] (from MathematicaForPrediction at GitHub):


After several applications of NNMF we get signals close to the originals:

{W, H} = GDCLS[M, 3];
Grid[{Map[ListLinePlot[#, ImageSize -> 250] &, Normal[H]]}]

NNMF found source signals

For the generated data in this blog post, FactICA is much faster than NNMF and produces better separation of the signals with every run. The data though is a typical representative for the problems ICA is made for. Another comparison with image de-noising, extending my previous blog post, will be published shortly.

ICA for mixed time series of city temperatures

Using Mathematica‘s function WeatherData we can get temperature time series for a small set of cities over a certain time grid. We can mix those time series into a multi-dimensional signal, MS, apply ICA to MS, and judge the extracted source signals with the original ones.

This is done with the following commands.

Get time series data

cities = {"Sofia", "London", "Copenhagen"};
timeInterval = {{2016, 1, 1}, {2016, 1, 31}};
ts = WeatherData[#, "Temperature", timeInterval] & /@ cities;

opts = {PlotTheme -> "Detailed", FrameLabel -> {None, "temperature,\[Degree]C"}, ImageSize -> 350};
    PlotLabel -> "City temperatures\nfrom " <> DateString[timeInterval[[1]], {"Year", ".", "Month", ".", "Day"}] <> 
    " to " <> DateString[timeInterval[[2]], {"Year", ".", "Month", ".", "Day"}], 
    PlotLegends -> cities, ImageSize -> Large, opts]

City temperatures

Cleaning and resampling (if any)

Here we check the data for missing data:

Length /@ Through[ts["Path"]]
Count[#, _Missing, \[Infinity]] & /@ Through[ts["Path"]]
(* {1483, 1465, 742} *)
(* {0,0,0} *)
(* 0 *)

Resampling per hour:

ts = TimeSeriesResample[#, "Hour", ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}] & /@ ts

Mixing the time series

In order to do a good mixing we select a mixing matrix for which all column sums are close to one:

mixingMat = #/Total[#] & /@ RandomReal[1, {3, 3}];
(* mixingMat = {{0.357412, 0.403913, 0.238675}, {0.361481, 0.223506, 0.415013}, {0.36564, 0.278565, 0.355795}} *)
(* {1.08453, 0.905984, 1.00948} *)

Note the row normalization.

Make the mixed signals:

tsMixed = Table[TimeSeriesThread[mixingMat[[i]].# &, ts], {i, 3}]

Plot the original and mixed signals:

Grid[{{DateListPlot[ts, PlotLegends -> cities, PlotLabel -> "Original signals", opts],
DateListPlot[tsMixed, PlotLegends -> Automatic, PlotLabel -> "Mixed signals", opts]}}]

Original and mixed temperature signals

Application of ICA

At this point we apply ICA (probably more than once, but not too many times) and plot the found source signals:

X = Transpose[Through[tsMixed["Path"]][[All, All, 2]] /. Quantity[v_, _] :> v];
{S, A} = IndependentComponentAnalysis[X, 3];
DateListPlot[Transpose[{tsMixed[[1]]["Dates"], #}], PlotTheme -> "Detailed", ImageSize -> 250] & /@ Transpose[S]

ICA found temperature time series components

Compare with the original time series:

MapThread[DateListPlot[#1, PlotTheme -> "Detailed", PlotLabel -> #2, ImageSize -> 250] &, {tsPaths, cities}]

Original temperature time series

After permuting and inverting some of the found sources signals we see they are fairly good:

pinds = {3, 1, 2};
pmat = IdentityMatrix[3][[All, pinds]];

DateListPlot[Transpose[{tsMixed[[1]]["Dates"], #}], PlotTheme -> "Detailed", ImageSize -> 250] & /@ 
  Transpose[S.DiagonalMatrix[{1, -1, 1}].pmat]

Permuted and inverted found source signals


[1] A. Hyvarinen and E. Oja (2000) Independent Component Analysis: Algorithms and Applications, Neural Networks, 13(4-5):411-430 . URL: .

[2] Wikipedia entry, Independent component analysis, URL: .

[3] A. Antonov, Independent Component Analysis Mathematica package, (2016), source code MathematicaForPrediction at GitHub, package IndependentComponentAnalysis.m .

[4] J. L. Marchini, C. Heaton and B. D. Ripley, fastICA, R package, URLs:, .

[5] A. Antonov, Implementation of the Non-Negative Matrix Factorization algorithm in Mathematica, (2013), source code at MathematicaForPrediction at GitHub, package NonNegativeMatrixFactorization.m.

[6] H. Hsieh and J. Chien, A new nonnegative matrix factorization for independent component analysis, (2010), Conference: Acoustics Speech and Signal Processing (ICASSP).

Comparison of PCA and NNMF over image de-noising

This post compares two dimension reduction techniques Principal Component Analysis (PCA) / Singular Value Decomposition (SVD) and Non-Negative Matrix Factorization (NNMF) over a set of images with two different classes of signals (two digits below) produced by different generators and overlaid with different types of noise.

PCA/SVD produces somewhat good results, but NNMF often provides great results. The factors of NNMF allow interpretation of the basis vectors of the dimension reduction, PCA in general does not. This can be seen in the example below.


First let us get some images. I am going to use the MNIST dataset for clarity. (And because I experimented with similar data some time ago — see Classification of handwritten digits.).

MNISTdigits = ExampleData[{"MachineLearning", "MNIST"}, "TestData"];
{testImages, testImageLabels} = 
  Transpose[List @@@ RandomSample[Cases[MNISTdigits, HoldPattern[(im_ -> 0 | 4)]], 100]];

enter image description here

See the breakdown of signal classes:

(* {{4, 48}, {0, 52}} *)

Verify the images have the same sizes:

Tally[ImageDimensions /@ testImages]
dims = %[[1, 1]]

Add different kinds of noise to the images:

noisyTestImages6 = 
    {RandomChoice[{"GaussianNoise", "PoissonNoise", "SaltPepperNoise"}], RandomReal[1]}], {i, Length[testImages]}];
RandomSample[Thread[{testImages, noisyTestImages6}], 15]

enter image description here

Since the important values of the signals are 0 or close to 0 we negate the noisy images:

negNoisyTestImages6 = ImageAdjust@*ColorNegate /@ noisyTestImages6

enter image description here

Linear vector space representation

We unfold the images into vectors and stack them into a matrix:

noisyTestImagesMat = (Flatten@*ImageData) /@ negNoisyTestImages6;

(* {100, 784} *)

Here is centralized version of the matrix to be used with PCA/SVD:

cNoisyTestImagesMat = 
  Map[# - Mean[noisyTestImagesMat] &, noisyTestImagesMat];

(With NNMF we want to use the non-centralized one.)

Here confirm the values in those matrices:

Grid[{Histogram[Flatten[#], 40, PlotRange -> All, 
     ImageSize -> Medium] & /@ {noisyTestImagesMat, 

enter image description here

SVD dimension reduction

For more details see the previous answers.

{U, S, V} = SingularValueDecomposition[cNoisyTestImagesMat, 100];
ListPlot[Diagonal[S], PlotRange -> All, PlotTheme -> "Detailed"]
dS = S;
Do[dS[[i, i]] = 0, {i, Range[10, Length[S], 1]}]
newMat = U.dS.Transpose[V];
denoisedImages = 
  Map[Image[Partition[# + Mean[noisyTestImagesMat], dims[[2]]]] &, newMat];

enter image description here

Here are how the new basis vectors look like:

Take[#, 50] &@
 MapThread[{#1, Norm[#2], 
    ImageAdjust@Image[Partition[Rescale[#3], dims[[1]]]]} &, {Range[
    Dimensions[V][[2]]], Diagonal[S], Transpose[V]}]

enter image description here

Basically, we cannot tell much from these SVD basis vectors images.

Load packages

Here we load the packages for what is computed next:



The blog post “Statistical thesaurus from NPR podcasts” discusses an example application of NNMF and has links to documents explaining the theory behind the NNMF utilization.

The outlier detection package is described in a previous blog post “Outlier detection in a list of numbers”.


This command factorizes the image matrix into the product W H :

{W, H} = GDCLS[noisyTestImagesMat, 20, "MaxSteps" -> 200];
{W, H} = LeftNormalizeMatrixProduct[W, H];


(* {100, 20} *)
(* {20, 784} *)

The rows of H are interpreted as new basis vectors and the rows of W are the coordinates of the images in that new basis. Some appropriate normalization was also done for that interpretation. Note that we are using the non-normalized image matrix.

Let us see the norms of $H$ and mark the top outliers:

norms = Norm /@ H;
ListPlot[norms, PlotRange -> All, PlotLabel -> "Norms of H rows", 
  PlotTheme -> "Detailed"] // 
OutlierPosition[norms, TopOutliers@*HampelIdentifierParameters]
OutlierPosition[norms, TopOutliers@*SPLUSQuartileIdentifierParameters]

enter image description here

Here is the interpretation of the new basis vectors (the outliers are marked in red):

MapIndexed[{#2[[1]], Norm[#], Image[Partition[#, dims[[1]]]]} &, H] /. (# -> Style[#, Red] & /@ 
   OutlierPosition[norms, TopOutliers@*HampelIdentifierParameters])

enter image description here

Using only the outliers of $H$ let us reconstruct the image matrix and the de-noised images:

pos = {1, 6, 10}
dHN = Total[norms]/Total[norms[[pos]]]*
    ReplacePart[ConstantArray[0, Length[norms]], 
     Map[List, pos] -> 1]];
newMatNNMF = W.dHN.H;
denoisedImagesNNMF = 
  Map[Image[Partition[#, dims[[2]]]] &, newMatNNMF];

Often we cannot just rely on outlier detection and have to hand pick the basis for reconstruction. (Especially when we have more than one classes of signals.)


At this point we can plot all images together for comparison:

imgRows = 
  Transpose[{testImages, noisyTestImages6, 
    ImageAdjust@*ColorNegate /@ denoisedImages, 
    ImageAdjust@*ColorNegate /@ denoisedImagesNNMF}];
With[{ncol = 5}, 
 Grid[Prepend[Partition[imgRows, ncol], 
   Style[#, Blue, FontFamily -> "Times"] & /@ 
    Table[{"original", "noised", "SVD", "NNMF"}, ncol]]]]

enter image description here

We can see that NNMF produces cleaner images. This can be also observed/confirmed using threshold binarization — the NNMF images are much cleaner.

imgRows =
  With[{th = 0.5},
   MapThread[{#1, #2, Binarize[#3, th], 
      Binarize[#4, th]} &, {testImageLabels, noisyTestImages6, 
     ImageAdjust@*ColorNegate /@ denoisedImages, 
     ImageAdjust@*ColorNegate /@ denoisedImagesNNMF}]];
With[{ncol = 5}, 
 Grid[Prepend[Partition[imgRows, ncol], 
   Style[#, Blue, FontFamily -> "Times"] & /@ 
    Table[{"label", "noised", "SVD", "NNMF"}, ncol]]]]

enter image description here

Usually with NNMF in order to get good results we have to do more that one iteration of the whole factorization and reconstruction process. And of course NNMF is much slower. Nevertheless, we can see clear advantages of NNMF’s interpretability and leveraging it.

Gallery with other experiments

In those experiments I had to hand pick the NNMF basis used for the reconstruction. Using outlier detection without supervision would not produce good results most of the time.

Further comparison with Classify

We can further compare the de-noising results by building signal (digit) classifiers and running them over the de-noised images.

For such a classifier we have to decide:

  1. do we train only with images of the two signal classes or over a larger set of signals;
  2. how many signals we train with;
  3. with what method the classifiers are built.

Below I use the default method of Classify with all digit images in MNIST that are not in the noised images set. The classifier is run with the de-noising traced these 0-4 images:

Get the images:

{trainImages, trainImageLabels} = Transpose[List @@@ Cases[MNISTdigits, HoldPattern[(im_ -> _)]]];
pred = Map[! MemberQ[testImages, #] &, trainImages];
trainImages = Pick[trainImages, pred];
trainImageLabels = Pick[trainImageLabels, pred];
(* {{0, 934}, {1, 1135}, {2, 1032}, {3, 1010}, {4, 928}, {5, 892}, {6, 958}, {7, 1028}, {8, 974}, {9, 1009}} *)

Build the classifier:

digitCF = Classify[trainImages -> trainImageLabels]

Compute classifier measurements for the original, the PCA de-noised, and NNMF de-noised images:

origCM = ClassifierMeasurements[digitCF, Thread[(testImages) -> testImageLabels]]
pcaCM = ClassifierMeasurements[digitCF, 
      Thread[(Binarize[#, 0.55] &@*ImageAdjust@*ColorNegate /@ denoisedImages) -> testImageLabels]]
nnmfCM = ClassifierMeasurements[digitCF, 
      Thread[(Binarize[#, 0.55] &@*ImageAdjust@*ColorNegate /@  denoisedImagesNNMF) -> testImageLabels]]

Tabulate the classification measurements results:

Grid[{{"Original", "PCA", "NNMF"},
{Row[{"Accuracy:", origCM["Accuracy"]}], Row[{"Accuracy:", pcaCM["Accuracy"]}], Row[{"Accuracy:", nnmfCM["Accuracy"]}]},
{origCM["ConfusionMatrixPlot"], pcaCM["ConfusionMatrixPlot"], nnmfCM["ConfusionMatrixPlot"]}}, 
Dividers -> {None, {True, True, False}}]


Statistical thesaurus from NPR podcasts

Five months ago I worked with transcripts of National Public Radio (NPR) podcasts. The transcripts are available at — see for example “From child actor to artist…“.

Using nearly 5000 transcripts I experimented with topic extraction and statistical thesaurus derivation. The topics are too bulky to show here, but I am going to show some of the statistical thesaurus entries.

I used dimension reduction with Non-Negative Matrix Factorization (NNMF). For more detailed explanations, code for computations, and experimental results see this paper “Topic and thesaurus extraction from a document collection” provided by the MathematicaForPrediction project at GitHub. (The code for NNMF is also provided by the MathematicaForPrediction project at GitHub.)

First let me describe the data. The collection has 5123 transcripts.

Here is a sample of the transcripts (only the first 400 characters of each are taken):
NPR podcast sample 400 characters per podcast

Here is the distribution of the string lengths of the transcripts:
5123 NPR podcasts string length

I removed custom selected stop words from the transcripts. I also stemmed the words using the stemmer called snowball, see The stemmed words are called “terms” below.

Here are descriptive statistics and the distribution of the number of transcripts per term:
Transcripts per term

Here are descriptive statistics and the distribution of the number of terms per transcript:
Terms per transcript

I did not compute the whole statistical thesaurus. Instead I made a function that computes the thesaurus entry of a given word using the right NNMF factor with proper normalization.

Here are sample results of the thesaurus entry “retrieval” (note that the right column contains word stems):
Statistical thesaurus entries