# Adaptive numerical Lebesgue integration by set measure estimates

## Introduction

In this document are given outlines and examples of several related implementations of Lebesgue integration, [1], within the framework of NIntegrate, [7]. The focus is on the implementations of Lebesgue integration algorithms that have multiple options and can be easily extended (in order to do further research, optimization, etc.) In terms of NIntegrate‘s framework terminology it is shown how to implement an integration strategy or integration rule based on the theory of the Lebesgue integral. The full implementation of those strategy and rules — LebesgueIntegration, LebesgueIntegrationRule, and GridLebesgueIntegrationRule — are given in the Mathematica package [5].

The advantage of using NIntegrate‘s framework is that a host of supporting algorithms can be employed for preprocessing, execution, experimentation, and testing (correctness, comparison, and profiling.)

Here is a brief description of the integration strategy LebesgueIntegration in [5]:

1. prepare a function that calculates measure estimates based on random points or low discrepancy sequences of points in the integration domain;

2. use NIntegrate for the computation of one dimensional integrals for that measure estimate function over the range of the integrand function values.

The strategy is adaptive because of the second step — NIntegrate uses adaptive integration algorithms.

Instead of using an integration strategy we can "tuck in" the whole Lebesgue integration process into an integration rule, and then use that integration rule with the adaptive integration algorithms NIntegrate already has. This is done with the implementations of the integration rules LebesgueIntegrationRule and GridLebesgueIntegrationRule.

## Brief theory

Lebesgue integration extends the definition of integral to a much larger class of functions than the class of Riemann integrable functions. The Riemann integral is constructed by partitioning the integrand’s domain (on the axis). The Lebesgue integral is constructed by partitioning the integrand’s co-domain (on the axis). For each value in the co-domain, find the measure of the corresponding set of points in the domain. Roughly speaking, the Lebesgue integral is then the sum of all the products ; see [1]. For our implementation purposes is defined differently, and in the rest of this section we follow [3].

Consider the non-negative bound-able measurable function :

We denote by the measure for the points in for which , i.e.

The Lebesgue integral of over can be be defined as:

Further, we can write the last formula as

The restriction can be handled by defining the following functions and :

and using the formula

Since finding analytical expressions of is hard we are going to look into ways of approximating .

For more details see [1,2,3,4].

(Note, that the theoretical outline the algorithms considered can be seen as algorithms that reduce multidimensional integration to one dimensional integration.)

## Algorithm walk through

We can see that because of Equation (4) we mostly have to focus on estimating the measure function . This section provides a walk through with visual examples of a couple of stochastic ways to do that.

Consider the integral

In order to estimate in for we are going to generate in a set of low discrepancy sequence of points, [2]. Here this is done with points of the so called "Sobol" sequence:

n = 100;
SeedRandom[0,Method -> {"MKL",Method -> {"Sobol", "Dimension" -> 2}}];
points = RandomReal[{0, 1}, {n, 2}];
ListPlot[points, AspectRatio -> Automatic,
PlotTheme -> "Detailed",
FrameLabel -> {"\!$$\*SubscriptBox[\(x$$, $$1$$]\)","\!$$\*SubscriptBox[\(x$$, $$2$$]\)"}]

To each point let us assign a corresponding "volume" that can be used to approximate with Equation (2). We can of course easily assign such volumes to be , but as it can be seen on the plot this would be a good approximation for a larger number of points. Here is an example of a different volume assignment using a Voronoi diagram, [10]:

vmesh = VoronoiMesh[points, {{0, 1}, {0, 1}}, Frame -> True];
Show[{vmesh, Graphics[{Red, Point[points]}]},
FrameLabel -> {"\!$$\*SubscriptBox[\(x$$, $$1$$]\)", "\!$$\*SubscriptBox[\(x$$, $$2$$]\)"}]

(The Voronoi diagram for finds for each point the set of domain points closest to than any other point of .)

Here is a breakdown of the Voronoi diagram volumes corresponding to the generated points (compare with ) :

volumes =PropertyValue[{vmesh, Dimensions[points][[2]]}, MeshCellMeasure];
Histogram[volumes, PlotRange -> All, PlotTheme -> "Detailed",
FrameLabel -> {"volume", "count"}]

Let us define a function that computes according to Equation (1) with the generated points and assigned volumes:

EstimateMeasure[fval_?NumericQ, pointVals_, pointVolumes_] :=
Block[{pinds},
pinds = Clip[Sign[pointVals - fval], {0, 1}, {0, 1}];
pointVolumes.pinds
];

Here is an example call of that function using the Voronoi diagram volumes:

EstimateMeasure[1.6, Sqrt[2 + Total[#]] & /@ points, volumes]
(* 0.845833 *)

And here is another call using uniform volumes:

EstimateMeasure[1.6, Sqrt[2 + Total[#]] & /@ points,
Table[1/Length[points], {Length[points]}]] // N
(* 0.85 *)

The results can be verified using ImplicitRegion:

RegionMeasure[ImplicitRegion[ Sqrt[2 + x1 + x2] >= 1.6, {{x1, 0, 1}, {x2, 0, 1}}]]
(* 0.8432 *)

Or using Integrate:

Integrate[Piecewise[{{1, Sqrt[2 + x1 + x2] >= 1.6}}, 0], {x1, 0, 1}, {x2, 0, 1}]
(* 0.8432 *)

At this point we are ready to compute the integral estimate using Formula (4) :

fvals = Sqrt[2 + Total[#]] & /@ points;
Min[fvals]*EstimateMeasure[0, fvals, volumes] +
NIntegrate[EstimateMeasure[y, fvals, volumes], {y, Min[fvals], Max[fvals]}]
(* 1.72724 *)

To simplify the code we use the symbol to hold the values . Note that instead of the true min and max values of we use estimates of them with .

Here is the verification of the result:

Integrate[Sqrt[2 + x1 + x2], {x1, 0, 1}, {x2, 0, 1}]
% // N
(* 8/15 (16 + 2 Sqrt[2] - 9 Sqrt[3]) *)
(* 1.72798 *)

In order to implement the outlined algorithm so it will be more universal we have to consider volumes rescaling, function positivity, and Voronoi diagram implementation(s). For details how these considerations are resolved see the code of the strategy LebesgueIntegration in [5].

## Further algorithm elaborations

### Measure estimate with regular grid cells

The article 3 and book 4 suggest the measure estimation to be done through membership of regular grid of cells. For example, the points generated in the previous section can be grouped by a grid:

The following steps describe in detail an algorithm based on the proposed in [3,4] measure estimation method. The algorithm is implemented in [5] for the symbol, GridLebesgueIntegrationRule.

1. Generate points filling the where is the dimension of .

2. Partition with a regular grid according to specifications.

• 2.1. Assume the cells are indexed with the integers , .
• 2.2. Assume that all cells have the same volume below.

3. For each point find to which cell of the regular grid it belongs to.

4. For each cell have a list of indices corresponding to the points that belong to it.

5. For a given sub-region of integration rescale to the points to lie within ; denote those points with .

• 5.1. Compute the rescaling factor for the integration rule; denote with .

6. For a given integrand function evaluate over all points .

7. For each cell find the min and max values of .

• 7.1. Denote with and correspondingly.

8. For a given value , where is some integer enumerating the 1D integration rule sampling points, find the coefficients , using the following formula:

9. Find the measure estimate of with

### Axis splitting in Lebesgue integration rules

The implementations of Lebesgue integration rules are required to provide a splitting axis for use of the adaptive algorithms. Of course we can assign a random splitting axis, but that might lead often to slower computations. One way to provide splitting axis is to choose the axis that minimizes the sum of the variances of sub-divided regions estimated by samples of the rule points. This is the same approach taken in NIntegrates rule "MonteCarloRule"; for theoretical details see the chapter "7.8 Adaptive and Recursive Monte Carlo Methods" of [11].

In [5] this splitting axis choice is implemented based on integrand function values. Another approach, more in the spirit of the Lebesgue integration, is to select the splitting axis based on variances of the measure function estimates.

Consider the function:

DensityPlot[Exp[-3 (x - 1)^2 - 4 (y - 1)^2], {x, 0, 3}, {y, 0, 3},
PlotRange -> All, ColorFunction -> "TemperatureMap"]

Here is an example of sampling points traces using "minimum variance" axis splitting in LebesgueIntegrationRule:

res = Reap@
NIntegrate[Exp[-3 (x - 1)^2 - 4 (y - 1)^2], {x, 0, 3}, {y, 0, 3},
Method -> {LebesgueIntegrationRule, "Points" -> 600,
"PointGenerator" -> "Sobol",
"AxisSelector" -> "MinVariance"},
"SingularityHandler" -> None},
EvaluationMonitor :> Sow[{x, y}],
PrecisionGoal -> 2.5, MaxRecursion -> 3];
res[[1]]
ListPlot[res[[2, 1]], AspectRatio -> Automatic]
(* 0.890916 *)

And here are the sampling points with random selection of a splitting axis:

res = Reap@
NIntegrate[Exp[-3 (x - 1)^2 - 4 (y - 1)^2], {x, 0, 3}, {y, 0, 3},
Method -> {LebesgueIntegrationRule, "Points" -> 600,
"PointGenerator" -> "Sobol",
"AxisSelector" -> Random},
"SingularityHandler" -> None},
EvaluationMonitor :> Sow[{x, y}],
PrecisionGoal -> 2.5, MaxRecursion -> 3];
res[[1]]
ListPlot[res[[2, 1]], AspectRatio -> Automatic]
(* 0.892499 *)

Here is a more precise estimate of that integral:

NIntegrate[Exp[-3 (x - 1)^2 - 4 (y - 1)^2], {x, 0, 3}, {y, 0, 3}]
(* 0.898306 *)

## Implementation design within NIntegrate’s framework

### Basic usages

The strategy and rule implementations in [5] can be used in the following ways.

NIntegrate[Sqrt[x], {x, 0, 2}, Method -> LebesgueIntegration]
(* 1.88589 *)

NIntegrate[Sqrt[x], {x, 0, 2},
Method -> {LebesgueIntegration, Method -> "LocalAdaptive",
"Points" -> 2000, "PointGenerator" -> "Sobol"}, PrecisionGoal -> 3]
(* 1.88597 *)

NIntegrate[Sqrt[x], {x, 0, 2},
Method -> {LebesgueIntegrationRule, "Points" -> 2000,
"PointGenerator" -> "Sobol",
"PointwiseMeasure" -> "VoronoiMesh"}, PrecisionGoal -> 3]
(* 1.88597 *)

NIntegrate[Sqrt[x + y + x], {x, 0, 2}, {y, 0, 3}, {z, 0, 4},
Method -> {GridLebesgueIntegrationRule,
Method -> "GaussKronrodRule", "Points" -> 2000,
"GridSizes" -> 5, "PointGenerator" -> "Sobol"}, PrecisionGoal -> 3]
(* 43.6364 *)

### Options

Here are the options for the implemented strategy and rules in [5]:

Grid[Transpose[{#, ColumnForm[Options[#]]} & /@
{LebesgueIntegration,LebesgueIntegrationRule,
GridLebesgueIntegrationRule}],
Alignment -> Left, Dividers -> All]

### Using variable ranges

Integration with variable ranges works "out of the box."

NIntegrate[Sqrt[x + y], {x, 0, 2}, {y, 0, x}]
(* 2.75817 *)

NIntegrate[Sqrt[x + y], {x, 0, 2}, {y, 0, x},
Method -> LebesgueIntegration, PrecisionGoal -> 2]
(* 2.75709 *)

NIntegrate[Sqrt[x + y], {x, 0, 2}, {y, 0, x},
Method -> LebesgueIntegrationRule, PrecisionGoal -> 2]
(* 2.75663 *)

### Infinite ranges

In order to get correct results with infinite ranges the wrapper

Method->{"UnitCubeRescaling","FunctionalRangesOnly"->False, _}

has to be used. Here is an example:

NIntegrate[1/(x^2 + 12), {x, 0, Infinity}]
(* 0.45345 *)

NIntegrate[1/(x^2 + 12), {x, 0, Infinity},
Method -> {"UnitCubeRescaling", "FunctionalRangesOnly" -> False,
Method -> {LebesgueIntegrationRule, "Points" -> 2000}}, PrecisionGoal -> 3]
(* 0.453366 *)

For some integrands we have to specify inter-range points or larger MinRecursion.

NIntegrate[1/(x^2), {x, 1, Infinity}]
(* 1. *)

NIntegrate[1/(x^2), {x, 1, 12, Infinity},
Method -> {"UnitCubeRescaling", "FunctionalRangesOnly" -> False,
Method -> {LebesgueIntegrationRule, "Points" -> 1000}}]
(* 0.999466 *)

NIntegrate[1/(x^2), {x, 1, Infinity},
Method -> {"UnitCubeRescaling", "FunctionalRangesOnly" -> False,
Method -> {LebesgueIntegrationRule, "Points" -> 1000}}]
(* 0. *)

### Evaluation monitoring

With the option "EvaluationMonitor" we can see the sampling points for the strategy and the rules.

This is straightforward for the rules:

res = Reap@
NIntegrate[Exp[-3 (x - 1)^2], {x, 0, 3},
Method -> LebesgueIntegrationRule,
EvaluationMonitor :> Sow[x],
PrecisionGoal -> 3];
ListPlot[res[[2, 1]]]

The strategy LebesgueIntegration uses an internal variable for the calculation of the Lebesgue integral. In "EvaluationMonitor" either that variable has to be used, or a symbol name has to be passed though the option "LebesgueIntegralVariableSymbol". Here is an example:

res = Reap@
NIntegrate[Sqrt[x + y + z], {x, -1, 2}, {y, 0, 1}, {z, 1, 12},
Method -> {LebesgueIntegration, "Points" -> 600,
"PointGenerator" -> "Sobol",
"LebesgueIntegralVariableSymbol" -> fval},
EvaluationMonitor :> {Sow[fval]},
PrecisionGoal -> 3];
res = DeleteCases[res, fval, \[Infinity]];
ListPlot[res[[2, 1]]]

### Profiling

We can use NIntegrate‘s utility functions for visualization and profiling in order to do comparison of the implemented algorithms with related ones (like "AdaptiveMonteCarlo") which NIntegrate has (or are plugged-in).

Needs["IntegrationNIntegrateUtilities"]

Common function and domain:

fexpr = 1/(375/100 - Cos[x] - Cos[y]);
ranges = {{x, 0, \[Pi]}, {y, 0, \[Pi]}};

Common parameters:

pgen = "Random";
npoints = 1000;

NIntegrateProfile[
NIntegrate[fexpr, Evaluate[Sequence @@ ranges],
Method -> {"MonteCarloRule", "PointGenerator" -> pgen,
"Points" -> npoints,
"AxisSelector" -> "MinVariance"}},
PrecisionGoal -> 3]
]
(* {"IntegralEstimate" -> InputForm[2.8527356472097902], "Evaluations" -> 17000, "Timing" -> 0.0192079} *)

LebesgueIntegrationRule call:

NIntegrateProfile[
NIntegrate[fexpr, Evaluate[Sequence @@ ranges],
Method -> {LebesgueIntegrationRule,
"PointGenerator" -> pgen, "Points" -> npoints,
"AxisSelector" -> "MinVariance",
Method -> "ClenshawCurtisRule"},
"SingularityHandler" -> None}, PrecisionGoal -> 3]
]
(* {"IntegralEstimate" -> InputForm[2.836659588960318], "Evaluations" -> 13000, "Timing" -> 0.384246} *)

### Design diagrams

The flow charts below show that the plug-in designs have common elements. In order to make the computations more effective the rule initialization prepares the data that is used in all rule invocations. For the strategy the initialization can be much lighter since the strategy algorithm is executed only once.

In the flow charts the double line boxes designate sub-routines. We can see that so called Hollywood principle "don’t call us, we’ll call you" in Object-oriented programming is manifested.

The following flow chart shows the steps of NIntegrate‘s execution when the integration strategy LebesgueIntegration is used.

The following flow chart shows the steps of NIntegrate‘s execution when the integration rule LebesgueIntegrationRule is used.

## Algorithms versions and options

There are multiple architectural, coding, and interface decisions to make while doing implementations like the ones in [5] and described in this document. The following mind map provides an overview of alternatives and interactions between components and parameters.

## Alternative implementations

In many ways using the Lebesgue integration rule with the adaptive algorithms is similar to using NIntegrate‘s "AdaptiveMonteCarlo" and its rule "MonteCarloRule". Although it is natural to consider plugging-in the Lebesgue integration rules into "AdaptiveMonteCarlo" at this point NIntegrate‘s framework does not allow "AdaptiveMonteCarlo" the use of a rule different than "MonteCarloRule".

We can consider using Monte Carlo algorithms for estimating the measures corresponding to a vector of values (that come from a 1D integration rule). This can be easily done, but it is not that effective because of the way NIntegrate handles vector integrands and because of stopping criteria issues when the measures are close to .

## Future plans

One of the most interesting extensions of the described Lebesgue integration algorithms and implementation designs is their extension with more advanced features of Mathematica for geometric computation. (Like the functions VoronoiMesh, RegionMeasure, and ImplicitRegion used above.)

Another interesting direction is the derivation and use of symbolic expressions for the measure functions. (Hybrid symbolic and numerical algorithms can be obtained as NIntegrate‘s handling of piecewise functions or the strategy combining symbolic and numerical integration described in [9].)

## References

[1] Wikipedia entry, Lebesgue integration, URL: https://en.wikipedia.org/wiki/Lebesgue_integration .

[2] Wikipedia entry, Low-discrepancy sequence, URL: https://en.wikipedia.org/wiki/Low-discrepancy_sequence .

[3] B. L. Burrows, A new approach to numerical integration, 1. Inst. Math. Applics., 26(1980), 151-173.

[4] T. He, Dimensionality Reducing Expansion of Multivariate Integration, 2001, Birkhauser Boston. ISBN-13:978-1-4612-7414-8 .

[5] A. Antonov, Adaptive Numerical Lebesgue Integration Mathematica Package, 2016, MathematicaForPrediction project at GitHub.

[6] A. Antonov, Lebesgue integration, Wolfram Demonstrations Project, 2007. URL: http://demonstrations.wolfram.com/LebesgueIntegration .

[7] "Advanced Numerical Integration in the Wolfram Language", Wolfram Research Inc. URL: https://reference.wolfram.com/language/tutorial/NIntegrateOverview.html .

[8] A. Antonov, "How to implement custom integration rules for use by NIntegrate?", (2016) Mathematica StackExchange answer, URL: http://mathematica.stackexchange.com/a/118326/34008 .

[9] A. Antonov, "How to implement custom NIntegrate integration strategies?", (2016) Mathematica StackExchange answer, URL: http://mathematica.stackexchange.com/a/118922/34008 .

[10] Wikipedia entry, Voronoi diagram, URL: https://en.wikipedia.org/wiki/Voronoi_diagram .

[11] Press, W.H. et al., Numerical Recipes in C (2nd Ed.): The Art of Scientific Computing, 1992, Cambridge University Press, New York, NY, USA.

# Making Chernoff faces for data visualization

## Introduction

This blog post describes the use of face-like diagrams to visualize multidimensional data introduced by Herman Chernoff in 1973, see [1].

The idea to use human faces in order to understand, evaluate, or easily discern (the records of) multidimensional data is very creative and inspirational. As Chernoff says in [1], the object of the idea is to "represent multivariate data, subject to strong but possibly complex relationships, in such a way that an investigator can quickly comprehend relevant information and then apply appropriate statistical analysis." It is an interesting question how useful this approach is and it seems that there at least several articles discussing that; see for example [2].

I personally find the use of Chernoff faces useful in a small number of cases, but that is probably true for many “creative” data visualization methods.

Below are given both simple and more advanced examples of constructing Chernoff faces for data records using the Mathematica package [3]. The considered data is categorized as:

1. a small number of records, each with small number of elements;
2. a large number of records, each with less elements than Chernoff face parts;
3. a list of long records, each record with much more elements than Chernoff face parts;
4. a list of nearest neighbors or recommendations.

For several of the visualizing scenarios the records of two "real life" data sets are used: Fisher Iris flower dataset [7], and "Vinho Verde" wine quality dataset [8]. For the rest of the scenarios the data is generated.

A fundamental restriction of using Chernoff faces is the necessity to properly transform the data variables into the ranges of the Chernoff face diagram parameters. Therefore, proper data transformation (standadizing and rescaling) is an inherent part of the application of Chernoff faces, and this document describes such data transformation procedures (also using [3]).

The packages [3,4] are used to produce the diagrams in this post. The following two commands load them.

Import["https://raw.githubusercontent.com/antononcube/\
MathematicaForPrediction/master/ChernoffFaces.m"]

Import["https://raw.githubusercontent.com/antononcube/\
MathematicaForPrediction/master/MathematicaForPredictionUtilities.m"]

## Making faces

### Just a face

Here is a face produced by the function ChernoffFace of [3] with its simplest signature:

SeedRandom[152092]
ChernoffFace[]

Here is a list of faces:

SeedRandom[152092]
Table[ChernoffFace[ImageSize -> Tiny], {7}]

### Proper face making

The "proper" way to call ChernoffFace is to use an association for the facial parts placement, size, rotation, and color. The options are passed to Graphics.

SeedRandom[2331];
Keys[ChernoffFace["FacePartsProperties"]] ->
RandomReal[1, Length[ChernoffFace["FacePartsProperties"]]]],
ImageSize -> Small , Background -> GrayLevel[0.85]]

The Chernoff face drawn with the function ChernoffFace can be parameterized to be asymmetric.

The parameters argument mixes (1) face parts placement, sizes, and rotation, with (2) face parts colors and (3) a parameter should it be attempted to make the face symmetric. All facial parts parameters have the range [0,1]

Here is the facial parameter list:

Keys[ChernoffFace["FacePartsProperties"]]

(* {"FaceLength", "ForheadShape", "EyesVerticalPosition", "EyeSize", \
"EyeSlant", "LeftEyebrowSlant", "LeftIris", "NoseLength", \
"MouthSmile", "LeftEyebrowTrim", "LeftEyebrowRaising", "MouthTwist", \
"MouthWidth", "RightEyebrowTrim", "RightEyebrowRaising", \
"RightEyebrowSlant", "RightIris"} *)

The order of the parameters is chosen to favor making symmetric faces when a list of random numbers is given as an argument, and to make it easier to discern the faces when multiple records are visualized. For experiments and discussion about which facial features bring better discern-ability see [2]. One of the conclusions of [2] is that eye size and eye brow slant are most decisive, followed by face size and shape.

Here are the rest of the parameters (colors and symmetricity):

Complement[Keys[ChernoffFace["Properties"]],
Keys[ChernoffFace["FacePartsProperties"]]]

(* {"EyeBallColor", "FaceColor", "IrisColor", "MakeSymmetric", \
"MouthColor", "NoseColor"} *)

### Face coloring

The following code make a row of faces by generating seven sequences of random numbers in $$[0,1]$$, each sequence with length the number of facial parameters. The face color is assigned randomly and the face color or a darker version of it is used as a nose color. If the nose color is the same as the face color the nose is going to be shown "in profile", otherwise as a filled polygon. The colors of the irises are random blend between light brown and light blue. The color of the mouth is randomly selected to be black or red.

SeedRandom[201894];
Block[{pars = Keys[ChernoffFace["FacePartsProperties"]]},
Grid[{#}] &@
Table[ChernoffFace[Join[
<|"FaceColor" -> (rc =
ColorData["BeachColors"][RandomReal[1]]),
"NoseColor" -> RandomChoice[{Identity, Darker}][rc],
"IrisColor" -> Lighter[Blend[{Brown, Blue}, RandomReal[1]]],
"MouthColor" -> RandomChoice[{Black, Red}]|>],
ImageSize -> 100], {7}]
]

### Symmetric faces

The parameter "MakeSymmetric" is by default True. Setting "MakeSymmetric" to true turns an incomplete face specification into a complete specification with the missing paired parameters filled in. In other words, the symmetricity is not enforced on the specified paired parameters, only on the ones for which specifications are missing.

The following faces are made symmetric by removing the facial parts parameters that start with "R" (for "Right") and the parameter "MouthTwist".

SeedRandom[201894];
Block[{pars = Keys[ChernoffFace["FacePartsProperties"]]},
Grid[{#}] &@Table[(
asc =
pars -> RandomReal[1, Length[pars]]],
<|"FaceColor" -> (rc =
ColorData["BeachColors"][RandomReal[1]]),
"NoseColor" -> RandomChoice[{Identity, Darker}][rc],
"IrisColor" ->
Lighter[Blend[{Brown, Blue}, RandomReal[1]]],
"MouthColor" -> RandomChoice[{Black, Red}]|>];
asc =
Pick[asc,
StringMatchQ[Keys[asc],
x : (StartOfString ~~ Except["R"] ~~ __) /;
x != "MouthTwist"]];
ChernoffFace[asc, ImageSize -> 100]), {7}]]

Note that for the irises we have two possibilities of synchronization:

pars = <|"LeftIris" -> 0.8, "IrisColor" -> Green|>;
{ChernoffFace[Join[pars, <|"RightIris" -> pars["LeftIris"]|>],
ImageSize -> 100],
ChernoffFace[
Join[pars, <|"RightIris" -> 1 - pars["LeftIris"]|>],
ImageSize -> 100]}

## Visualizing records (first round)

The conceptually straightforward application of Chernoff faces is to visualize ("give a face" to) each record in a dataset. Because the parameters of the faces have the same ranges for the different records, proper rescaling of the records have to be done first. Of course standardizing the data can be done before rescaling.

First let us generate some random data using different distributions:

SeedRandom[3424]
{dists, data} = Transpose@Table[(
rdist =
RandomChoice[{NormalDistribution[RandomReal[10],
RandomReal[10]], PoissonDistribution[RandomReal[4]],
{rdist, RandomVariate[rdist, 12]}), {10}];
data = Transpose[data];

The data is generated in such a way that each column comes from a certain probability distribution. Hence, each record can be seen as an observation of the variables corresponding to the columns.

This is how the columns of the generated data look like using DistributionChart:

DistributionChart[Transpose[data],
ChartLabels ->
Placed[MapIndexed[
Grid[List /@ {Style[#2[[1]], Bold, Red, Larger]}] &, dists], Above],
ChartElementFunction -> "PointDensity",
ChartStyle -> "SandyTerrain", ChartLegends -> dists,
BarOrigin -> Bottom, GridLines -> Automatic,
ImageSize -> 900]

At this point we can make a face for each record of the rescaled data:

faces = Map[ChernoffFace, Transpose[Rescale /@ Transpose[data]]];

and visualize the obtained faces in a grid.

Row[{Grid[
Partition[#, 4] &@Map[Append[#, ImageSize -> 100] &, faces]],
"   ", Magnify[#, 0.85] &@
GridTableForm[
List /@ Take[Keys[ChernoffFace["FacePartsProperties"]],
Dimensions[data][[2]]],
}]

(The table on the right shows which facial parts are used for which data columns.)

## Some questions to consider

Several questions and observations arise from the example in the previous section.

#### 1. What should we do if the data records have more elements than facial parts parameters of the Chernoff face diagram?

This is another fundamental restriction of Chernoff faces — the number of data columns is limiter by the number of facial features.

One way to resolve this is to select important variables (columns) of the data; another is to represent the records with a vector of statistics. The latter is shown in the section "Chernoff faces for lists of long lists".

#### 2. Are there Chernoff face parts that are easier to perceive or judge than others and provide better discern-ability for large collections of records?

Research of the pre-attentiveness and effectiveness with Chernoff faces, [2], shows that eye size and eyebrow slant are the features that provide best discern-ability. Below this is used to select some of the variable-to-face-part correspondences.

#### 3. How should we deal with outliers?

Since we cannot just remove the outliers from a record — we have to have complete records — we can simply replace the outliers with the minimum or maximum values allowed for the corresponding Chernoff face feature. (All facial features of ChernoffFace have the range $$[0,1]$$.) See the next section for an example.

## Data standardizing and rescaling

Given a full array of records, we most likely have to standardize and rescale the columns in order to use the function ChernoffFace. To help with that the package [3] provides the function VariablesRescale which has the options "StandardizingFunction" and "RescaleRangeFunction".

Consider the following example of VariableRescale invocation in which: 1. each column is centered around its median and then divided by the inter-quartile half-distance (quartile deviation), 2. followed by clipping of the outliers that are outside of the disk with radius 3 times the quartile deviation, and 3. rescaling to the unit interval.

rdata = VariablesRescale[N@data,
"StandardizingFunction" -> (Standardize[#, Median, QuartileDeviation] &),
"RescaleRangeFunction" -> ({-3, 3} QuartileDeviation[#] &)];
TableForm[rdata /. {0 -> Style[0, Bold, Red], 1 -> Style[1, Bold, Red]}]

Remark: The bottom outliers are replaced with 0 and the top outliers with 1 using Clip.

## Chernoff faces for a small number of short records

In this section we are going use the Fisher Iris flower data set [7]. By "small number of records" we mean few hundred or less.

### Getting the data

These commands get the Fisher Iris flower data set shipped with Mathematica:

irisDataSet =
Map[Flatten,
List @@@ ExampleData[{"MachineLearning", "FisherIris"}, "Data"]];
irisColumnNames =
Most@Flatten[
List @@ ExampleData[{"MachineLearning", "FisherIris"},
"VariableDescriptions"]];
Dimensions[irisDataSet]

(* {150, 5} *)

Here is a summary of the data:

Grid[{RecordsSummary[irisDataSet, irisColumnNames]},
Dividers -> All, Alignment -> Top]

### Simple variable dependency analysis

Using the function VariableDependenceGrid of the package [4] we can plot a grid of variable cross-dependencies. We can see from the last row and column that "Petal length" and "Petal width" separate setosa from versicolor and virginica with a pretty large gap.

Magnify[#, 1] &@
VariableDependenceGrid[irisDataSet, irisColumnNames,
"IgnoreCategoricalVariables" -> False]

### Chernoff faces for Iris flower records

Since we want to evaluate the usefulness of Chernoff faces for discerning data records groups or clusters, we are going to do the following steps.

1. Data transformation. This includes standardizing and rescaling and selection of colors.
2. Make a Chernoff face for each record without the label class "Species of iris".
3. Plot shuffled Chernoff faces and attempt to visually cluster them or find patterns.
4. Make a Chernoff face for each record using the label class "Specie of iris" to color the faces. (Records of the same class get faces of the same color.)
5. Compare the plots and conclusions of step 2 and 4.

#### 1. Data transformation

First we standardize and rescale the data:

chernoffData = VariablesRescale[irisDataSet[[All, 1 ;; 4]]];

These are the colors used for the different species of iris:

faceColorRules =
-> Map[Lighter[#, 0.5] &, {Purple, Blue, Green}]]

(* {"setosa" -> RGBColor[0.75, 0.5, 0.75],
"versicolor" -> RGBColor[0.5, 0.5, 1.],
"virginica" -> RGBColor[0.5, 1., 0.5]} *)

Add the colors to the data for the faces:

Append, {chernoffData, irisDataSet[[All, -1]] /. faceColorRules}];

Plot the distributions of the rescaled variables:

DistributionChart[
Transpose@chernoffData[[All, 1 ;; 4]],
GridLines -> Automatic,
ChartElementFunction -> "PointDensity",
ChartStyle -> "SandyTerrain",
ChartLegends -> irisColumnNames, ImageSize -> Large]

#### 2. Black-and-white Chernoff faces

Make a black-and-white Chernoff face for each record without using the species class:

chfacesBW =
ChernoffFace[
"LeftEyebrowSlant"} -> Most[#]],
ImageSize -> 100] & /@ chernoffData;

Since "Petal length" and "Petal width" separate the classes well for those columns we have selected the parameters "EyeSize" and "LeftEyebrowSlant" based on [2].

#### 3. Finding patterns in a collection of faces

Combine the faces into a image collage:

ImageCollage[RandomSample[chfacesBW], Background -> White]

We can see that faces with small eyes tend have middle-lowered eyebrows, and that faces with large eyes tend to have middle raised eyebrows and large noses.

#### 4. Chernoff faces colored by the species

Make a Chernoff face for each record using the colors added to the rescaled data:

chfaces =
ChernoffFace[
"LeftEyebrowSlant", "FaceColor"} -> #],
ImageSize -> 100] & /@ chernoffData;

Make an image collage with the obtained faces:

ImageCollage[chfaces, Background -> White]

#### 5. Comparison

We can see that the collage with colored faces completely explains the patterns found in the black-and-white faces: setosa have smaller petals (both length and width), and virginica have larger petals.

## Browsing a large number of records with Chernoff faces

If we have a large number of records each comprised of a relative small number of numerical values we can use Chernoff faces to browse the data by taking small subsets of records.

Here is an example using "Vinho Verde" wine quality dataset [8].

## Chernoff faces for lists of long lists

In this section we consider data that is a list of lists. Each of the lists (or rows) is fairly long and represents values of the same variable or process. If the data is a full array, then we can say that in this section we deal with transposed versions of the data in the previous sections.

Since each row is a list of many elements visualizing the rows directly with Chernoff faces would mean using a small fraction of the data. A natural approach in those situations is to summarize each row with a set of descriptive statistics and use Chernoff faces for the row summaries.

The process is fairly straightforward; the rest of the section gives concrete code steps of executing it.

### Data generation

Here we create 12 rows of 200 elements by selecting a probability distribution for each row.

SeedRandom[1425]
{dists, data} = Transpose@Table[(
rdist =
RandomChoice[{NormalDistribution[RandomReal[10],
RandomReal[10]], PoissonDistribution[RandomReal[4]],
{rdist, RandomVariate[rdist, 200]}), {12}];
Dimensions[data]

(* {12, 200} *)

We have the following 12 "records" each with 200 "fields":

DistributionChart[data,
ChartLabels ->
MapIndexed[
Row[{Style[#2[[1]], Red, Larger],
"  ", Style[#1, Larger]}] &, dists],
ChartElementFunction ->
ChartElementData["PointDensity",
"ColorScheme" -> "SouthwestColors"], BarOrigin -> Left,
GridLines -> Automatic, ImageSize -> 1000]

Here is the summary of the records:

Grid[ArrayReshape[RecordsSummary[Transpose@N@data], {3, 4}],
Dividers -> All, Alignment -> {Left}]

### Data transformation

Here we "transform" each row into a vector of descriptive statistics:

statFuncs = {Mean, StandardDeviation, Kurtosis, Median,
QuartileDeviation, PearsonChiSquareTest};
sdata = Map[Through[statFuncs[#]] &, data];
Dimensions[sdata]

(* {12, 6} *)

Next we rescale the descriptive statistics data:

sdata = VariablesRescale[sdata,
"StandardizingFunction" -> (Standardize[#, Median,
QuartileDeviation] &)];

For kurtosis we have to do special rescaling if we want to utilize the property that Gaussian processes have kurtosis 3:

sdata[[All, 3]] =
Rescale[#, {3, Max[#]}, {0.5, 1}] &@Map[Kurtosis, N[data]];

Here is the summary of the columns of the rescaled descriptive statistics array:

Grid[{RecordsSummary[sdata, ToString /@ statFuncs]},
Dividers -> All]

### Visualization

First we define a function that computes and tabulates (descriptive) statistics over a record.

Clear[TipTable]
TipTable[vec_, statFuncs_, faceParts_] :=
Block[{},
GridTableForm[
Transpose@{faceParts, statFuncs,
NumberForm[Chop[#], 2] & /@ Through[statFuncs[vec]]},
TableHeadings -> {"FacePart", "Statistic", "Value"}]] /;
Length[statFuncs] == Length[faceParts];

To visualize the descriptive statistics of the records using Chernoff faces we have to select appropriate facial features.

faceParts = {"NoseLength", "EyeSize", "EyeSlant",
"EyesVerticalPosition", "FaceLength", "MouthSmile"};
TipTable[First@sdata, statFuncs, faceParts]

One possible visualization of all records is with the following commands. Note the addition of the parameter "FaceColor" to also represent how close a standardized row is to a sample from Normal Distribution.

{odFaceColor, ndFaceColor} = {White,  ColorData[7, "ColorList"][[8]]};
Grid[ArrayReshape[Flatten@#, {4, 3}, ""], Dividers -> All,
Alignment -> {Left, Top}] &@
chFace =
ChernoffFace[
Join[asc, <|
"FaceColor" -> Blend[{odFaceColor, ndFaceColor}, #2[[-1]]],
"IrisColor" -> GrayLevel[0.8],
"NoseColor" -> ndFaceColor|>], ImageSize -> 120,
AspectRatio -> Automatic];
tt = TipTable[N@#3, Join[statFuncs, {Last@statFuncs}],
Join[faceParts, {"FaceColor"}]];
Column[{Style[#1, Red],
Grid[{{Magnify[#4, 0.8],
Tooltip[chFace, tt]}, {Magnify[tt, 0.7], SpanFromAbove}},
Alignment -> {Left, Top}]}]) &
, {Range[Length[sdata]], sdata, data, dists}]

## Visualizing similarity with nearest neighbors or recommendations

### General idea

Assume the following scenario: 1. we have a set of items (movies, flowers, etc.), 2. we have picked one item, 3. we have computed the Nearest Neighbors (NNs) of that item, and 4. we want to visualize how much of a good fit the NNs are to the picked item.

Conceptually we can translate the phrase "how good the found NNs (or recommendations) are" to:

• "how similar the NNs are to the selected item", or

• "how different the NNs are to the selected item."

If we consider the picked item as the prototype of the most normal or central item then we can use Chernoff faces to visualize item’s NNs deviations.

Remark: Note that Chernoff faces provide similarity visualization most linked to Euclidean distance that to other distances.

### Concrete example

The code in this section demonstrates how to visualize nearest neighbors by Chernoff faces variations.

First we create a nearest neighbors finding function over the Fisher Iris data set (without the species class label):

irisNNFunc =
Nearest[irisDataSet[[All, 1 ;; -2]] -> Automatic,
DistanceFunction -> EuclideanDistance]

Here are nearest neighbors of some random row from the data.

itemInd = 67;
nnInds = irisNNFunc[irisDataSet[[itemInd, 1 ;; -2]], 20];

We can visualize the distances with of the obtained NNs with the prototype:

ListPlot[Map[
EuclideanDistance[#, irisDataSet[[itemInd, 1 ;; -2]]] &,
irisDataSet[[nnInds, 1 ;; -2]]]]

Next we subtract the prototype row from the NNs data rows, we standardize, and we rescale the interval $$[ 0, 3 \sigma ]$$ to $$[ 0.5, 1 ]$$:

snns = Transpose@Map[
Clip[Rescale[
Standardize[#, 0 &, StandardDeviation], {0, 3}, {0.5, 1}], {0,
1}] &,
Transpose@
Map[# - irisDataSet[[itemInd, 1 ;; -2]] &,
irisDataSet[[nnInds, 1 ;; -2]]]];

Here is how the original NNs data row look like:

GridTableForm[

And here is how the rescaled NNs data rows look like:

GridTableForm[Take[snns, 12],

Next we make Chernoff faces for the rescaled rows and present them in a easier to grasp way.

We use the face parts:

Take[Keys[ChernoffFace["FacePartsProperties"]], 4]

(* {"FaceLength", "ForheadShape", "EyesVerticalPosition", "EyeSize"} *)

To make the face comparison easier, the first face is the one of the prototype, each Chernoff face is drawn within the same rectangular frame, and the NNs indices are added on top of the faces.

chfaces =
ChernoffFace[#, Frame -> True,
PlotRange -> {{-1, 1}, {-2, 1.5}}, FrameTicks -> False,
ImageSize -> 100] & /@ snns;
chfaces =
ReplacePart[#1,
1 ->
Append[#1[[1]],
Text[Style[#2, Bold, Red], {0, 1.4}]]] &, {chfaces, nnInds}];
ImageCollage[chfaces, Background -> GrayLevel[0.95]]

We can see that the first few – i.e. closest — NNs have fairly normal looking faces.

Note that using a large number of NNs would change the rescaled values and in that way the first NNs would appear more similar.

## References

[1] Herman Chernoff (1973). "The Use of Faces to Represent Points in K-Dimensional Space Graphically" (PDF). Journal of the American Statistical Association (American Statistical Association) 68 (342): 361-368. doi:10.2307/2284077. JSTOR 2284077. URL: http://lya.fciencias.unam.mx/rfuentes/faces-chernoff.pdf .

[2] Christopher J. Morris; David S. Ebert; Penny L. Rheingans, "Experimental analysis of the effectiveness of features in Chernoff faces", Proc. SPIE 3905, 28th AIPR Workshop: 3D Visualization for Data Exploration and Decision Making, (5 May 2000); doi: 10.1117/12.384865. URL: http://www.research.ibm.com/people/c/cjmorris/publications/Chernoff_990402.pdf .

[3] Anton Antonov, Chernoff Faces implementation in Mathematica, (2016), source code at MathematicaForPrediction at GitHub, package ChernofFacess.m .

[4] Anton Antonov, MathematicaForPrediction utilities, (2014), source code MathematicaForPrediction at GitHub, package MathematicaForPredictionUtilities.m.

[5] Anton Antonov, Variable importance determination by classifiers implementation in Mathematica,(2015), source code at MathematicaForPrediction at GitHub, package VariableImportanceByClassifiers.m.

[6] Anton Antonov, "Importance of variables investigation guide", (2016), MathematicaForPrediction at GitHub, https://github.com/antononcube/MathematicaForPrediction, folder Documentation.

[7] Wikipedia entry, Iris flower data set, https://en.wikipedia.org/wiki/Iris_flower_data _set .

[8] P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009. URL https://archive.ics.uci.edu/ml/datasets/Wine+Quality .

# Comparison of PCA, NNMF, and ICA over image de-noising

## Introduction

In a previous blog post, [1], I compared Principal Component Analysis (PCA) / Singular Value Decomposition (SVD) and Non-Negative Matrix Factorization (NNMF) over a collection of noised images of digit handwriting from the MNIST data set, [3], which is available in Mathematica.

This blog post adds to that comparison the use of Independent Component Analysis (ICA) proclaimed in my previous blog post, [1].

## Computations

The ICA related additional computations to those in [1] follow.

First we load the package IndependentComponentAnalysis.m :

Import["https://raw.githubusercontent.com/antononcube/\
MathematicaForPrediction/master/IndependentComponentAnalysis.m"]

From that package we can use the function IndependentComponentAnalysis to find components:

{S, A} = IndependentComponentAnalysis[Transpose[noisyTestImagesMat], 9, PrecisionGoal -> 4.5];
Norm[noisyTestImagesMat - Transpose[S.A]]/Norm[noisyTestImagesMat]
(* 0.592739 *)

Let us visualize the norms of the mixing matrix A :

norms = Norm /@ A;
ListPlot[norms, PlotRange -> All, PlotLabel -> "Norms of A rows",
PlotTheme -> "Detailed"] //
ColorPlotOutliers[TopOutliers@*HampelIdentifierParameters]
pos = OutlierPosition[norms, TopOutliers@*HampelIdentifierParameters]

Next we can visualize the found "source" images:

ncol = 2;
Grid[Partition[Join[
MapIndexed[{#2[[1]], Norm[#],
Transpose[S]] /. (# -> Style[#, Red] & /@ pos),
Table["", {ncol - QuotientRemainder[Dimensions[S][[2]], ncol][[2]]}]
], ncol], Dividers -> All]

After selecting several of source images we zero the rest by modifying the matrix A:

pos = {6, 7, 9};
norms = Norm /@ A;
dA = DiagonalMatrix[
ReplacePart[ConstantArray[0, Length[norms]], Map[List, pos] -> 1]];
newMatICA =
Transpose[Map[# + Mean[Transpose[noisyTestImagesMat]] &, S.dA.A]];
denoisedImagesICA = Map[Image[Partition[#, dims[[2]]]] &, newMatICA];

## Visual comparison of de-noised images

Next we visualize the ICA de-noised images together with the original images and the SVD and NNMF de-noised ones.

There are several ways to present that combination of images.

## Comparison using classifiers

We can further compare the de-noising results by building digit classifiers and running them over the de-noised images.

icaCM = ClassifierMeasurements[digitCF,
denoisedImagesICA) -> testImageLabels]]

We can see that with ICA we get better results than with PCA/SVD, probably not as good as NNMF, but very close.

## References

[1] A. Antonov, "Comparison of PCA and NNMF over image de-noising", (2016), MathematicaForPrediction at WordPress.

[2] A. Antonov, "Independent Component Analysis for multidimensional signals", (2016), MathematicaForPrediction at WordPress.

[3] Wikipedia entry, MNIST database.

# Independent component analysis for multidimensional signals

## Introduction

Independent Component Analysis (ICA) is a (matrix factorization) method for separation of a multi-dimensional signal (represented with a matrix) into a weighted sum of sub-components that have less entropy than the original variables of the signal. See [1,2] for introduction to ICA and more details.

This blog post is to proclaim the implementation of the "FastICA" algorithm in the package IndependentComponentAnalysis.m and show a basic application with it. (I programmed that package last weekend. It has been in my ToDo list to start ICA algorithms implementations for several months… An interesting offshoot was the procedure I derived for the StackExchange question "Extracting signal from Gaussian noise".)

In this blog post ICA is going to be demonstrated with both generated data and "real life" weather data (temperatures of three cities within one month).

## Generated data

In order to demonstrate ICA let us make up some data in the spirit of the "cocktail party problem".

(*Signal functions*)
Clear[s1, s2, s3]
s1[t_] := Sin[600 \[Pi] t/10000 + 6*Cos[120 \[Pi] t/10000]] + 1.2
s2[t_] := Sin[\[Pi] t/10] + 1.2
s3[t_?NumericQ] := (((QuotientRemainder[t, 23][[2]] - 11)/9)^5 + 2.8)/2 + 0.2

(*Mixing matrix*)
A = {{0.44, 0.2, 0.31}, {0.45, 0.8, 0.23}, {0.12, 0.32, 0.71}};

(*Signals matrix*)
nSize = 600;
S = Table[{s1[t], s2[t], s3[t]}, {t, 0, nSize, 0.5}];

(*Mixed signals matrix*)
M = A.Transpose[S];

(*Signals*)
Grid[{Map[
Plot[#, {t, 0, nSize}, PerformanceGoal -> "Quality",
ImageSize -> 250] &, {s1[t], s2[t], s3[t]}]}]

(*Mixed signals*)
Grid[{Map[ListLinePlot[#, ImageSize -> 250] &, M]}]

I took the data generation formulas from [6].

## ICA application

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/IndependentComponentAnalysis.m"]

It is important to note that the usual ICA model interpretation for the factorized matrix X is that each column is a variable (audio signal) and each row is an observation (recordings of the microphones at a given time). The matrix 3×1201 M was constructed with the interpretation that each row is a signal, hence we have to transpose M in order to apply the ICA algorithms, X=M^T.

X = Transpose[M];

{S, A} = IndependentComponentAnalysis[X, 3];

Check the approximation of the obtained factorization:

Norm[X - S.A]
(* 3.10715*10^-14 *)

Plot the found source signals:

Grid[{Map[ListLinePlot[#, PlotRange -> All, ImageSize -> 250] &,
Transpose[S]]}]

Because of the random initialization of the inverting matrix in the algorithm the result my vary. Here is the plot from another run:

The package also provides the function FastICA that returns an association with elements that correspond to the result of the function fastICA provided by the R package "fastICA". See [4].

Here is an example usage:

res = FastICA[X, 3];

Keys[res]
(* {"X", "K", "W", "A", "S"} *)

Grid[{Map[
ListLinePlot[#, PlotRange -> All, ImageSize -> Medium] &,
Transpose[res["S"]]]}]

Note that (in adherence to [4]) the function FastICA returns the matrices S and A for the centralized matrix X. This means, for example, that in order to check the approximation proper mean has to be supplied:

Norm[X - Map[# + Mean[X] &, res["S"].res["A"]]]
(* 2.56719*10^-14 *)

## Signatures and results

The result of the function IndependentComponentAnalysis is a list of two matrices. The result of FastICA is an association of the matrices obtained by ICA. The function IndependentComponentAnalysis takes a method option and options for precision goal and maximum number of steps:

In[657]:= Options[IndependentComponentAnalysis]

Out[657]= {Method -> "FastICA", MaxSteps -> 200, PrecisionGoal -> 6}

The intent is IndependentComponentAnalysis to be the front interface to different ICA algorithms. (Hence, it has a Method option.) The function FastICA takes as options the named arguments of the R function fastICA described in [4].

In[658]:= Options[FastICA]

Out[658]= {"NonGaussianityFunction" -> Automatic,
"NegEntropyFactor" -> 1, "InitialUnmixingMartix" -> Automatic,
"RowNorm" -> False, MaxSteps -> 200, PrecisionGoal -> 6,
"RFastICAResult" -> True}

At this point FastICA has only the deflation algorithm described in [1]. ([4] has also the so called "symmetric" ICA sub-algorithm.) The R function fastICA in [4] can use only two neg-Entropy functions log(cosh(x)) and exp(-u^2/2). Because of the symbolic capabilities of Mathematica FastICA of [3] can take any listable function through the option "NonGaussianityFunction", and it will find and use the corresponding first and second derivatives.

## Using NNMF for ICA

It seems that in some cases, like the generated data used in this blog post, Non-Negative Matrix Factorization (NNMF) can be applied for doing ICA.

To be clear, NNMF does dimension reduction, but its norm minimization process does not enforce variable independence. (It enforces non-negativity.) There are at least several articles discussing modification of NNMF to do ICA. For example [6].

Load NNMF package [5] (from MathematicaForPrediction at GitHub):

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/NonNegativeMatrixFactorization.m"]

After several applications of NNMF we get signals close to the originals:

{W, H} = GDCLS[M, 3];
Grid[{Map[ListLinePlot[#, ImageSize -> 250] &, Normal[H]]}]

For the generated data in this blog post, FactICA is much faster than NNMF and produces better separation of the signals with every run. The data though is a typical representative for the problems ICA is made for. Another comparison with image de-noising, extending my previous blog post, will be published shortly.

## ICA for mixed time series of city temperatures

Using Mathematica‘s function WeatherData we can get temperature time series for a small set of cities over a certain time grid. We can mix those time series into a multi-dimensional signal, MS, apply ICA to MS, and judge the extracted source signals with the original ones.

This is done with the following commands.

### Get time series data

cities = {"Sofia", "London", "Copenhagen"};
timeInterval = {{2016, 1, 1}, {2016, 1, 31}};
ts = WeatherData[#, "Temperature", timeInterval] & /@ cities;

opts = {PlotTheme -> "Detailed", FrameLabel -> {None, "temperature,\[Degree]C"}, ImageSize -> 350};
DateListPlot[ts,
PlotLabel -> "City temperatures\nfrom " <> DateString[timeInterval[[1]], {"Year", ".", "Month", ".", "Day"}] <>
" to " <> DateString[timeInterval[[2]], {"Year", ".", "Month", ".", "Day"}],
PlotLegends -> cities, ImageSize -> Large, opts]

### Cleaning and resampling (if any)

Here we check the data for missing data:

Length /@ Through[ts["Path"]]
Count[#, _Missing, \[Infinity]] & /@ Through[ts["Path"]]
Total[%]
(* {1483, 1465, 742} *)
(* {0,0,0} *)
(* 0 *)

Resampling per hour:

ts = TimeSeriesResample[#, "Hour", ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}] & /@ ts

### Mixing the time series

In order to do a good mixing we select a mixing matrix for which all column sums are close to one:

mixingMat = #/Total[#] & /@ RandomReal[1, {3, 3}];
MatrixForm[mixingMat]
(* mixingMat = {{0.357412, 0.403913, 0.238675}, {0.361481, 0.223506, 0.415013}, {0.36564, 0.278565, 0.355795}} *)
Total[mixingMat]
(* {1.08453, 0.905984, 1.00948} *)

Note the row normalization.

Make the mixed signals:

tsMixed = Table[TimeSeriesThread[mixingMat[[i]].# &, ts], {i, 3}]

Plot the original and mixed signals:

Grid[{{DateListPlot[ts, PlotLegends -> cities, PlotLabel -> "Original signals", opts],
DateListPlot[tsMixed, PlotLegends -> Automatic, PlotLabel -> "Mixed signals", opts]}}]

### Application of ICA

At this point we apply ICA (probably more than once, but not too many times) and plot the found source signals:

X = Transpose[Through[tsMixed["Path"]][[All, All, 2]] /. Quantity[v_, _] :> v];
{S, A} = IndependentComponentAnalysis[X, 3];
DateListPlot[Transpose[{tsMixed[[1]]["Dates"], #}], PlotTheme -> "Detailed", ImageSize -> 250] & /@ Transpose[S]

Compare with the original time series:

MapThread[DateListPlot[#1, PlotTheme -> "Detailed", PlotLabel -> #2, ImageSize -> 250] &, {tsPaths, cities}]

After permuting and inverting some of the found sources signals we see they are fairly good:

pinds = {3, 1, 2};
pmat = IdentityMatrix[3][[All, pinds]];

DateListPlot[Transpose[{tsMixed[[1]]["Dates"], #}], PlotTheme -> "Detailed", ImageSize -> 250] & /@
Transpose[S.DiagonalMatrix[{1, -1, 1}].pmat]

## References

[1] A. Hyvarinen and E. Oja (2000) Independent Component Analysis: Algorithms and Applications, Neural Networks, 13(4-5):411-430 . URL: https://www.cs.helsinki.fi/u/ahyvarin/papers/NN00new.pdf .

[2] Wikipedia entry, Independent component analysis, URL: https://en.wikipedia.org/wiki/Independent_component_analysis .

[3] A. Antonov, Independent Component Analysis Mathematica package, (2016), source code MathematicaForPrediction at GitHub, package IndependentComponentAnalysis.m .

[4] J. L. Marchini, C. Heaton and B. D. Ripley, fastICA, R package, URLs: https://cran.r-project.org/web/packages/fastICA/index.html, https://cran.r-project.org/web/packages/fastICA/fastICA.pdf .

[5] A. Antonov, Implementation of the Non-Negative Matrix Factorization algorithm in Mathematica, (2013), source code at MathematicaForPrediction at GitHub, package NonNegativeMatrixFactorization.m.

[6] H. Hsieh and J. Chien, A new nonnegative matrix factorization for independent component analysis, (2010), Conference: Acoustics Speech and Signal Processing (ICASSP).

# Comparison of PCA and NNMF over image de-noising

This post compares two dimension reduction techniques Principal Component Analysis (PCA) / Singular Value Decomposition (SVD) and Non-Negative Matrix Factorization (NNMF) over a set of images with two different classes of signals (two digits below) produced by different generators and overlaid with different types of noise.

PCA/SVD produces somewhat good results, but NNMF often provides great results. The factors of NNMF allow interpretation of the basis vectors of the dimension reduction, PCA in general does not. This can be seen in the example below.

## Data

First let us get some images. I am going to use the MNIST dataset for clarity. (And because I experimented with similar data some time ago — see Classification of handwritten digits.).

MNISTdigits = ExampleData[{"MachineLearning", "MNIST"}, "TestData"];
{testImages, testImageLabels} =
Transpose[List @@@ RandomSample[Cases[MNISTdigits, HoldPattern[(im_ -> 0 | 4)]], 100]];
testImages

See the breakdown of signal classes:

Tally[testImageLabels]
(* {{4, 48}, {0, 52}} *)

Verify the images have the same sizes:

Tally[ImageDimensions /@ testImages]
dims = %[[1, 1]]

Add different kinds of noise to the images:

noisyTestImages6 =
Table[ImageEffect[
testImages[[i]],
{RandomChoice[{"GaussianNoise", "PoissonNoise", "SaltPepperNoise"}], RandomReal[1]}], {i, Length[testImages]}];

Since the important values of the signals are 0 or close to 0 we negate the noisy images:

## Linear vector space representation

We unfold the images into vectors and stack them into a matrix:

noisyTestImagesMat = (Flatten@*ImageData) /@ negNoisyTestImages6;
Dimensions[noisyTestImagesMat]

(* {100, 784} *)

Here is centralized version of the matrix to be used with PCA/SVD:

cNoisyTestImagesMat =
Map[# - Mean[noisyTestImagesMat] &, noisyTestImagesMat];

(With NNMF we want to use the non-centralized one.)

Here confirm the values in those matrices:

Grid[{Histogram[Flatten[#], 40, PlotRange -> All,
ImageSize -> Medium] & /@ {noisyTestImagesMat,
cNoisyTestImagesMat}}]

## SVD dimension reduction

For more details see the previous answers.

{U, S, V} = SingularValueDecomposition[cNoisyTestImagesMat, 100];
ListPlot[Diagonal[S], PlotRange -> All, PlotTheme -> "Detailed"]
dS = S;
Do[dS[[i, i]] = 0, {i, Range[10, Length[S], 1]}]
newMat = U.dS.Transpose[V];
denoisedImages =
Map[Image[Partition[# + Mean[noisyTestImagesMat], dims[[2]]]] &, newMat];

Here are how the new basis vectors look like:

Take[#, 50] &@
Dimensions[V][[2]]], Diagonal[S], Transpose[V]}]

Basically, we cannot tell much from these SVD basis vectors images.

Here we load the packages for what is computed next:

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/NonNegativeMatrixFactorization.m"]

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/OutlierIdentifiers.m"]

The blog post “Statistical thesaurus from NPR podcasts” discusses an example application of NNMF and has links to documents explaining the theory behind the NNMF utilization.

The outlier detection package is described in a previous blog post “Outlier detection in a list of numbers”.

## NNMF

This command factorizes the image matrix into the product W H :

{W, H} = GDCLS[noisyTestImagesMat, 20, "MaxSteps" -> 200];
{W, H} = LeftNormalizeMatrixProduct[W, H];

Dimensions[W]
Dimensions[H]

(* {100, 20} *)
(* {20, 784} *)

The rows of H are interpreted as new basis vectors and the rows of W are the coordinates of the images in that new basis. Some appropriate normalization was also done for that interpretation. Note that we are using the non-normalized image matrix.

Let us see the norms of $H$ and mark the top outliers:

norms = Norm /@ H;
ListPlot[norms, PlotRange -> All, PlotLabel -> "Norms of H rows",
PlotTheme -> "Detailed"] //
ColorPlotOutliers[TopOutliers@*HampelIdentifierParameters]
OutlierPosition[norms, TopOutliers@*HampelIdentifierParameters]
OutlierPosition[norms, TopOutliers@*SPLUSQuartileIdentifierParameters]

Here is the interpretation of the new basis vectors (the outliers are marked in red):

MapIndexed[{#2[[1]], Norm[#], Image[Partition[#, dims[[1]]]]} &, H] /. (# -> Style[#, Red] & /@
OutlierPosition[norms, TopOutliers@*HampelIdentifierParameters])

Using only the outliers of $H$ let us reconstruct the image matrix and the de-noised images:

pos = {1, 6, 10}
dHN = Total[norms]/Total[norms[[pos]]]*
DiagonalMatrix[
ReplacePart[ConstantArray[0, Length[norms]],
Map[List, pos] -> 1]];
newMatNNMF = W.dHN.H;
denoisedImagesNNMF =
Map[Image[Partition[#, dims[[2]]]] &, newMatNNMF];

Often we cannot just rely on outlier detection and have to hand pick the basis for reconstruction. (Especially when we have more than one classes of signals.)

## Comparison

At this point we can plot all images together for comparison:

imgRows =
Transpose[{testImages, noisyTestImages6,
With[{ncol = 5},
Grid[Prepend[Partition[imgRows, ncol],
Style[#, Blue, FontFamily -> "Times"] & /@
Table[{"original", "noised", "SVD", "NNMF"}, ncol]]]]

We can see that NNMF produces cleaner images. This can be also observed/confirmed using threshold binarization — the NNMF images are much cleaner.

imgRows =
With[{th = 0.5},
Binarize[#4, th]} &, {testImageLabels, noisyTestImages6,
With[{ncol = 5},
Grid[Prepend[Partition[imgRows, ncol],
Style[#, Blue, FontFamily -> "Times"] & /@
Table[{"label", "noised", "SVD", "NNMF"}, ncol]]]]

Usually with NNMF in order to get good results we have to do more that one iteration of the whole factorization and reconstruction process. And of course NNMF is much slower. Nevertheless, we can see clear advantages of NNMF’s interpretability and leveraging it.

## Gallery with other experiments

In those experiments I had to hand pick the NNMF basis used for the reconstruction. Using outlier detection without supervision would not produce good results most of the time.

## Further comparison with Classify

We can further compare the de-noising results by building signal (digit) classifiers and running them over the de-noised images.

For such a classifier we have to decide:

1. do we train only with images of the two signal classes or over a larger set of signals;
2. how many signals we train with;
3. with what method the classifiers are built.

Below I use the default method of Classify with all digit images in MNIST that are not in the noised images set. The classifier is run with the de-noising traced these 0-4 images:

Get the images:

{trainImages, trainImageLabels} = Transpose[List @@@ Cases[MNISTdigits, HoldPattern[(im_ -> _)]]];
pred = Map[! MemberQ[testImages, #] &, trainImages];
trainImages = Pick[trainImages, pred];
trainImageLabels = Pick[trainImageLabels, pred];
Tally[trainImageLabels]

(* {{0, 934}, {1, 1135}, {2, 1032}, {3, 1010}, {4, 928}, {5, 892}, {6, 958}, {7, 1028}, {8, 974}, {9, 1009}} *)

Build the classifier:

digitCF = Classify[trainImages -> trainImageLabels]

Compute classifier measurements for the original, the PCA de-noised, and NNMF de-noised images:

origCM = ClassifierMeasurements[digitCF, Thread[(testImages) -> testImageLabels]]

pcaCM = ClassifierMeasurements[digitCF,

nnmfCM = ClassifierMeasurements[digitCF,

Tabulate the classification measurements results:

Grid[{{"Original", "PCA", "NNMF"},
{Row[{"Accuracy:", origCM["Accuracy"]}], Row[{"Accuracy:", pcaCM["Accuracy"]}], Row[{"Accuracy:", nnmfCM["Accuracy"]}]},
{origCM["ConfusionMatrixPlot"], pcaCM["ConfusionMatrixPlot"], nnmfCM["ConfusionMatrixPlot"]}},
Dividers -> {None, {True, True, False}}]

# Finding outliers in 2D and 3D numerical data

## Introduction

This blog post describes a method of finding outliers in 2D and 3D data using Quantile Regression Envelopes discussed in previous blog posts: “Directional quantile envelopes”, “Directional quantile envelopes in 3D”.

## Data

In order to provide a good example of the method application it would be better to use “real life” data. I found this one: “UCI Online Retail Data Set”. That dataset has columns for online purchase transactions (quantity and price).

(*data=Import["http://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx","XLSX"];*)

so I downloaded the XLSX file and saved it into a CSV file, then imported it:

data = Import["~/Datasets/UCI Online Retail Data Set/Online Retail.csv",
"IgnoreEmptyLines" -> True, "HeaderLines" -> 0];
columnNames = data[[1]];
data = Rest[data];

Here are the dimensions of the dataset (seems “realistically” large):

Dimensions[data]
(* {65499, 8} *)

Here is a summary of the quantative columns:

Grid[{RecordsSummary[N@data[[All, {4, 6}]], columnNames[[{4, 6}]]]}, Dividers -> All]

Since we have only two quantative columns in the data let us add a third one, and make it have 3 outliers.

dvec = RandomReal[SkewNormalDistribution[1, 2, 4], Dimensions[data][[1]]];
Block[{inds = RandomSample[Range[Length[dvec]], 3]},
dvec[[inds]] = 10*dvec[[inds]]];
testData = MapThread[Append, {N@data[[All, {4, 6}]], dvec}];

Here is the summary:

Grid[{RecordsSummary[testData]}, Dividers -> All]

### Standardizing

It is a good idea to standardize the data. This is not necessary for the outlier finding procedure used below, but it makes the data more convenient for visualization or other exploration.

sTestData = Transpose[Standardize /@ Transpose[N@testData]];

Because of the outliers plotting the data might produce uninformative plots. We can use logarithms of the point coordinates and for that we have to shift the standardized data to be positive. This is done with this command:

Block[{offset = -2 (Min /@ Transpose[sTestData])},
sTestData = Map[# + offset &, sTestData]];

Let use show the standardized data summary and visualize:

Grid[{RecordsSummary[sTestData]}, Dividers -> All]

opts = {PlotRange -> All, ImageSize -> Medium,
PlotTheme -> "Detailed"}; Grid[{{ListPointPlot3D[sTestData, opts],
ListPointPlot3D[Log10@sTestData, opts]}}]

## Using Quantile Regression envelopes to find outliers

We are going to find the outliers by computing envelopes around the dataset points that contain almost all points (e.g. 99.7% of them).

The finding of directional quantile envelopes in 2D and 3D is explained in these blog posts:

I am a big fan of Quantile Regression (QR) and I have implemented a collection of functions and applications of QR. See these blog posts.

This command imports the package QuantileRegression.m :

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/QuantileRegression.m"]

Here is an example of an envelope found using directional quantiles over a small sample of the points:

Block[{testData = RandomSample[sTestData, 2000], qreg},
qreg = QuantileEnvelopeRegion[testData, 0.997, 10];
Show[{Graphics3D[{Red, Point[testData]}, Axes -> True],
BoundaryDiscretizeRegion[qreg]}]
]

The outlier points (in red) are outside of the envelope.

The example (and plot) above are just for illustration purposes. We calculate a quantile evelope region using all points.

Block[{testData = sTestData},
qreg = QuantileEnvelopeRegion[testData, 0.9997, 10];
]

(The function QuantileEnvelopeRegion works also for 2D. For 2D nothing else changes in the steps below except using 2D graphics.)

This command makes a function to test does a point belong to the found envelope region or not:

rmFunc = RegionMember[qreg];

This calculates the membership predicates for all points:

AbsoluteTiming[
pred = rmFunc /@ sTestData;
]

(* {15.6485, Null} *)

And we can see the membership breakdown:

Tally[pred]
(* {{True, 65402}, {False, 97}} *)

and visualize it (using Pick and taking logarithms):

Graphics3D[{Gray, Point[Log10@Pick[sTestData, pred]], Red,
Point[Log10@Pick[sTestData, Not /@ pred]]}, Axes -> True]

The plot above contains both top and bottom outliers (in red). If we are intereseted only in the top outliers we can find these thresholds:

topThresholds = Quantile[#, 0.95] & /@ Transpose[testData]
(* {25, 10.95, 4.89433} *)

and use them to select the top outliers:

Select[Pick[testData, Not /@ pred],
Total[Thread[# > topThresholds] /. {True -> 1, False -> 0}] > 1 &]

(* {{1, 836.14, 6.48564}, {1, 16.13, 8.71455}, {5, 25.49, 8.47351}, {-1, 1126, 5.25211}, {1000, 0, 5.4212}, {1, 15.79, 9.14674}, {-1, 544.4, 6.86266}} *)

Note that the quantile regression envelope and the membership predicates were computed over the standardized data, and the predicates were used to retrieve the outliers of the original data.