In this presentation we discuss the application of different dimension reduction algorithms over collections of random mandalas. We discuss and compare the derived image bases and show how those bases explain the underlying collection structure. The presented techniques and insights (1) are applicable to any collection of images, and (2) can be included in larger, more complicated machine learning workflows. The former is demonstrated with a handwritten digits recognition
application; the latter with the generation of random Bethlehem stars. The (parallel) walk-through of the core demonstration is in all three programming languages: Mathematica, Python, and R.

In this document we describe the design and demonstrate the implementation of a (software programming) monad, [Wk1], for Epidemiology Compartmental Modeling (ECM) workflows specification and execution. The design and implementation are done with Mathematica / Wolfram Language (WL). A very similar implementation is also done in R.

Monad’s name is “ECMMon”, which stands for “Epidemiology Compartmental Modeling Monad”, and its monadic implementation is based on the State monad package “StateMonadCodeGenerator.m”, [AAp1, AA1], ECMMon is implemented in the package [AAp8], which relies on the packages [AAp3-AAp6]. The original ECM workflow discussed in [AA5] was implemented in [AAp7]. An R implementation of ECMMon is provided by the package [AAr2].

The goal of the monad design is to make the specification of ECM workflows (relatively) easy and straightforward by following a certain main scenario and specifying variations over that scenario.

We use real-life COVID-19 data, The New York Times COVID-19 data, see [NYT1, AA5].

The monadic programming design is used as a Software Design Pattern. The ECMMon monad can be also seen as a Domain Specific Language (DSL) for the specification and programming of epidemiological compartmental modeling workflows.

Here is an example of using the ECMMon monad over a compartmental model with two types of infected populations:

The table above is produced with the package “MonadicTracing.m”, [AAp2, AA1], and some of the explanations below also utilize that package.

As it was mentioned above the monad ECMMon can be seen as a DSL. Because of this the monad pipelines made with ECMMon are sometimes called “specifications”.

Contents description

The document has the following structure.

The sections “Package load” and “Data load” obtain the needed code and data.

Remark: One can read only the sections “Introduction”, “Design consideration”, “Single-site models”, and “Batch simulation and calibration process”. That set of sections provide a fairly good, programming language agnostic exposition of the substance and novel ideas of this document.

Package load

In this section we load packages used in this notebook:

(Note that in the plots above we have to reverse the keys of the given population associations.)

Using the function HextileHistogram from [AAp7 ] here we visualize USA county populations over a hexagonal grid with cell radius 2 degrees ((\approx 140) miles (\approx 222) kilometers):

In this notebook we prefer using HextileHistogram because it represents the simulation data in geometrically more faithful way.

Design considerations

The big picture

The main purpose of the designed epidemic compartmental modeling framework (i.e. software monad) is to have the ability to do multiple, systematic simulations for different scenario play-outs over large scale geographical regions. The target end-users are decision makers at government level and researchers of pandemic or other large scale epidemic effects.

Here is a diagram that shows the envisioned big picture workflow:

Large-scale modeling

The standard classical compartmental epidemiology models are not adequate over large geographical areas, like, countries. We design a software framework – the monad ECMMon – that allows large scale simulations using a simple principle workflow:

Develop a single-site model for relatively densely populated geographical area for which the assumptions of the classical models (approximately) hold.

Extend the single-site model into a large-scale multi-site model using statistically derived traveling patterns; see [AA4].

Supply the multi-site model with appropriately prepared data.

Run multiple simulations to see large scale implications of different policies.

Calibrate the model to concrete observed (or faked) data. Go to 4.

Flow chart

The following flow chart visualizes the possible workflows the software monad ECMMon:

Two models in the monad

An ECMMon object can have one or two models. One of the models is a “seed”, single-site model from [AAp1], which, if desired, is scaled into a multi-site model, [AA3, AAp2].

Workflows with only the single-site model are supported.

Say, workflows for doing sensitivity analysis, [AA6, BC1].

Scaling of a single-site model into multi-site is supported and facilitated.

Workflows for the multi-site model include preliminary model scaling steps and simulation steps.

After the single-site model is scaled the monad functions use the multi-site model.

The workflows should be easy to specify and read.

Single-site model workflow

Make a single-site model.

Assign stocks initial conditions.

Assign rates values.

Simulate.

Plot results.

Go to 2.

Multi-site model workflow

Make a single-site model.

Assign initial conditions and rates.

Scale the single-site model into a multi-site model.

The single-site assigned rates become “global” when the single-site model is scaled.

The scaling is based on assumptions for traveling patterns of the populations.

There are few alternatives for that scaling:

Using locations geo-coordinates

Using regular grids covering a certain area based on in-habited locations geo-coordinates

Using traveling patterns contingency matrices

Using “artificial” patterns of certain regular types for qualitative analysis purposes

Enhance the multi-site traveling patterns matrix and re-scale the single site model.

We might want to combine traveling patterns by ground transportation with traveling patterns by airplanes.

For quarantine scenarios this might a less important capability of the monad.

Hence, this an optional step.

Assign stocks initial conditions for each of the sites in multi-scale model.

Assign rates for each of the sites.

Simulate.

Plot global simulation results.

Plot simulation results for focus sites.

Single-site models

We have a collection of single-site models that have different properties and different modeling goals, [AAp3, AA7, AA8]. Here is as diagram of a single-site model that includes hospital beds and medical supplies as limitation resources, [AA7]:

SEI2HR model

In this sub-section we briefly describe the model SEI2HR, which is used in the examples below.

Detailed description of the SEI2HR model is given in [AA7].

Verbal description

We start with one infected (normally symptomatic) person, the rest of the people are susceptible. The infected people meet other people directly or get in contact with them indirectly. (Say, susceptible people touch things touched by infected.) For each susceptible person there is a probability to get the decease. The decease has an incubation period: before becoming infected the susceptible are (merely) exposed. The infected recover after a certain average infection period or die. A certain fraction of the infected become severely symptomatic. If there are enough hospital beds the severely symptomatic infected are hospitalized. The hospitalized severely infected have different death rate than the non-hospitalized ones. The number of hospital beds might change: hospitals are extended, new hospitals are build, or there are not enough medical personnel or supplies. The deaths from infection are tracked (accumulated.) Money for hospital services and money from lost productivity are tracked (accumulated.)

The equations below give mathematical interpretation of the model description above.

Equations

Here are the equations of one the epidemiology compartmental models, SEI2HR, [AA7], implemented in [AAp3]:

The equations for Susceptible, Exposed, Infected, Recovered populations of SEI2R are “standard” and explanations about them are found in [WK2, HH1]. For SEI2HR those equations change because of the stocks Hospitalized Population and Hospital Beds.

The equations time unit is one day. The time horizon is one year. In this document we consider COVID-19, [Wk2, AA1], hence we do not consider births.

Single-site model workflow demo

In this section we demonstrate some of the sensitivity analysis discussed in [AA6, BC1].

Theoretical and computational details about the multi-site workflow can be found in [AA4, AA5].

Batch simulations and calibration processes

In this section we describe the in general terms the processes of model batch simulations and model calibration. The next two sections give more details of the corresponding software design and workflows.

Definitions

Batch simulation: If given a SD model (M), the set (P) of parameters of (M), and a set (B) of sets of values (P), (B\text{:=}\left{V_i\right}), then the set of multiple runs of (M) over (B) are called batch simulation.

Calibration: If given a model (M), the set (P) of parameters of (M), and a set of (k) time series (T\text{:=}\left{T_i\right}_{i=1}^k) that correspond to the set of stocks (S\text{:=}\left{S_i\right}_{i=1}^k) of (M) then the process of finding concrete the values (V) for (P) that make the stocks (S) to closely resemble the time series (T) according to some metric is called calibration of (M) over the targets(T).

Roles

There are three types of people dealing with the models:

Modeler, who develops and implements the model and prepares it for calibration.

Calibrator, who calibrates the model with different data for different parameters.

Stakeholder, who requires different features of the model and outcomes from different scenario play-outs.

There are two main calibration scenarios:

Modeler and Calibrator are the same person

Modeler and Calibrator are different persons

Process

Model development and calibration is most likely going to be an iterative process.

For concreteness let us assume that the model has matured development-wise and batch simulation and model calibration is done in a (more) formal way.

Here are the steps of a well defined process between the modeling activity players described above:

Stakeholder requires certain scenarios to be investigated.

Modeler prepares the model for those scenarios.

Stakeholder and Modeler formulate a calibration request.

Calibrator uses the specifications from the calibration request to:

Calibrate the model

Derive model outcomes results

Provide model qualitative results

Provide model sensitivity analysis results

Modeler (and maybe Stakeholder) review the results and decides should more calibration be done.

I.e. go to 3.

Modeler does batch simulations with the calibrated model for the investigation scenarios.

Modeler and Stakeholder prepare report with the results.

See the documents [AA9, AA10] have questionnaires that further clarify the details of interaction between the modelers and calibrators.

Batch simulation vs calibration

In order to clarify the similarities and differences between batch simulation and calibration we list the following observations:

Each batch simulation or model calibration is done either for model development purposes or for scenario play-out studies.

Batch simulation is used for qualitative studies of the model. For example, doing sensitivity analysis; see [BC1, AA7, AA8].

Before starting the calibration we might want to study the “landscape” of the search space of the calibration parameters using batch simulations.

Batch simulation is also done after model calibration in order to evaluate different scenarios,

For some models with large computational complexity batch simulation – together with some evaluation metric – can be used instead of model calibration.

Batch simulations workflow

In this section we describe the specification and execution of model batch simulations.

Batch simulations can be time consuming, hence it is good idea to

In the rest of the section we go through the following steps:

Make a model object

Batch simulate over a few combinations of parameters and show:

Plots of the simulation results for all populations

Plots of the simulation results for a particular population

Batch simulate over the Cartesian (outer) product of values lists of a selected pair of parameters and show the corresponding plots of all simulations

Instead of specifying an the combinations of parameters directly we can specify the values taken by each parameter using an association in which the keys are parameters and the values are list of values:

In this section we go through the computation steps of the calibration of single-site SEI2HR model.

Remark: We use real data in this section, but the presented calibration results and outcome plots are for illustration purposes only. A rigorous study with discussion of the related assumptions and conclusions is beyond the scope of this notebook/document.

Calibration steps

Here are the steps performed in the rest of the sub-sections of this section:

Ingest data for infected cases, deaths due to disease, etc.

Choose a model to calibrate.

Make the calibration targets – those a vectors corresponding to time series over regular grids.

Consider using all of the data in order to evaluate model’s applicability.

Consider using fractions of the data in order to evaluate model’s ability to predict the future or reconstruct data gaps.

Choose calibration parameters and corresponding ranges for their values.

If more than one target choose the relative weight (or importance) of the targets.

Calibrate the model.

Evaluate the fitting between the simulation results and data.

Using statistics and plots.

Make conclusions. If insufficiently good results are obtained go to 2 or 4.

Remark: When doing calibration epidemiological models a team of people it is better certain to follow (rigorously) well defined procedures. See the documents:

Remark: We plan to prepare have several notebooks dedicated to calibration of both single-site and multi-site models.

USA COVID-19 data

Here data for the USA COVID-19 infection cases and deaths from [NYT1] (see [AA6] data ingestion details):

Using different minimization methods and distance functions

In the monad the calibration of the models is done with NMinimize. Hence, the monad function ECMMonCalibrate takes all options of NMinimize and can do calibrations with the same data and parameter search space using different global minima finding methods and distance functions.

Remark:EuclideanDistance is an obvious distance function, but use others like infinity norm and sum norm. Also, we can use a distance function that takes parts of the data. (E.g. between days 50 and 150 because the rest of the data is, say, unreliable.)

We see the that with the calibration found parameter values the model can fit the data for the first 200 days, after that it overestimates the evolution of the infected and deceased popupulations.

We can conjecture that:

The model is too simple, hence inadequate

That more complicated quarantine policy functions have to be used

That the calibration process got stuck in some local minima

Future plans

In this section we outline some of the directions in which the presented work on ECMMon can be extended.

More unit tests and random unit tests

We consider the preparation and systematic utilization of unit tests to be a very important component of any software development. Unit tests are especially important when complicated software package like ECMMon are developed.

For the presented software monad (and its separately developed, underlying packages) have implemented a few collections of tests, see [AAp10, AAp11].

We plan to extend and add more complicated unit tests that test for both quantitative and qualitative behavior. Here are some examples for such tests:

Stock-vs-stock orbits produced by simulations of certain epidemic models

Expected theoretical relationships between populations (or other stocks) for certain initial conditions and rates

Wave-like propagation of the proportions of the infected populations in multi-site models over artificial countries and traveling patterns

Finding of correct parameter values with model calibration over different data (both artificial and real life)

Expected number of equations for different model set-ups

Expected (relative) speed of simulations with respect to model sizes

Further for the monad ECMMon we plant to develop random pipeline unit tests as the ones in [AAp12] for the classification monad ClCon, [AA11].

More comprehensive calibration guides and documentation

We plan to produce more comprehensive guides for doing calibration with ECMMon and in general with Mathematica’s NDSolve and NMinimize functions.

Full correspondence between the Mathematica and R implementations

The ingredients of the software monad ECMMon and ECMMon itself were designed and implemented in Mathematica first. The corresponding design and implementation was done in R, [AAr2]. To distinguish the two implementations we call the R one ECMMon-R and Mathematica (Wolfram Language) one ECMMon-WL.

At this point the calibration is not implemented in ECMMon-R, but we plan to do that soon.

(With Mathematica similar interactive interfaces are presented in [AA7, AA8].)

Model transfer between Mathematica and R

We are very interested in transferring epidemiological models from Mathematica to R (or Python, or Julia.)

This can be done in two principle ways: (i) using Mathematica expressions parsers, or (ii) using matrix representations. We plan to investigate the usage of both approaches.

Conversational agent

Consider the making of a conversational agent for epidemiology modeling workflows building. Initial design and implementation is given in [AA13, AA14].

Consider the following epidemiology modeling workflow specification:

lsCommands = "create with SEI2HR;assign 100000 to total population;set infected normally symptomatic population to be 0;set infected severely symptomatic population to be 1;assign 0.56 to contact rate of infected normally symptomatic population;assign 0.58 to contact rate of infected severely symptomatic population;assign 0.1 to contact rate of the hospitalized population;simulate for240 days;plot populations results;calibrate for target DIPt -> tsDeaths, over parameters contactRateISSP in from 0.1 to 0.7;echo pipeline value";

Here is the ECMMon code generated using the workflow specification:

[JS1] John D.Sterman, Business Dynamics: Systems Thinking and Modeling for a Complex World. (2000), New York: McGraw.

[HH1] Herbert W. Hethcote (2000). “The Mathematics of Infectious Diseases”. SIAM Review. 42 (4): 599–653. Bibcode:2000SIAMR..42..599H. doi:10.1137/s0036144500371907.

This document/notebook is inspired by the Mathematica Stack Exchange (MSE) question “Plotting the Star of Bethlehem”, [MSE1]. That MSE question requests efficient and fast plotting of a certain mathematical function that (maybe) looks like the Star of Bethlehem, [Wk1]. Instead of doing what the author of the questions suggests, I decided to use a generative art program and workflows from three of most important Machine Learning (ML) sub-cultures: Latent Semantic Analysis, Recommendations, and Classification.

Although we discuss making of Bethlehem Star-like images, the ML workflows and corresponding code presented in this document/notebook have general applicability – in many situations we have to make classifiers based on data that has to be “feature engineered” through pipeline of several types of ML transformative workflows and that feature engineering requires multiple iterations of re-examinations and tuning in order to achieve the set goals.

The document/notebook is structured as follows:

Target Bethlehem Star images

Simplistic approach

Elaborated approach outline

Sections that follow through elaborated approach outline:

Remark: The plot above looks prettier in notebook converted with the resource function DarkMode.

Elaborated approach

Assume that we want to automate the simplistic approach described in the previous section.

One way to automate is to create a Machine Learning (ML) classifier that is capable of discerning which RandomMandala objects look like Bethlehem Star target images and which do not. With such a classifier we can write a function BethlehemMandala that applies the classifier on multiple results from RandomMandala and returns those mandalas that the classifier says are good.

Here are the steps of building the proposed classifier:

Generate a large enough Random Mandala Images Set (RMIS)

Create a feature extractor from a subset of RMIS

Assign features to all of RMIS

Make a recommender with the RMIS features and other image data (like pixel values)

Apply the RMIS recommender over the target Bethlehem Star images and determine and examine image sets that are:

the best recommendations

the worse recommendations

With the best and worse recommendations sets compose training data for classifier making

Train a classifier

Examine classifier application to (filtering of) random mandala images (both in RMIS and not in RMIS)

If the results are not satisfactory redo some or all of the steps above

Remark: If the results are not satisfactory we should consider using the obtained classifier at the data generation phase. (This is not done in this document/notebook.)

Remark: The elaborated approach outline and flow chart have general applicability, not just for generation of random images of a certain type.

Flow chart

Here is a flow chart that corresponds to the outline above:

A few observations for the flow chart follow:

The flow chart has a feature extraction block that shows that the feature extraction can be done in several ways.

The application of LSA is a type of feature extraction which this document/notebook uses.

If the results are not good enough the flow chart shows that the classifier can be used at the data generation phase.

If the results are not good enough there are several alternatives to redo or tune the ML algorithms.

Changing or tuning the recommender implies training a new classifier.

Changing or tuning the feature extraction implies making a new recommender and a new classifier.

Data generation and preparation

In this section we generate random mandala graphics, transform them into images and corresponding vectors. Those image-vectors can be used to apply dimension reduction algorithms. (Other feature extraction algorithms can be applied over the images.)

Remark: Note the weights assigned to the pixels and the topics in the recommender object above. Those weights were derived by examining the recommendations results shown below.

Here is the image we want to find most similar mandala images to – the target image:

Remark: Note that although a higher rotational symmetry order is used the highly scored results still seem relevant – they have the features of the target Bethlehem Star images.

The lectures on Latent Semantic Analysis (LSA) are to be recorded through Wolfram University (Wolfram U) in December 2019 and January-February 2020.

The lectures (as live-coding sessions)

Overview Latent Semantic Analysis (LSA) typical problems and basic workflows.
Answering preliminary anticipated questions.
Here is the recording of the first session at Twitch .

What are the typical applications of LSA?

Why use LSA?

What it the fundamental philosophical or scientific assumption for LSA?

What is the most important and/or fundamental step of LSA?

What is the difference between LSA and Latent Semantic Indexing (LSI)?

What are the alternatives?

Using Neural Networks instead?

How is LSA used to derive similarities between two given texts?

How is LSA used to evaluate the proximity of phrases? (That have different words, but close semantic meaning.)

In this document we describe the design and implementation of a (software programming) monad, [Wk1], for Latent Semantic Analysis workflows specification and execution. The design and implementation are done with Mathematica / Wolfram Language (WL).

What is Latent Semantic Analysis (LSA)? : A statistical method (or a technique) for finding relationships in natural language texts that is based on the so called Distributional hypothesis, [Wk2, Wk3]. (The Distributional hypothesis can be simply stated as “linguistic items with similar distributions have similar meanings”; for an insightful philosophical and scientific discussion see [MS1].) LSA can be seen as the application of Dimensionality reduction techniques over matrices derived with the Vector space model.

The goal of the monad design is to make the specification of LSA workflows (relatively) easy and straightforward by following a certain main scenario and specifying variations over that scenario.

The data for this document is obtained from WL’s repository and it is manipulated into a certain ready-to-utilize form (and uploaded to GitHub.)

The monadic programming design is used as a Software Design Pattern. The LSAMon monad can be also seen as a Domain Specific Language (DSL) for the specification and programming of machine learning classification workflows.

Here is an example of using the LSAMon monad over a collection of documents that consists of 233 US state of union speeches.

The table above is produced with the package “MonadicTracing.m”, [AAp2, AA1], and some of the explanations below also utilize that package.

As it was mentioned above the monad LSAMon can be seen as a DSL. Because of this the monad pipelines made with LSAMon are sometimes called “specifications”.

Remark: In this document with “term” we mean “a word, a word stem, or other type of token.”

Remark: LSA and Latent Semantic Indexing (LSI) are considered more or less to be synonyms. I think that “latent semantic analysis” sounds more universal and that “latent semantic indexing” as a name refers to a specific Information Retrieval technique. Below we refer to “LSI functions” like “IDF” and “TF-IDF” that are applied within the generic LSA workflow.

Contents description

The document has the following structure.

The sections “Package load” and “Data load” obtain the needed code and data.

The sections “Design consideration” and “Monad design” provide motivation and design decisions rationale.

The sections “LSAMon overview”, “Monad elements”, and “The utilization of SSparseMatrix objects” provide technical descriptions needed to utilize the LSAMon monad .

(Using a fair amount of examples.)

The section “Unit tests” describes the tests used in the development of the LSAMon monad.

(The random pipelines unit tests are especially interesting.)

The section “Future plans” outlines future directions of development.

The section “Implementation notes” just says that LSAMon’s development process and this document follow the ones of the classifications workflows monad ClCon, [AA6].

Remark: One can read only the sections “Introduction”, “Design consideration”, “Monad design”, and “LSAMon overview”. That set of sections provide a fairly good, programming language agnostic exposition of the substance and novel ideas of this document.

Package load

The following commands load the packages [AAp1–AAp7]:

In this section we load data that is used in the rest of the document. The text data was obtained through WL’s repository, transformed in a certain more convenient form, and uploaded to GitHub.

The text summarization and plots are done through LSAMon, which in turn uses the function RecordsSummary from the package “MathematicaForPredictionUtilities.m”, [AAp7].

In some of the examples below we want to explicitly specify the stop words. Here are stop words derived using the built-in functions DictionaryLookup and DeleteStopwords.

Here is a quote from [Wk1] that fairly well describes why we choose to make a classification workflow monad and hints on the desired properties of such a monad.

[…] The monad represents computations with a sequential structure: a monad defines what it means to chain operations together. This enables the programmer to build pipelines that process data in a series of steps (i.e. a series of actions applied to the data), in which each action is decorated with the additional processing rules provided by the monad. […]

Monads allow a programming style where programs are written by putting together highly composable parts, combining in flexible ways the possible actions that can work on a particular type of data. […]

Remark: Note that quote from [Wk1] refers to chained monadic operations as “pipelines”. We use the terms “monad pipeline” and “pipeline” below.

Monad design

The monad we consider is designed to speed-up the programming of LSA workflows outlined in the previous section. The monad is named LSAMon for “Latent Semantic Analysis** Mon**ad”.

We want to be able to construct monad pipelines of the general form:

LSAMon is based on the State monad, [Wk1, AA1], so the monad pipeline form (1) has the following more specific form:

This means that some monad operations will not just change the pipeline value but they will also change the pipeline context.

In the monad pipelines of LSAMon we store different objects in the contexts for at least one of the following two reasons.

The object will be needed later on in the pipeline, or

The object is (relatively) hard to compute.

Such objects are document-term matrix, Dimensionality reduction factors and the related topics.

Let us list the desired properties of the monad.

Rapid specification of non-trivial LSA workflows.

The text data supplied to the monad can be: (i) a list of strings, or (ii) an association with string values.

The monad uses the Linear vector space model .

The document-term frequency matrix can be created after removing stop words and/or word stemming.

It is easy to specify and apply different LSI weight functions. (Like “IDF” or “GFIDF”.)

The monad can do dimension reduction with SVD and NNMF and corresponding matrix factors are retrievable with monad functions.

Documents (or query strings) external to the monad are easily mapped into monad’s Linear vector space of terms and Linear vector space of topics.

The monad allows of cursory examination and summarization of the data.

The pipeline values can be of different types. (Most monad functions modify the pipeline value; some modify the context; some just echo results.)

It is easy to obtain the pipeline value, context, and different context objects for manipulation outside of the monad.

It is easy to tabulate extracted topics and related statistical thesauri.

The LSAMon components and their interactions are fairly simple.

The main LSAMon operations implicitly put in the context or utilize from the context the following objects:

document-term matrix,

the factors obtained by matrix factorization algorithms,

LSI weight functions specifications,

extracted topics.

Note the that the monadic set of types of LSAMon pipeline values is fairly heterogenous and certain awareness of “the current pipeline value” is assumed when composing LSAMon pipelines.

Obviously, we can put in the context any object through the generic operations of the State monad of the package “StateMonadGenerator.m”, [AAp1].

LSAMon overview

When using a monad we lift certain data into the “monad space”, using monad’s operations we navigate computations in that space, and at some point we take results from it.

With the approach taken in this document the “lifting” into the LSAMon monad is done with the function LSAMonUnit. Results from the monad can be obtained with the functions LSAMonTakeValue, LSAMonContext, or with the other LSAMon functions with the prefix “LSAMonTake” (see below.)

Here is a corresponding diagram of a generic computation with the LSAMon monad:

Remark: It is a good idea to compare the diagram with formulas (1) and (2).

Let us examine a concrete LSAMon pipeline that corresponds to the diagram above. In the following table each pipeline operation is combined together with a short explanation and the context keys after its execution.

Here is the output of the pipeline:

The LSAMon functions are separated into four groups:

operations,

setters and droppers,

takers,

State Monad generic functions.

Monad functions interaction with the pipeline value and context

An overview of the those functions is given in the tables in next two sub-sections. The next section, “Monad elements”, gives details and examples for the usage of the LSAMon operations.

State monad functions

Here are the LSAMon State Monad functions (generated using the prefix “LSAMon”, [AAp1, AA1].)

Main monad functions

Here are the usage descriptions of the main (not monad-supportive) LSAMon functions, which are explained in detail in the next section.

Monad elements

In this section we show that LSAMon has all of the properties listed in the previous section.

The monad head

The monad head is LSAMon. Anything wrapped in LSAMon can serve as monad’s pipeline value. It is better though to use the constructor LSAMonUnit. (Which adheres to the definition in [Wk1].)

The fundamental model of LSAMon is the so called Vector space model (or the closely related Bag-of-words model.) The document-term matrix is a linear vector space representation of the documents collection. That representation is further used in LSAMon to find topics and statistical thesauri.

Here is an example of ad hoc construction of a document-term matrix using a couple of paragraphs from “Hamlet”.

When we construct the document-term matrix we (often) want to stem the words and (almost always) want to remove stop words. LSAMon’s function LSAMonMakeDocumentTermMatrix makes the document-term matrix and takes specifications for stemming and stop words.

After making the document-term matrix we will most likely apply LSI weight functions, [Wk2], like “GFIDF” and “TF-IDF”. (This follows the “standard” approach used in search engines for calculating weights for document-term matrices; see [MB1].)

Frequency matrix

We use the following definition of the frequency document-term matrix F.

Each entry f_{ij} of the matrix F is the number of occurrences of the term j in the document i.

Weights

Each entry of the weighted document-term matrix M derived from the frequency document-term matrix F is expressed with the formula

where g_{j} – global term weight; l_{ij} – local term weight; d_{i} – normalization weight.

Various formulas exist for these weights and one of the challenges is to find the right combination of them when using different document collections.

Here is a table of weight functions formulas.

Computation specifications

LSAMon function LSAMonApplyTermWeightFunctions delegates the LSI weight functions application to the package “DocumentTermMatrixConstruction.m”, [AAp4].

Here we are summaries of the non-zero values of the weighted document-term matrix derived with different combinations of global, local, and normalization weight functions.

Streamlining topic extraction is one of the main reasons LSAMon was implemented. The topic extraction correspond to the so called “syntagmatic” relationships between the terms, [MS1].

Theoretical outline

The original weighed document-term matrix M is decomposed into the matrix factors W and H.

M ≈ W.H, W ∈ ℝ^{m × k}, H ∈ ℝ^{k × n}.

The i-th row of M is expressed with the i-th row of W multiplied by H.

The rows of H are the topics. SVD produces orthogonal topics; NNMF does not.

The i-the document of the collection corresponds to the i-th row W. Finding the Nearest Neighbors (NN’s) of the i-th document using the rows similarity of the matrix W gives document NN’s through topic similarity.

The terms correspond to the columns of H. Finding NN’s based on similarities of H’s columns produces statistical thesaurus entries.

The term groups provided by H’s rows correspond to “syntagmatic” relationships. Using similarities of H’s columns we can produce term clusters that correspond to “paradigmatic” relationships.

Computation specifications

Here is an example using the play “Hamlet” in which we specify additional stop words.

One of the most natural operations is to find the representation of an arbitrary document (or sentence or a list of words) in monad’s Linear vector space of terms. This is done with the function LSAMonRepresentByTerms.

Here is an example in which a sentence is represented as a one-row matrix (in that space.)

obj =
lsaHamlet⟹
LSAMonRepresentByTerms["Hamlet, Prince of Denmark killed the king."]⟹
LSAMonEchoValue;

Here we display only the non-zero columns of that matrix.

obj⟹
LSAMonEchoFunctionValue[MatrixForm[Part[#, All, Keys[Select[SSparseMatrix`ColumnSumsAssociation[#], # > 0& ]]]]& ];

Transformation steps

Assume that LSAMonRepresentByTerms is given a list of sentences. Then that function performs the following steps.

1. The sentence is split into a list of words.

2. If monad’s document-term matrix was made by removing stop words the same stop words are removed from the list of words.

3. If monad’s document-term matrix was made by stemming the same stemming rules are applied to the list of words.

4. The LSI global weights and the LSI local weight and normalizer functions are applied to sentence’s contingency matrix.

Equivalent representation

Let us look convince ourselves that documents used in the monad to built the weighted document-term matrix have the same representation as the corresponding rows of that matrix.

Here is an association of documents from monad’s document collection.

inds = {6, 10};
queries = Part[lsaHamlet⟹LSAMonTakeDocuments, inds];
queries
(* <|"id.0006" -> "Getrude, Queen of Denmark, mother to Hamlet. Ophelia, daughter to Polonius.",
"id.0010" -> "ACT I. Scene I. Elsinore. A platform before the Castle."|> *)
lsaHamlet⟹
LSAMonRepresentByTerms[queries]⟹
LSAMonEchoFunctionValue[MatrixForm[Part[#, All, Keys[Select[SSparseMatrix`ColumnSumsAssociation[#], # > 0& ]]]]& ];

Another natural operation is to find the representation of an arbitrary document (or a list of words) in monad’s Linear vector space of topics. This is done with the function LSAMonRepresentByTopics.

Here is an example.

inds = {6, 10};
queries = Part[lsaHamlet⟹LSAMonTakeDocuments, inds];
Short /@ queries
(* <|"id.0006" -> "Getrude, Queen of Denmark, mother to Hamlet. Ophelia, daughter to Polonius.",
"id.0010" -> "ACT I. Scene I. Elsinore. A platform before the Castle."|> *)
lsaHamlet⟹
LSAMonRepresentByTopics[queries]⟹
LSAMonEchoFunctionValue[MatrixForm[Part[#, All, Keys[Select[SSparseMatrix`ColumnSumsAssociation[#], # > 0& ]]]]& ];

In LSAMon for SVD H^{( − 1)} = H^{T}; for NNMF H^{( − 1)} is the pseudo-inverse of H.

The vector x obtained with LSAMonRepresentByTopics.

Tags representation

Sometimes we want to find the topics representation of tags associated with monad’s documents and the tag-document associations are one-to-many. See [AA3].

Let us consider a concrete example – we want to find what topics correspond to the different presidents in the collection of State of Union speeches.

Here we find the document tags (president names in this case.)

There are several algorithms we can apply for finding the most important documents in the collection. LSAMon utilizes two types algorithms: (1) graph centrality measures based, and (2) matrix factorization based. With certain graph centrality measures the two algorithms are equivalent. In this sub-section we demonstrate the matrix factorization algorithm (that uses SVD.)

Definition: The most important sentences have the most important words and the most important words are in the most important sentences.

That definition can be used to derive an iterations-based model that can be expressed with SVD or eigenvector finding algorithms, [LE1].

Here we pick an important part of the play “Hamlet”.

focusText =
First@Pick[textHamlet, StringMatchQ[textHamlet, ___ ~~ "to be" ~~ __ ~~ "or not to be" ~~ ___, IgnoreCase -> True]];
Short[focusText]
(* "Ham. To be, or not to be- that is the question: Whether 'tis ....y.
O, woe is me T' have seen what I have seen, see what I see!" *)
LSAMonUnit[StringSplit[ToLowerCase[focusText], {",", ".", ";", "!", "?"}]]⟹
LSAMonMakeDocumentTermMatrix["StemmingRules" -> {}, "StopWords" -> Automatic]⟹
LSAMonApplyTermWeightFunctions⟹
LSAMonFindMostImportantDocuments[3]⟹
LSAMonEchoFunctionValue[GridTableForm];

Setters, droppers, and takers

The values from the monad context can be set, obtained, or dropped with the corresponding “setter”, “dropper”, and “taker” functions as summarized in a previous section.

For example:

p = LSAMonUnit[textHamlet]⟹LSAMonMakeDocumentTermMatrix[Automatic, Automatic];
p⟹LSAMonTakeMatrix

If other values are put in the context they can be obtained through the (generic) function LSAMonTakeContext, [AAp1]:

Short@(p⟹QRMonTakeContext)["documents"]
(* <|"id.0001" -> "1604", "id.0002" -> "THE TRAGEDY OF HAMLET, PRINCE OF DENMARK", <<220>>, "id.0223" -> "THE END"|> *)

Another generic function from [AAp1] is LSAMonTakeValue (used many times above.)

Here is an example of the “data dropper” LSAMonDropDocuments:

(The “droppers” simply use the state monad function LSAMonDropFromContext, [AAp1]. For example, LSAMonDropDocuments is equivalent to LSAMonDropFromContext[“documents”].)

The utilization of SSparseMatrix objects

The LSAMon monad heavily relies on SSparseMatrix objects, [AAp6, AA5], for internal representation of data and computation results.

A SSparseMatrix object is a matrix with named rows and columns.

In some cases we want to show only columns of the data or computation results matrices that have non-zero elements.

Here is an example (similar to other examples in the previous section.)

lsaHamlet⟹
LSAMonRepresentByTerms[{"this country is rotten",
"where is my sword my lord",
"poison in the ear should be in the play"}]⟹
LSAMonEchoFunctionValue[ MatrixForm[#1[[All, Keys[Select[ColumnSumsAssociation[#1], #1 > 0 &]]]]] &];

In the pipeline code above: (i) from the list of queries a representation matrix is made, (ii) that matrix is assigned to the pipeline value, (iii) in the pipeline echo value function the non-zero columns are selected with by using the keys of the non-zero elements of the association obtained with ColumnSumsAssociation.

Similarities based on representation by terms

Here is way to compute the similarity matrix of different sets of documents that are not required to be in monad’s document collection.

Similarly to weighted Boolean similarities matrix computation above we can compute a similarity matrix using the topics representations. Note that an additional normalization steps is required.

Note the differences with the weighted Boolean similarity matrix in the previous sub-section – the similarities that are less than 1 are noticeably larger.

Unit tests

The development of LSAMon was done with two types of unit tests: (i) directly specified tests, [AAp7], and (ii) tests based on randomly generated pipelines, [AA8].

The unit test package should be further extended in order to provide better coverage of the functionalities and illustrate – and postulate – pipeline behavior.

Since the monad LSAMon is a DSL it is natural to test it with a large number of randomly generated “sentences” of that DSL. For the LSAMon DSL the sentences are LSAMon pipelines. The package “MonadicLatentSemanticAnalysisRandomPipelinesUnitTests.m”, [AAp9], has functions for generation of LSAMon random pipelines and running them as verification tests. A short example follows.

AbsoluteTiming[
res = TestRunLSAMonPipelines[pipelines, "Echo" -> False];
]

From the test report results we see that a dozen tests failed with messages, all of the rest passed.

rpTRObj = TestReport[res]

(The message failures, of course, have to be examined – some bugs were found in that way. Currently the actual test messages are expected.)

Future plans

Dimension reduction extensions

It would be nice to extend the Dimension reduction functionalities of LSAMon to include other algorithms like Independent Component Analysis (ICA), [Wk5]. Ideally with LSAMon we can do comparisons between SVD, NNMF, and ICA like the image de-nosing based comparison explained in [AA8].

Another direction is to utilize Neural Networks for the topic extraction and making of statistical thesauri.

Conversational agent

Since LSAMon is a DSL it can be relatively easily interfaced with a natural language interface.

Here is an example of natural language commands parsed into LSA code using the package [AAp13].

Implementation notes

The implementation methodology of the LSAMon monad packages [AAp3, AAp9] followed the methodology created for the ClCon monad package [AAp10, AA6]. Similarly, this document closely follows the structure and exposition of the `ClCon monad document “A monad for classification workflows”, [AA6].

A lot of the functionalities and signatures of LSAMon were designed and programed through considerations of natural language commands specifications given to a specialized conversational agent.

This document shows a way to chart in Mathematica / WL the evolution of topics in collections of texts. The making of this document (and related code) is primarily motivated by the fascinating concept of the Great Conversation, [Wk1, MA1]. In brief, all western civilization books are based on great ideas; if we find the great ideas each significant book is based on we can construct a time-line (spanning centuries) of the great conversation between the authors; see [MA1, MA2, MA3].

The presented computational results are based on the text collections of State of the Union speeches of USA presidents [D2]. The code in this document can be easily configured to use the much smaller text collection [D1] available online and in Mathematica/WL. (The collection [D1] is fairly small, documents; the collection [D2] is much larger, documents.)

The procedures (and code) described in this document, of course, work on other types of text collections. For example: movie reviews, podcasts, editorial articles of a magazine, etc.

A secondary objective of this document is to illustrate the use of the monadic programming pipeline as a Software design pattern, [AA3]. In order to make the code concise in this document I wrote the package MonadicLatentSemanticAnalysis.m, [AAp5]. Compare with the code given in [AA1].

The very first version of this document was written for the 2017 summer course “Data Science for the Humanities” at the University of Oxford, UK.

Outline of the procedure applied

The procedure described in this document has the following steps.

Get a collection of documents with known dates of publishing.

Or other types of tags associated with the documents.

Do preliminary analysis of the document collection.

Number of documents; number of unique words.

Number of words per document; number of documents per word.

(Some of the statistics of this step are done easier after the Linear vector space representation step.)

Optionally perform Natural Language Processing (NLP) tasks.

In this section we load a text collection from a specified source.

The text collection from “Presidential Nomination Acceptance Speeches”, [D1], is small and can be used for multiple code verifications and re-runnings. The “State of Union addresses of USA presidents” text collection from [D2] was converted to a Mathematica/WL object by Christopher Wolfram (and sent to me in a private communication.) The text collection [D2] provides far more interesting results (and they are shown below.)

If[True,
speeches = ResourceData[ResourceObject["Presidential Nomination Acceptance Speeches"]];
names = StringSplit[Normal[speeches[[All, "Person"]]][[All, 2]], "::"][[All, 1]],
(*ELSE*)
(*State of the union addresses provided by Christopher Wolfram. *)
Get["~/MathFiles/Digital humanities/Presidential speeches/speeches.mx"];
names = Normal[speeches[[All, "Name"]]];
];
dates = Normal[speeches[[All, "Date"]]];
texts = Normal[speeches[[All, "Text"]]];
Dimensions[speeches]
(* {2453, 4} *)

Basic statistics for the texts

Using different contingency matrices we can derive basic statistical information about the document collection. (The document-word matrix is a contingency matrix.)

We can complete this list with additional stop words derived from the collection itself. (Not done here.)

Linear vector space representation and dimension reduction

Remark: In the rest of the document we use “term” to mean “word” or “stemmed word”.

The following code makes a document-term matrix from the document collection, exaggerates the representations of the terms using “TF-IDF”, and then does topic extraction through dimension reduction. The dimension reduction is done with NNMF; see [AAp3, AA1, AA2].

Let us clarify the values by briefly describing the computational steps.

From texts we derive the document-term matrix , where is the number of documents and is the number of terms.

The terms are words or stemmed words.

This is done with LSAMonMakeDocumentTermMatrix.

From docTermMat is derived the (weighted) matrix wDocTermMat using “TF-IDF”.

This is done with LSAMonApplyTermWeightFunctions.

Using docTermMat we find the terms that are present in sufficiently large number of documents and their column indices are assigned to topicColumnPositions.

Matrix factorization.

Assign to , , where .

Compute using NNMF the factorization , where , , and is the number of topics.

The values for the keys “W, “H”, and “topicColumnPositions” are computed and assigned by LSAMonTopicExtraction.

From the top terms of each topic are derived automatic topic names and assigned to the key automaticTopicNames in the monad context.

Also done by LSAMonTopicExtraction.

Statistical thesaurus

At this point in the object mObj we have the factors of NNMF. Using those factors we can find a statistical thesaurus for a given set of words. The following code calculates such a thesaurus, and echoes it in a tabulated form.

By observing the thesaurus entries we can see that the words in each entry are semantically related.

Note, that the word “welfare” strongly associates with “[applause]”. The rest of the query words do not, which can be seen by examining larger thesaurus entries:

The function LSAMonTopicsRepresentation finds the top outliers for each row of NNMF’s left factor . (The outliers are found using the package [AAp4].) The obtained list of indices gives the topic representation of the collection of texts.

Further we can see that if the documents have tags associated with them — like author names or dates — we can make a contingency matrix of tags vs topics. (See [AAp8, AA4].) This is also done by the function LSAMonTopicsRepresentation that takes tags as an argument. If the tags argument is Automatic, then the tags are simply the document indices.

Note that the matrix plots above are very close to the charting of the Great conversation that we are looking for. This can be made more obvious by observing the row names and columns names in the tabulation of the transposed matrix rsmat:

Magnify[#, 0.6] &@MatrixForm[Transpose[rsmat]]

Charting the great conversation

In this section we show several ways to chart the Great Conversation in the collection of speeches.

There are several possible ways to make the chart: using a time-line plot, using heat-map plot, and using appropriate tabulation (with MatrixForm or Grid).

In order to make the code in this section more concise the package RSparseMatrix.m, [AAp7, AA5], is used.

Topic name to topic words

This command makes an Association between the topic names and the top topic words.

Note the value of the option DistanceFunction: there is not re-ordering of the rows and columns are reordered by sorting. Also, the topics on the horizontal names have tool-tips.

[MA1] Mortimer Adler, "The Great Conversation Revisited," in The Great Conversation: A Peoples Guide to Great Books of the Western World, Encyclopædia Britannica, Inc., Chicago,1990, p. 28.

This document discusses concrete algorithms for two different approaches of generation of mandala images, [1]: direct construction with graphics primitives, and use of machine learning algorithms.

to show some pretty images exploiting symmetry and multiplicity (see this album),

to provide an illustrative example of comparing dimension reduction methods,

to give a set-up for further discussions and investigations on mandala creation with machine learning algorithms.

Two direct construction algorithms are given: one uses "seed" segment rotations, the other superimposing of layers of different types. The following plots show the order in which different mandala parts are created with each of the algorithms.

In this document we use several algorithms for dimension reduction applied to collections of images following the procedure described in [4,5]. We are going to show that with Non-Negative Matrix Factorization (NNMF) we can use mandalas made with the seed segment rotation algorithm to extract layer types and superimpose them to make colored mandalas. Using the same approach with Singular Value Decomposition (SVD) or Independent Component Analysis (ICA) does not produce good layers and the superimposition produces more "watered-down", less diverse mandalas.

From a more general perspective this document compares the statistical approach of "trying to see without looking" with the "direct simulation" approach. Another perspective is the creation of "design spaces"; see [6].

The idea of using machine learning algorithms is appealing because there is no need to make the mental effort of understanding, discerning, approximating, and programming the principles of mandala creation. We can "just" use a large collection of mandala images and generate new ones using the "internal knowledge" data of machine learning algorithms. For example, a Neural network system like Deep Dream, [2], might be made to dream of mandalas.

Direct algorithms for mandala generation

In this section we present two different algorithms for generating mandalas. The first sees a mandala as being generated by rotation of a "seed" segment. The second sees a mandala as being generated by different component layers. For other approaches see [3].

The request of [3] is for generation of mandalas for coloring by hand. That is why the mandala generation algorithms are in the grayscale space. Coloring the generated mandala images is a secondary task.

By seed segment rotations

One way to come up with mandalas is to generate a segment and then by appropriate number of rotations to produce a mandala.

Here is a function and an example of random segment (seed) generation:

Here is a more concise way to generate symmetric segment mandalas:

Multicolumn[Table[Image@MakeMandala[], {12}], 5]

Note that with this approach the programming of the mandala coloring is not that trivial — weighted blending of colorized mandalas is the easiest thing to do. (Shown below.)

"For this one I’ve defined three types of layer, a flower, a simple circle and a ring of small circles. You could add more for greater variety."

The coloring approach with image blending given below did not work well for this algorithm, so I modified the original code in order to produce colored mandalas.

The most interesting results are obtained with the image blending procedure coded below over mandala images generated with the seed segment rotation algorithm.

In this section we are going to apply the dimension reduction algorithms Singular Value Decomposition (SVD), Independent Component Analysis (ICA), and Non-Negative Matrix Factorization (NNMF) to a linear vector space representation (a matrix) of an image dataset. In the next section we use the bases generated by those algorithms to make mandala images.
We are going to use the packages [7,8] for ICA and NNMF respectively.

The linear vector space representation of the images is simple — each image is flattened to a vector (row-wise), and the image vectors are put into a matrix.

The SVD basis has an average mandala image as its first vector and the other vectors are "differences" to be added to that first vector.

The SVD and ICA bases are structured similarly. That is because ICA and SVD are both based on orthogonality — ICA factorization uses an orthogonality criteria based on Gaussian noise properties (which is more relaxed than SVD’s standard orthogonality criteria.)

As expected, the NNMF basis images have black background because of the enforced non-negativity. (Black corresponds to 0, white to 1.)

Compared to the SVD and ICA bases the images of the NNMF basis are structured in a radial manner. This can be demonstrated using image binarization.

We can see that binarizing of the NNMF basis images shows them as mandala layers. In other words, using NNMF we can convert the mandalas of the seed segment rotation algorithm into mandalas generated by an algorithm that superimposes layers of different types.

Blending with image bases samples

In this section we just show different blending images using the SVD, ICA, and NNMF bases.

What would be the outcomes of the above procedures to mandala images found in the World Wide Web (WWW) ?

Those WWW images are most likely man made or curated.

The short answer is that the results are not that good. Better results might be obtained using a larger set of WWW images (than just 100 in the experiment results shown below.)

Here is a sample from the WWW mandala images:

Here are the results obtained with NNMF basis:

Future plans

My other motivation for writing this document is to set up a basis for further investigations and discussions on the following topics.

Having a large image database of "real world", human made mandalas.

Utilization of Neural Network algorithms to mandala creation.

Utilization of Cellular Automata to mandala generation.

Investigate mandala morphing and animations.

Making a domain specific language of specifications for mandala creation and modification.

The idea of using machine learning algorithms for mandala image generation was further supported by an image classifier that recognizes fairly well (suitably normalized) mandala images obtained in different ways:

In this document are given outlines and examples of several related implementations of Lebesgue integration, [1], within the framework of NIntegrate, [7]. The focus is on the implementations of Lebesgue integration algorithms that have multiple options and can be easily extended (in order to do further research, optimization, etc.) In terms of NIntegrate‘s framework terminology it is shown how to implement an integration strategy or integration rule based on the theory of the Lebesgue integral. The full implementation of those strategy and rules — LebesgueIntegration, LebesgueIntegrationRule, and GridLebesgueIntegrationRule — are given in the Mathematica package [5].

The advantage of using NIntegrate‘s framework is that a host of supporting algorithms can be employed for preprocessing, execution, experimentation, and testing (correctness, comparison, and profiling.)

Here is a brief description of the integration strategy LebesgueIntegration in [5]:

prepare a function that calculates measure estimates based on random points or low discrepancy sequences of points in the integration domain;

use NIntegrate for the computation of one dimensional integrals for that measure estimate function over the range of the integrand function values.

The strategy is adaptive because of the second step — NIntegrate uses adaptive integration algorithms.

Instead of using an integration strategy we can "tuck in" the whole Lebesgue integration process into an integration rule, and then use that integration rule with the adaptive integration algorithms NIntegrate already has. This is done with the implementations of the integration rules LebesgueIntegrationRule and GridLebesgueIntegrationRule.

Brief theory

Lebesgue integration extends the definition of integral to a much larger class of functions than the class of Riemann integrable functions. The Riemann integral is constructed by partitioning the integrand’s domain (on the axis). The Lebesgue integral is constructed by partitioning the integrand’s co-domain (on the axis). For each value in the co-domain, find the measure of the corresponding set of points in the domain. Roughly speaking, the Lebesgue integral is then the sum of all the products ; see [1]. For our implementation purposes is defined differently, and in the rest of this section we follow [3].

Consider the non-negative bound-able measurable function :

We denote by the measure for the points in for which , i.e.

The Lebesgue integral of over can be be defined as:

Further, we can write the last formula as

The restriction can be handled by defining the following functions and :

and using the formula

Since finding analytical expressions of is hard we are going to look into ways of approximating .

For more details see [1,2,3,4].

(Note, that the theoretical outline the algorithms considered can be seen as algorithms that reduce multidimensional integration to one dimensional integration.)

Algorithm walk through

We can see that because of Equation (4) we mostly have to focus on estimating the measure function . This section provides a walk through with visual examples of a couple of stochastic ways to do that.

Consider the integral

In order to estimate in for we are going to generate in a set of low discrepancy sequence of points, [2]. Here this is done with points of the so called "Sobol" sequence:

To each point let us assign a corresponding "volume" that can be used to approximate with Equation (2). We can of course easily assign such volumes to be , but as it can be seen on the plot this would be a good approximation for a larger number of points. Here is an example of a different volume assignment using a Voronoi diagram, [10]:

In order to implement the outlined algorithm so it will be more universal we have to consider volumes rescaling, function positivity, and Voronoi diagram implementation(s). For details how these considerations are resolved see the code of the strategy LebesgueIntegration in [5].

Further algorithm elaborations

Measure estimate with regular grid cells

The article 3 and book 4 suggest the measure estimation to be done through membership of regular grid of cells. For example, the points generated in the previous section can be grouped by a grid:

The following steps describe in detail an algorithm based on the proposed in [3,4] measure estimation method. The algorithm is implemented in [5] for the symbol, GridLebesgueIntegrationRule.

1. Generate points filling the where is the dimension of .

2. Partition with a regular grid according to specifications.

2.1. Assume the cells are indexed with the integers , .

2.2. Assume that all cells have the same volume below.

3. For each point find to which cell of the regular grid it belongs to.

4. For each cell have a list of indices corresponding to the points that belong to it.

5. For a given sub-region of integration rescale to the points to lie within ; denote those points with .

5.1. Compute the rescaling factor for the integration rule; denote with .

6. For a given integrand function evaluate over all points .

7. For each cell find the min and max values of .

7.1. Denote with and correspondingly.

8. For a given value , where is some integer enumerating the 1D integration rule sampling points, find the coefficients , using the following formula:

9. Find the measure estimate of with

Axis splitting in Lebesgue integration rules

The implementations of Lebesgue integration rules are required to provide a splitting axis for use of the adaptive algorithms. Of course we can assign a random splitting axis, but that might lead often to slower computations. One way to provide splitting axis is to choose the axis that minimizes the sum of the variances of sub-divided regions estimated by samples of the rule points. This is the same approach taken in NIntegrate`s rule "MonteCarloRule"; for theoretical details see the chapter "7.8 Adaptive and Recursive Monte Carlo Methods" of [11].

In [5] this splitting axis choice is implemented based on integrand function values. Another approach, more in the spirit of the Lebesgue integration, is to select the splitting axis based on variances of the measure function estimates.

The strategy LebesgueIntegration uses an internal variable for the calculation of the Lebesgue integral. In "EvaluationMonitor" either that variable has to be used, or a symbol name has to be passed though the option "LebesgueIntegralVariableSymbol". Here is an example:

We can use NIntegrate‘s utility functions for visualization and profiling in order to do comparison of the implemented algorithms with related ones (like "AdaptiveMonteCarlo") which NIntegrate has (or are plugged-in).

The flow charts below show that the plug-in designs have common elements. In order to make the computations more effective the rule initialization prepares the data that is used in all rule invocations. For the strategy the initialization can be much lighter since the strategy algorithm is executed only once.

In the flow charts the double line boxes designate sub-routines. We can see that so called Hollywood principle "don’t call us, we’ll call you" in Object-oriented programming is manifested.

The following flow chart shows the steps of NIntegrate‘s execution when the integration strategy LebesgueIntegration is used.

The following flow chart shows the steps of NIntegrate‘s execution when the integration rule LebesgueIntegrationRule is used.

Algorithms versions and options

There are multiple architectural, coding, and interface decisions to make while doing implementations like the ones in [5] and described in this document. The following mind map provides an overview of alternatives and interactions between components and parameters.

Alternative implementations

In many ways using the Lebesgue integration rule with the adaptive algorithms is similar to using NIntegrate‘s "AdaptiveMonteCarlo" and its rule "MonteCarloRule". Although it is natural to consider plugging-in the Lebesgue integration rules into "AdaptiveMonteCarlo" at this point NIntegrate‘s framework does not allow "AdaptiveMonteCarlo" the use of a rule different than "MonteCarloRule".

We can consider using Monte Carlo algorithms for estimating the measures corresponding to a vector of values (that come from a 1D integration rule). This can be easily done, but it is not that effective because of the way NIntegrate handles vector integrands and because of stopping criteria issues when the measures are close to .

Future plans

One of the most interesting extensions of the described Lebesgue integration algorithms and implementation designs is their extension with more advanced features of Mathematica for geometric computation. (Like the functions VoronoiMesh, RegionMeasure, and ImplicitRegion used above.)

Another interesting direction is the derivation and use of symbolic expressions for the measure functions. (Hybrid symbolic and numerical algorithms can be obtained as NIntegrate‘s handling of piecewise functions or the strategy combining symbolic and numerical integration described in [9].)

In a previous blog post, [1], I compared Principal Component Analysis (PCA) / Singular Value Decomposition (SVD) and Non-Negative Matrix Factorization (NNMF) over a collection of noised images of digit handwriting from the MNIST data set, [3], which is available in Mathematica.

This blog post adds to that comparison the use of Independent Component Analysis (ICA) proclaimed in my previous blog post, [1].

Computations

The ICA related additional computations to those in [1] follow.

Independent Component Analysis (ICA) is a (matrix factorization) method for separation of a multi-dimensional signal (represented with a matrix) into a weighted sum of sub-components that have less entropy than the original variables of the signal. See [1,2] for introduction to ICA and more details.

This blog post is to proclaim the implementation of the "FastICA" algorithm in the package IndependentComponentAnalysis.m and show a basic application with it. (I programmed that package last weekend. It has been in my ToDo list to start ICA algorithms implementations for several months… An interesting offshoot was the procedure I derived for the StackExchange question "Extracting signal from Gaussian noise".)

In this blog post ICA is going to be demonstrated with both generated data and "real life" weather data (temperatures of three cities within one month).

Generated data

In order to demonstrate ICA let us make up some data in the spirit of the "cocktail party problem".

It is important to note that the usual ICA model interpretation for the factorized matrix X is that each column is a variable (audio signal) and each row is an observation (recordings of the microphones at a given time). The matrix 3×1201 M was constructed with the interpretation that each row is a signal, hence we have to transpose M in order to apply the ICA algorithms, X=M^T.

X = Transpose[M];
{S, A} = IndependentComponentAnalysis[X, 3];

Check the approximation of the obtained factorization:

Norm[X - S.A]
(* 3.10715*10^-14 *)

Plot the found source signals:

Grid[{Map[ListLinePlot[#, PlotRange -> All, ImageSize -> 250] &,
Transpose[S]]}]

Because of the random initialization of the inverting matrix in the algorithm the result my vary. Here is the plot from another run:

The package also provides the function FastICA that returns an association with elements that correspond to the result of the function fastICA provided by the R package "fastICA". See [4].

Note that (in adherence to [4]) the function FastICA returns the matrices S and A for the centralized matrix X. This means, for example, that in order to check the approximation proper mean has to be supplied:

The result of the function IndependentComponentAnalysis is a list of two matrices. The result of FastICA is an association of the matrices obtained by ICA. The function IndependentComponentAnalysis takes a method option and options for precision goal and maximum number of steps:

The intent is IndependentComponentAnalysis to be the front interface to different ICA algorithms. (Hence, it has a Method option.) The function FastICA takes as options the named arguments of the R function fastICA described in [4].

At this point FastICA has only the deflation algorithm described in [1]. ([4] has also the so called "symmetric" ICA sub-algorithm.) The R function fastICA in [4] can use only two neg-Entropy functions log(cosh(x)) and exp(-u^2/2). Because of the symbolic capabilities of MathematicaFastICA of [3] can take any listable function through the option "NonGaussianityFunction", and it will find and use the corresponding first and second derivatives.

Using NNMF for ICA

It seems that in some cases, like the generated data used in this blog post, Non-Negative Matrix Factorization (NNMF) can be applied for doing ICA.

To be clear, NNMF does dimension reduction, but its norm minimization process does not enforce variable independence. (It enforces non-negativity.) There are at least several articles discussing modification of NNMF to do ICA. For example [6].

For the generated data in this blog post, FactICA is much faster than NNMF and produces better separation of the signals with every run. The data though is a typical representative for the problems ICA is made for. Another comparison with image de-noising, extending my previous blog post, will be published shortly.

ICA for mixed time series of city temperatures

Using Mathematica‘s function WeatherData we can get temperature time series for a small set of cities over a certain time grid. We can mix those time series into a multi-dimensional signal, MS, apply ICA to MS, and judge the extracted source signals with the original ones.