The lectures on Latent Semantic Analysis (LSA) are to be recorded through Wolfram University (Wolfram U) in December 2019 and January-February 2020.
The lectures (as live-coding sessions)
Overview Latent Semantic Analysis (LSA) typical problems and basic workflows.
Answering preliminary anticipated questions.
Here is the recording of the first session at Twitch .
What are the typical applications of LSA?
Why use LSA?
What it the fundamental philosophical or scientific assumption for LSA?
What is the most important and/or fundamental step of LSA?
What is the difference between LSA and Latent Semantic Indexing (LSI)?
What are the alternatives?
Using Neural Networks instead?
How is LSA used to derive similarities between two given texts?
How is LSA used to evaluate the proximity of phrases? (That have different words, but close semantic meaning.)
In this document we describe the design and implementation of a (software programming) monad, [Wk1], for Latent Semantic Analysis workflows specification and execution. The design and implementation are done with Mathematica / Wolfram Language (WL).
What is Latent Semantic Analysis (LSA)? : A statistical method (or a technique) for finding relationships in natural language texts that is based on the so called Distributional hypothesis, [Wk2, Wk3]. (The Distributional hypothesis can be simply stated as “linguistic items with similar distributions have similar meanings”; for an insightful philosophical and scientific discussion see [MS1].) LSA can be seen as the application of Dimensionality reduction techniques over matrices derived with the Vector space model.
The goal of the monad design is to make the specification of LSA workflows (relatively) easy and straightforward by following a certain main scenario and specifying variations over that scenario.
The data for this document is obtained from WL’s repository and it is manipulated into a certain ready-to-utilize form (and uploaded to GitHub.)
The monadic programming design is used as a Software Design Pattern. The LSAMon monad can be also seen as a Domain Specific Language (DSL) for the specification and programming of machine learning classification workflows.
Here is an example of using the LSAMon monad over a collection of documents that consists of 233 US state of union speeches.
The table above is produced with the package “MonadicTracing.m”, [AAp2, AA1], and some of the explanations below also utilize that package.
As it was mentioned above the monad LSAMon can be seen as a DSL. Because of this the monad pipelines made with LSAMon are sometimes called “specifications”.
Remark: In this document with “term” we mean “a word, a word stem, or other type of token.”
Remark: LSA and Latent Semantic Indexing (LSI) are considered more or less to be synonyms. I think that “latent semantic analysis” sounds more universal and that “latent semantic indexing” as a name refers to a specific Information Retrieval technique. Below we refer to “LSI functions” like “IDF” and “TF-IDF” that are applied within the generic LSA workflow.
Contents description
The document has the following structure.
The sections “Package load” and “Data load” obtain the needed code and data.
The sections “Design consideration” and “Monad design” provide motivation and design decisions rationale.
The sections “LSAMon overview”, “Monad elements”, and “The utilization of SSparseMatrix objects” provide technical descriptions needed to utilize the LSAMon monad .
(Using a fair amount of examples.)
The section “Unit tests” describes the tests used in the development of the LSAMon monad.
(The random pipelines unit tests are especially interesting.)
The section “Future plans” outlines future directions of development.
The section “Implementation notes” just says that LSAMon’s development process and this document follow the ones of the classifications workflows monad ClCon, [AA6].
Remark: One can read only the sections “Introduction”, “Design consideration”, “Monad design”, and “LSAMon overview”. That set of sections provide a fairly good, programming language agnostic exposition of the substance and novel ideas of this document.
Package load
The following commands load the packages [AAp1–AAp7]:
In this section we load data that is used in the rest of the document. The text data was obtained through WL’s repository, transformed in a certain more convenient form, and uploaded to GitHub.
The text summarization and plots are done through LSAMon, which in turn uses the function RecordsSummary from the package “MathematicaForPredictionUtilities.m”, [AAp7].
In some of the examples below we want to explicitly specify the stop words. Here are stop words derived using the built-in functions DictionaryLookup and DeleteStopwords.
Here is a quote from [Wk1] that fairly well describes why we choose to make a classification workflow monad and hints on the desired properties of such a monad.
[…] The monad represents computations with a sequential structure: a monad defines what it means to chain operations together. This enables the programmer to build pipelines that process data in a series of steps (i.e. a series of actions applied to the data), in which each action is decorated with the additional processing rules provided by the monad. […]
Monads allow a programming style where programs are written by putting together highly composable parts, combining in flexible ways the possible actions that can work on a particular type of data. […]
Remark: Note that quote from [Wk1] refers to chained monadic operations as “pipelines”. We use the terms “monad pipeline” and “pipeline” below.
Monad design
The monad we consider is designed to speed-up the programming of LSA workflows outlined in the previous section. The monad is named LSAMon for “Latent Semantic Analysis** Mon**ad”.
We want to be able to construct monad pipelines of the general form:
LSAMon is based on the State monad, [Wk1, AA1], so the monad pipeline form (1) has the following more specific form:
This means that some monad operations will not just change the pipeline value but they will also change the pipeline context.
In the monad pipelines of LSAMon we store different objects in the contexts for at least one of the following two reasons.
The object will be needed later on in the pipeline, or
The object is (relatively) hard to compute.
Such objects are document-term matrix, Dimensionality reduction factors and the related topics.
Let us list the desired properties of the monad.
Rapid specification of non-trivial LSA workflows.
The text data supplied to the monad can be: (i) a list of strings, or (ii) an association with string values.
The monad uses the Linear vector space model .
The document-term frequency matrix can be created after removing stop words and/or word stemming.
It is easy to specify and apply different LSI weight functions. (Like “IDF” or “GFIDF”.)
The monad can do dimension reduction with SVD and NNMF and corresponding matrix factors are retrievable with monad functions.
Documents (or query strings) external to the monad are easily mapped into monad’s Linear vector space of terms and Linear vector space of topics.
The monad allows of cursory examination and summarization of the data.
The pipeline values can be of different types. (Most monad functions modify the pipeline value; some modify the context; some just echo results.)
It is easy to obtain the pipeline value, context, and different context objects for manipulation outside of the monad.
It is easy to tabulate extracted topics and related statistical thesauri.
The LSAMon components and their interactions are fairly simple.
The main LSAMon operations implicitly put in the context or utilize from the context the following objects:
document-term matrix,
the factors obtained by matrix factorization algorithms,
LSI weight functions specifications,
extracted topics.
Note the that the monadic set of types of LSAMon pipeline values is fairly heterogenous and certain awareness of “the current pipeline value” is assumed when composing LSAMon pipelines.
Obviously, we can put in the context any object through the generic operations of the State monad of the package “StateMonadGenerator.m”, [AAp1].
LSAMon overview
When using a monad we lift certain data into the “monad space”, using monad’s operations we navigate computations in that space, and at some point we take results from it.
With the approach taken in this document the “lifting” into the LSAMon monad is done with the function LSAMonUnit. Results from the monad can be obtained with the functions LSAMonTakeValue, LSAMonContext, or with the other LSAMon functions with the prefix “LSAMonTake” (see below.)
Here is a corresponding diagram of a generic computation with the LSAMon monad:
Remark: It is a good idea to compare the diagram with formulas (1) and (2).
Let us examine a concrete LSAMon pipeline that corresponds to the diagram above. In the following table each pipeline operation is combined together with a short explanation and the context keys after its execution.
Here is the output of the pipeline:
The LSAMon functions are separated into four groups:
operations,
setters and droppers,
takers,
State Monad generic functions.
Monad functions interaction with the pipeline value and context
An overview of the those functions is given in the tables in next two sub-sections. The next section, “Monad elements”, gives details and examples for the usage of the LSAMon operations.
State monad functions
Here are the LSAMon State Monad functions (generated using the prefix “LSAMon”, [AAp1, AA1].)
Main monad functions
Here are the usage descriptions of the main (not monad-supportive) LSAMon functions, which are explained in detail in the next section.
Monad elements
In this section we show that LSAMon has all of the properties listed in the previous section.
The monad head
The monad head is LSAMon. Anything wrapped in LSAMon can serve as monad’s pipeline value. It is better though to use the constructor LSAMonUnit. (Which adheres to the definition in [Wk1].)
The fundamental model of LSAMon is the so called Vector space model (or the closely related Bag-of-words model.) The document-term matrix is a linear vector space representation of the documents collection. That representation is further used in LSAMon to find topics and statistical thesauri.
Here is an example of ad hoc construction of a document-term matrix using a couple of paragraphs from “Hamlet”.
When we construct the document-term matrix we (often) want to stem the words and (almost always) want to remove stop words. LSAMon’s function LSAMonMakeDocumentTermMatrix makes the document-term matrix and takes specifications for stemming and stop words.
After making the document-term matrix we will most likely apply LSI weight functions, [Wk2], like “GFIDF” and “TF-IDF”. (This follows the “standard” approach used in search engines for calculating weights for document-term matrices; see [MB1].)
Frequency matrix
We use the following definition of the frequency document-term matrix F.
Each entry f_{ij} of the matrix F is the number of occurrences of the term j in the document i.
Weights
Each entry of the weighted document-term matrix M derived from the frequency document-term matrix F is expressed with the formula
where g_{j} – global term weight; l_{ij} – local term weight; d_{i} – normalization weight.
Various formulas exist for these weights and one of the challenges is to find the right combination of them when using different document collections.
Here is a table of weight functions formulas.
Computation specifications
LSAMon function LSAMonApplyTermWeightFunctions delegates the LSI weight functions application to the package “DocumentTermMatrixConstruction.m”, [AAp4].
Here we are summaries of the non-zero values of the weighted document-term matrix derived with different combinations of global, local, and normalization weight functions.
Streamlining topic extraction is one of the main reasons LSAMon was implemented. The topic extraction correspond to the so called “syntagmatic” relationships between the terms, [MS1].
Theoretical outline
The original weighed document-term matrix M is decomposed into the matrix factors W and H.
M ≈ W.H, W ∈ ℝ^{m × k}, H ∈ ℝ^{k × n}.
The i-th row of M is expressed with the i-th row of W multiplied by H.
The rows of H are the topics. SVD produces orthogonal topics; NNMF does not.
The i-the document of the collection corresponds to the i-th row W. Finding the Nearest Neighbors (NN’s) of the i-th document using the rows similarity of the matrix W gives document NN’s through topic similarity.
The terms correspond to the columns of H. Finding NN’s based on similarities of H’s columns produces statistical thesaurus entries.
The term groups provided by H’s rows correspond to “syntagmatic” relationships. Using similarities of H’s columns we can produce term clusters that correspond to “paradigmatic” relationships.
Computation specifications
Here is an example using the play “Hamlet” in which we specify additional stop words.
One of the most natural operations is to find the representation of an arbitrary document (or sentence or a list of words) in monad’s Linear vector space of terms. This is done with the function LSAMonRepresentByTerms.
Here is an example in which a sentence is represented as a one-row matrix (in that space.)
obj =
lsaHamlet⟹
LSAMonRepresentByTerms["Hamlet, Prince of Denmark killed the king."]⟹
LSAMonEchoValue;
Here we display only the non-zero columns of that matrix.
obj⟹
LSAMonEchoFunctionValue[MatrixForm[Part[#, All, Keys[Select[SSparseMatrix`ColumnSumsAssociation[#], # > 0& ]]]]& ];
Transformation steps
Assume that LSAMonRepresentByTerms is given a list of sentences. Then that function performs the following steps.
1. The sentence is split into a list of words.
2. If monad’s document-term matrix was made by removing stop words the same stop words are removed from the list of words.
3. If monad’s document-term matrix was made by stemming the same stemming rules are applied to the list of words.
4. The LSI global weights and the LSI local weight and normalizer functions are applied to sentence’s contingency matrix.
Equivalent representation
Let us look convince ourselves that documents used in the monad to built the weighted document-term matrix have the same representation as the corresponding rows of that matrix.
Here is an association of documents from monad’s document collection.
inds = {6, 10};
queries = Part[lsaHamlet⟹LSAMonTakeDocuments, inds];
queries
(* <|"id.0006" -> "Getrude, Queen of Denmark, mother to Hamlet. Ophelia, daughter to Polonius.",
"id.0010" -> "ACT I. Scene I. Elsinore. A platform before the Castle."|> *)
lsaHamlet⟹
LSAMonRepresentByTerms[queries]⟹
LSAMonEchoFunctionValue[MatrixForm[Part[#, All, Keys[Select[SSparseMatrix`ColumnSumsAssociation[#], # > 0& ]]]]& ];
Another natural operation is to find the representation of an arbitrary document (or a list of words) in monad’s Linear vector space of topics. This is done with the function LSAMonRepresentByTopics.
Here is an example.
inds = {6, 10};
queries = Part[lsaHamlet⟹LSAMonTakeDocuments, inds];
Short /@ queries
(* <|"id.0006" -> "Getrude, Queen of Denmark, mother to Hamlet. Ophelia, daughter to Polonius.",
"id.0010" -> "ACT I. Scene I. Elsinore. A platform before the Castle."|> *)
lsaHamlet⟹
LSAMonRepresentByTopics[queries]⟹
LSAMonEchoFunctionValue[MatrixForm[Part[#, All, Keys[Select[SSparseMatrix`ColumnSumsAssociation[#], # > 0& ]]]]& ];
In LSAMon for SVD H^{( − 1)} = H^{T}; for NNMF H^{( − 1)} is the pseudo-inverse of H.
The vector x obtained with LSAMonRepresentByTopics.
Tags representation
Sometimes we want to find the topics representation of tags associated with monad’s documents and the tag-document associations are one-to-many. See [AA3].
Let us consider a concrete example – we want to find what topics correspond to the different presidents in the collection of State of Union speeches.
Here we find the document tags (president names in this case.)
There are several algorithms we can apply for finding the most important documents in the collection. LSAMon utilizes two types algorithms: (1) graph centrality measures based, and (2) matrix factorization based. With certain graph centrality measures the two algorithms are equivalent. In this sub-section we demonstrate the matrix factorization algorithm (that uses SVD.)
Definition: The most important sentences have the most important words and the most important words are in the most important sentences.
That definition can be used to derive an iterations-based model that can be expressed with SVD or eigenvector finding algorithms, [LE1].
Here we pick an important part of the play “Hamlet”.
focusText =
First@Pick[textHamlet, StringMatchQ[textHamlet, ___ ~~ "to be" ~~ __ ~~ "or not to be" ~~ ___, IgnoreCase -> True]];
Short[focusText]
(* "Ham. To be, or not to be- that is the question: Whether 'tis ....y.
O, woe is me T' have seen what I have seen, see what I see!" *)
LSAMonUnit[StringSplit[ToLowerCase[focusText], {",", ".", ";", "!", "?"}]]⟹
LSAMonMakeDocumentTermMatrix["StemmingRules" -> {}, "StopWords" -> Automatic]⟹
LSAMonApplyTermWeightFunctions⟹
LSAMonFindMostImportantDocuments[3]⟹
LSAMonEchoFunctionValue[GridTableForm];
Setters, droppers, and takers
The values from the monad context can be set, obtained, or dropped with the corresponding “setter”, “dropper”, and “taker” functions as summarized in a previous section.
For example:
p = LSAMonUnit[textHamlet]⟹LSAMonMakeDocumentTermMatrix[Automatic, Automatic];
p⟹LSAMonTakeMatrix
If other values are put in the context they can be obtained through the (generic) function LSAMonTakeContext, [AAp1]:
Short@(p⟹QRMonTakeContext)["documents"]
(* <|"id.0001" -> "1604", "id.0002" -> "THE TRAGEDY OF HAMLET, PRINCE OF DENMARK", <<220>>, "id.0223" -> "THE END"|> *)
Another generic function from [AAp1] is LSAMonTakeValue (used many times above.)
Here is an example of the “data dropper” LSAMonDropDocuments:
(The “droppers” simply use the state monad function LSAMonDropFromContext, [AAp1]. For example, LSAMonDropDocuments is equivalent to LSAMonDropFromContext[“documents”].)
The utilization of SSparseMatrix objects
The LSAMon monad heavily relies on SSparseMatrix objects, [AAp6, AA5], for internal representation of data and computation results.
A SSparseMatrix object is a matrix with named rows and columns.
In some cases we want to show only columns of the data or computation results matrices that have non-zero elements.
Here is an example (similar to other examples in the previous section.)
lsaHamlet⟹
LSAMonRepresentByTerms[{"this country is rotten",
"where is my sword my lord",
"poison in the ear should be in the play"}]⟹
LSAMonEchoFunctionValue[ MatrixForm[#1[[All, Keys[Select[ColumnSumsAssociation[#1], #1 > 0 &]]]]] &];
In the pipeline code above: (i) from the list of queries a representation matrix is made, (ii) that matrix is assigned to the pipeline value, (iii) in the pipeline echo value function the non-zero columns are selected with by using the keys of the non-zero elements of the association obtained with ColumnSumsAssociation.
Similarities based on representation by terms
Here is way to compute the similarity matrix of different sets of documents that are not required to be in monad’s document collection.
Similarly to weighted Boolean similarities matrix computation above we can compute a similarity matrix using the topics representations. Note that an additional normalization steps is required.
Note the differences with the weighted Boolean similarity matrix in the previous sub-section – the similarities that are less than 1 are noticeably larger.
Unit tests
The development of LSAMon was done with two types of unit tests: (i) directly specified tests, [AAp7], and (ii) tests based on randomly generated pipelines, [AA8].
The unit test package should be further extended in order to provide better coverage of the functionalities and illustrate – and postulate – pipeline behavior.
Since the monad LSAMon is a DSL it is natural to test it with a large number of randomly generated “sentences” of that DSL. For the LSAMon DSL the sentences are LSAMon pipelines. The package “MonadicLatentSemanticAnalysisRandomPipelinesUnitTests.m”, [AAp9], has functions for generation of LSAMon random pipelines and running them as verification tests. A short example follows.
AbsoluteTiming[
res = TestRunLSAMonPipelines[pipelines, "Echo" -> False];
]
From the test report results we see that a dozen tests failed with messages, all of the rest passed.
rpTRObj = TestReport[res]
(The message failures, of course, have to be examined – some bugs were found in that way. Currently the actual test messages are expected.)
Future plans
Dimension reduction extensions
It would be nice to extend the Dimension reduction functionalities of LSAMon to include other algorithms like Independent Component Analysis (ICA), [Wk5]. Ideally with LSAMon we can do comparisons between SVD, NNMF, and ICA like the image de-nosing based comparison explained in [AA8].
Another direction is to utilize Neural Networks for the topic extraction and making of statistical thesauri.
Conversational agent
Since LSAMon is a DSL it can be relatively easily interfaced with a natural language interface.
Here is an example of natural language commands parsed into LSA code using the package [AAp13].
Implementation notes
The implementation methodology of the LSAMon monad packages [AAp3, AAp9] followed the methodology created for the ClCon monad package [AAp10, AA6]. Similarly, this document closely follows the structure and exposition of the `ClCon monad document “A monad for classification workflows”, [AA6].
A lot of the functionalities and signatures of LSAMon were designed and programed through considerations of natural language commands specifications given to a specialized conversational agent.
This document shows a way to chart in Mathematica / WL the evolution of topics in collections of texts. The making of this document (and related code) is primarily motivated by the fascinating concept of the Great Conversation, [Wk1, MA1]. In brief, all western civilization books are based on great ideas; if we find the great ideas each significant book is based on we can construct a time-line (spanning centuries) of the great conversation between the authors; see [MA1, MA2, MA3].
The presented computational results are based on the text collections of State of the Union speeches of USA presidents [D2]. The code in this document can be easily configured to use the much smaller text collection [D1] available online and in Mathematica/WL. (The collection [D1] is fairly small, documents; the collection [D2] is much larger, documents.)
The procedures (and code) described in this document, of course, work on other types of text collections. For example: movie reviews, podcasts, editorial articles of a magazine, etc.
A secondary objective of this document is to illustrate the use of the monadic programming pipeline as a Software design pattern, [AA3]. In order to make the code concise in this document I wrote the package MonadicLatentSemanticAnalysis.m, [AAp5]. Compare with the code given in [AA1].
The very first version of this document was written for the 2017 summer course “Data Science for the Humanities” at the University of Oxford, UK.
Outline of the procedure applied
The procedure described in this document has the following steps.
Get a collection of documents with known dates of publishing.
Or other types of tags associated with the documents.
Do preliminary analysis of the document collection.
Number of documents; number of unique words.
Number of words per document; number of documents per word.
(Some of the statistics of this step are done easier after the Linear vector space representation step.)
Optionally perform Natural Language Processing (NLP) tasks.
In this section we load a text collection from a specified source.
The text collection from “Presidential Nomination Acceptance Speeches”, [D1], is small and can be used for multiple code verifications and re-runnings. The “State of Union addresses of USA presidents” text collection from [D2] was converted to a Mathematica/WL object by Christopher Wolfram (and sent to me in a private communication.) The text collection [D2] provides far more interesting results (and they are shown below.)
If[True,
speeches = ResourceData[ResourceObject["Presidential Nomination Acceptance Speeches"]];
names = StringSplit[Normal[speeches[[All, "Person"]]][[All, 2]], "::"][[All, 1]],
(*ELSE*)
(*State of the union addresses provided by Christopher Wolfram. *)
Get["~/MathFiles/Digital humanities/Presidential speeches/speeches.mx"];
names = Normal[speeches[[All, "Name"]]];
];
dates = Normal[speeches[[All, "Date"]]];
texts = Normal[speeches[[All, "Text"]]];
Dimensions[speeches]
(* {2453, 4} *)
Basic statistics for the texts
Using different contingency matrices we can derive basic statistical information about the document collection. (The document-word matrix is a contingency matrix.)
We can complete this list with additional stop words derived from the collection itself. (Not done here.)
Linear vector space representation and dimension reduction
Remark: In the rest of the document we use “term” to mean “word” or “stemmed word”.
The following code makes a document-term matrix from the document collection, exaggerates the representations of the terms using “TF-IDF”, and then does topic extraction through dimension reduction. The dimension reduction is done with NNMF; see [AAp3, AA1, AA2].
Let us clarify the values by briefly describing the computational steps.
From texts we derive the document-term matrix , where is the number of documents and is the number of terms.
The terms are words or stemmed words.
This is done with LSAMonMakeDocumentTermMatrix.
From docTermMat is derived the (weighted) matrix wDocTermMat using “TF-IDF”.
This is done with LSAMonApplyTermWeightFunctions.
Using docTermMat we find the terms that are present in sufficiently large number of documents and their column indices are assigned to topicColumnPositions.
Matrix factorization.
Assign to , , where .
Compute using NNMF the factorization , where , , and is the number of topics.
The values for the keys “W, “H”, and “topicColumnPositions” are computed and assigned by LSAMonTopicExtraction.
From the top terms of each topic are derived automatic topic names and assigned to the key automaticTopicNames in the monad context.
Also done by LSAMonTopicExtraction.
Statistical thesaurus
At this point in the object mObj we have the factors of NNMF. Using those factors we can find a statistical thesaurus for a given set of words. The following code calculates such a thesaurus, and echoes it in a tabulated form.
By observing the thesaurus entries we can see that the words in each entry are semantically related.
Note, that the word “welfare” strongly associates with “[applause]”. The rest of the query words do not, which can be seen by examining larger thesaurus entries:
The function LSAMonTopicsRepresentation finds the top outliers for each row of NNMF’s left factor . (The outliers are found using the package [AAp4].) The obtained list of indices gives the topic representation of the collection of texts.
Further we can see that if the documents have tags associated with them — like author names or dates — we can make a contingency matrix of tags vs topics. (See [AAp8, AA4].) This is also done by the function LSAMonTopicsRepresentation that takes tags as an argument. If the tags argument is Automatic, then the tags are simply the document indices.
Note that the matrix plots above are very close to the charting of the Great conversation that we are looking for. This can be made more obvious by observing the row names and columns names in the tabulation of the transposed matrix rsmat:
Magnify[#, 0.6] &@MatrixForm[Transpose[rsmat]]
Charting the great conversation
In this section we show several ways to chart the Great Conversation in the collection of speeches.
There are several possible ways to make the chart: using a time-line plot, using heat-map plot, and using appropriate tabulation (with MatrixForm or Grid).
In order to make the code in this section more concise the package RSparseMatrix.m, [AAp7, AA5], is used.
Topic name to topic words
This command makes an Association between the topic names and the top topic words.
Note the value of the option DistanceFunction: there is not re-ordering of the rows and columns are reordered by sorting. Also, the topics on the horizontal names have tool-tips.
[MA1] Mortimer Adler, "The Great Conversation Revisited," in The Great Conversation: A Peoples Guide to Great Books of the Western World, Encyclopædia Britannica, Inc., Chicago,1990, p. 28.
This document discusses concrete algorithms for two different approaches of generation of mandala images, [1]: direct construction with graphics primitives, and use of machine learning algorithms.
to show some pretty images exploiting symmetry and multiplicity (see this album),
to provide an illustrative example of comparing dimension reduction methods,
to give a set-up for further discussions and investigations on mandala creation with machine learning algorithms.
Two direct construction algorithms are given: one uses "seed" segment rotations, the other superimposing of layers of different types. The following plots show the order in which different mandala parts are created with each of the algorithms.
In this document we use several algorithms for dimension reduction applied to collections of images following the procedure described in [4,5]. We are going to show that with Non-Negative Matrix Factorization (NNMF) we can use mandalas made with the seed segment rotation algorithm to extract layer types and superimpose them to make colored mandalas. Using the same approach with Singular Value Decomposition (SVD) or Independent Component Analysis (ICA) does not produce good layers and the superimposition produces more "watered-down", less diverse mandalas.
From a more general perspective this document compares the statistical approach of "trying to see without looking" with the "direct simulation" approach. Another perspective is the creation of "design spaces"; see [6].
The idea of using machine learning algorithms is appealing because there is no need to make the mental effort of understanding, discerning, approximating, and programming the principles of mandala creation. We can "just" use a large collection of mandala images and generate new ones using the "internal knowledge" data of machine learning algorithms. For example, a Neural network system like Deep Dream, [2], might be made to dream of mandalas.
Direct algorithms for mandala generation
In this section we present two different algorithms for generating mandalas. The first sees a mandala as being generated by rotation of a "seed" segment. The second sees a mandala as being generated by different component layers. For other approaches see [3].
The request of [3] is for generation of mandalas for coloring by hand. That is why the mandala generation algorithms are in the grayscale space. Coloring the generated mandala images is a secondary task.
By seed segment rotations
One way to come up with mandalas is to generate a segment and then by appropriate number of rotations to produce a mandala.
Here is a function and an example of random segment (seed) generation:
Here is a more concise way to generate symmetric segment mandalas:
Multicolumn[Table[Image@MakeMandala[], {12}], 5]
Note that with this approach the programming of the mandala coloring is not that trivial — weighted blending of colorized mandalas is the easiest thing to do. (Shown below.)
"For this one I’ve defined three types of layer, a flower, a simple circle and a ring of small circles. You could add more for greater variety."
The coloring approach with image blending given below did not work well for this algorithm, so I modified the original code in order to produce colored mandalas.
The most interesting results are obtained with the image blending procedure coded below over mandala images generated with the seed segment rotation algorithm.
In this section we are going to apply the dimension reduction algorithms Singular Value Decomposition (SVD), Independent Component Analysis (ICA), and Non-Negative Matrix Factorization (NNMF) to a linear vector space representation (a matrix) of an image dataset. In the next section we use the bases generated by those algorithms to make mandala images.
We are going to use the packages [7,8] for ICA and NNMF respectively.
The linear vector space representation of the images is simple — each image is flattened to a vector (row-wise), and the image vectors are put into a matrix.
The SVD basis has an average mandala image as its first vector and the other vectors are "differences" to be added to that first vector.
The SVD and ICA bases are structured similarly. That is because ICA and SVD are both based on orthogonality — ICA factorization uses an orthogonality criteria based on Gaussian noise properties (which is more relaxed than SVD’s standard orthogonality criteria.)
As expected, the NNMF basis images have black background because of the enforced non-negativity. (Black corresponds to 0, white to 1.)
Compared to the SVD and ICA bases the images of the NNMF basis are structured in a radial manner. This can be demonstrated using image binarization.
We can see that binarizing of the NNMF basis images shows them as mandala layers. In other words, using NNMF we can convert the mandalas of the seed segment rotation algorithm into mandalas generated by an algorithm that superimposes layers of different types.
Blending with image bases samples
In this section we just show different blending images using the SVD, ICA, and NNMF bases.
What would be the outcomes of the above procedures to mandala images found in the World Wide Web (WWW) ?
Those WWW images are most likely man made or curated.
The short answer is that the results are not that good. Better results might be obtained using a larger set of WWW images (than just 100 in the experiment results shown below.)
Here is a sample from the WWW mandala images:
Here are the results obtained with NNMF basis:
Future plans
My other motivation for writing this document is to set up a basis for further investigations and discussions on the following topics.
Having a large image database of "real world", human made mandalas.
Utilization of Neural Network algorithms to mandala creation.
Utilization of Cellular Automata to mandala generation.
Investigate mandala morphing and animations.
Making a domain specific language of specifications for mandala creation and modification.
The idea of using machine learning algorithms for mandala image generation was further supported by an image classifier that recognizes fairly well (suitably normalized) mandala images obtained in different ways:
In this document are given outlines and examples of several related implementations of Lebesgue integration, [1], within the framework of NIntegrate, [7]. The focus is on the implementations of Lebesgue integration algorithms that have multiple options and can be easily extended (in order to do further research, optimization, etc.) In terms of NIntegrate‘s framework terminology it is shown how to implement an integration strategy or integration rule based on the theory of the Lebesgue integral. The full implementation of those strategy and rules — LebesgueIntegration, LebesgueIntegrationRule, and GridLebesgueIntegrationRule — are given in the Mathematica package [5].
The advantage of using NIntegrate‘s framework is that a host of supporting algorithms can be employed for preprocessing, execution, experimentation, and testing (correctness, comparison, and profiling.)
Here is a brief description of the integration strategy LebesgueIntegration in [5]:
prepare a function that calculates measure estimates based on random points or low discrepancy sequences of points in the integration domain;
use NIntegrate for the computation of one dimensional integrals for that measure estimate function over the range of the integrand function values.
The strategy is adaptive because of the second step — NIntegrate uses adaptive integration algorithms.
Instead of using an integration strategy we can "tuck in" the whole Lebesgue integration process into an integration rule, and then use that integration rule with the adaptive integration algorithms NIntegrate already has. This is done with the implementations of the integration rules LebesgueIntegrationRule and GridLebesgueIntegrationRule.
Brief theory
Lebesgue integration extends the definition of integral to a much larger class of functions than the class of Riemann integrable functions. The Riemann integral is constructed by partitioning the integrand’s domain (on the axis). The Lebesgue integral is constructed by partitioning the integrand’s co-domain (on the axis). For each value in the co-domain, find the measure of the corresponding set of points in the domain. Roughly speaking, the Lebesgue integral is then the sum of all the products ; see [1]. For our implementation purposes is defined differently, and in the rest of this section we follow [3].
Consider the non-negative bound-able measurable function :
We denote by the measure for the points in for which , i.e.
The Lebesgue integral of over can be be defined as:
Further, we can write the last formula as
The restriction can be handled by defining the following functions and :
and using the formula
Since finding analytical expressions of is hard we are going to look into ways of approximating .
For more details see [1,2,3,4].
(Note, that the theoretical outline the algorithms considered can be seen as algorithms that reduce multidimensional integration to one dimensional integration.)
Algorithm walk through
We can see that because of Equation (4) we mostly have to focus on estimating the measure function . This section provides a walk through with visual examples of a couple of stochastic ways to do that.
Consider the integral
In order to estimate in for we are going to generate in a set of low discrepancy sequence of points, [2]. Here this is done with points of the so called "Sobol" sequence:
To each point let us assign a corresponding "volume" that can be used to approximate with Equation (2). We can of course easily assign such volumes to be , but as it can be seen on the plot this would be a good approximation for a larger number of points. Here is an example of a different volume assignment using a Voronoi diagram, [10]:
In order to implement the outlined algorithm so it will be more universal we have to consider volumes rescaling, function positivity, and Voronoi diagram implementation(s). For details how these considerations are resolved see the code of the strategy LebesgueIntegration in [5].
Further algorithm elaborations
Measure estimate with regular grid cells
The article 3 and book 4 suggest the measure estimation to be done through membership of regular grid of cells. For example, the points generated in the previous section can be grouped by a grid:
The following steps describe in detail an algorithm based on the proposed in [3,4] measure estimation method. The algorithm is implemented in [5] for the symbol, GridLebesgueIntegrationRule.
1. Generate points filling the where is the dimension of .
2. Partition with a regular grid according to specifications.
2.1. Assume the cells are indexed with the integers , .
2.2. Assume that all cells have the same volume below.
3. For each point find to which cell of the regular grid it belongs to.
4. For each cell have a list of indices corresponding to the points that belong to it.
5. For a given sub-region of integration rescale to the points to lie within ; denote those points with .
5.1. Compute the rescaling factor for the integration rule; denote with .
6. For a given integrand function evaluate over all points .
7. For each cell find the min and max values of .
7.1. Denote with and correspondingly.
8. For a given value , where is some integer enumerating the 1D integration rule sampling points, find the coefficients , using the following formula:
9. Find the measure estimate of with
Axis splitting in Lebesgue integration rules
The implementations of Lebesgue integration rules are required to provide a splitting axis for use of the adaptive algorithms. Of course we can assign a random splitting axis, but that might lead often to slower computations. One way to provide splitting axis is to choose the axis that minimizes the sum of the variances of sub-divided regions estimated by samples of the rule points. This is the same approach taken in NIntegrate`s rule "MonteCarloRule"; for theoretical details see the chapter "7.8 Adaptive and Recursive Monte Carlo Methods" of [11].
In [5] this splitting axis choice is implemented based on integrand function values. Another approach, more in the spirit of the Lebesgue integration, is to select the splitting axis based on variances of the measure function estimates.
The strategy LebesgueIntegration uses an internal variable for the calculation of the Lebesgue integral. In "EvaluationMonitor" either that variable has to be used, or a symbol name has to be passed though the option "LebesgueIntegralVariableSymbol". Here is an example:
We can use NIntegrate‘s utility functions for visualization and profiling in order to do comparison of the implemented algorithms with related ones (like "AdaptiveMonteCarlo") which NIntegrate has (or are plugged-in).
The flow charts below show that the plug-in designs have common elements. In order to make the computations more effective the rule initialization prepares the data that is used in all rule invocations. For the strategy the initialization can be much lighter since the strategy algorithm is executed only once.
In the flow charts the double line boxes designate sub-routines. We can see that so called Hollywood principle "don’t call us, we’ll call you" in Object-oriented programming is manifested.
The following flow chart shows the steps of NIntegrate‘s execution when the integration strategy LebesgueIntegration is used.
The following flow chart shows the steps of NIntegrate‘s execution when the integration rule LebesgueIntegrationRule is used.
Algorithms versions and options
There are multiple architectural, coding, and interface decisions to make while doing implementations like the ones in [5] and described in this document. The following mind map provides an overview of alternatives and interactions between components and parameters.
Alternative implementations
In many ways using the Lebesgue integration rule with the adaptive algorithms is similar to using NIntegrate‘s "AdaptiveMonteCarlo" and its rule "MonteCarloRule". Although it is natural to consider plugging-in the Lebesgue integration rules into "AdaptiveMonteCarlo" at this point NIntegrate‘s framework does not allow "AdaptiveMonteCarlo" the use of a rule different than "MonteCarloRule".
We can consider using Monte Carlo algorithms for estimating the measures corresponding to a vector of values (that come from a 1D integration rule). This can be easily done, but it is not that effective because of the way NIntegrate handles vector integrands and because of stopping criteria issues when the measures are close to .
Future plans
One of the most interesting extensions of the described Lebesgue integration algorithms and implementation designs is their extension with more advanced features of Mathematica for geometric computation. (Like the functions VoronoiMesh, RegionMeasure, and ImplicitRegion used above.)
Another interesting direction is the derivation and use of symbolic expressions for the measure functions. (Hybrid symbolic and numerical algorithms can be obtained as NIntegrate‘s handling of piecewise functions or the strategy combining symbolic and numerical integration described in [9].)
In a previous blog post, [1], I compared Principal Component Analysis (PCA) / Singular Value Decomposition (SVD) and Non-Negative Matrix Factorization (NNMF) over a collection of noised images of digit handwriting from the MNIST data set, [3], which is available in Mathematica.
This blog post adds to that comparison the use of Independent Component Analysis (ICA) proclaimed in my previous blog post, [1].
Computations
The ICA related additional computations to those in [1] follow.
Independent Component Analysis (ICA) is a (matrix factorization) method for separation of a multi-dimensional signal (represented with a matrix) into a weighted sum of sub-components that have less entropy than the original variables of the signal. See [1,2] for introduction to ICA and more details.
This blog post is to proclaim the implementation of the "FastICA" algorithm in the package IndependentComponentAnalysis.m and show a basic application with it. (I programmed that package last weekend. It has been in my ToDo list to start ICA algorithms implementations for several months… An interesting offshoot was the procedure I derived for the StackExchange question "Extracting signal from Gaussian noise".)
In this blog post ICA is going to be demonstrated with both generated data and "real life" weather data (temperatures of three cities within one month).
Generated data
In order to demonstrate ICA let us make up some data in the spirit of the "cocktail party problem".
It is important to note that the usual ICA model interpretation for the factorized matrix X is that each column is a variable (audio signal) and each row is an observation (recordings of the microphones at a given time). The matrix 3×1201 M was constructed with the interpretation that each row is a signal, hence we have to transpose M in order to apply the ICA algorithms, X=M^T.
X = Transpose[M];
{S, A} = IndependentComponentAnalysis[X, 3];
Check the approximation of the obtained factorization:
Norm[X - S.A]
(* 3.10715*10^-14 *)
Plot the found source signals:
Grid[{Map[ListLinePlot[#, PlotRange -> All, ImageSize -> 250] &,
Transpose[S]]}]
Because of the random initialization of the inverting matrix in the algorithm the result my vary. Here is the plot from another run:
The package also provides the function FastICA that returns an association with elements that correspond to the result of the function fastICA provided by the R package "fastICA". See [4].
Note that (in adherence to [4]) the function FastICA returns the matrices S and A for the centralized matrix X. This means, for example, that in order to check the approximation proper mean has to be supplied:
The result of the function IndependentComponentAnalysis is a list of two matrices. The result of FastICA is an association of the matrices obtained by ICA. The function IndependentComponentAnalysis takes a method option and options for precision goal and maximum number of steps:
The intent is IndependentComponentAnalysis to be the front interface to different ICA algorithms. (Hence, it has a Method option.) The function FastICA takes as options the named arguments of the R function fastICA described in [4].
At this point FastICA has only the deflation algorithm described in [1]. ([4] has also the so called "symmetric" ICA sub-algorithm.) The R function fastICA in [4] can use only two neg-Entropy functions log(cosh(x)) and exp(-u^2/2). Because of the symbolic capabilities of MathematicaFastICA of [3] can take any listable function through the option "NonGaussianityFunction", and it will find and use the corresponding first and second derivatives.
Using NNMF for ICA
It seems that in some cases, like the generated data used in this blog post, Non-Negative Matrix Factorization (NNMF) can be applied for doing ICA.
To be clear, NNMF does dimension reduction, but its norm minimization process does not enforce variable independence. (It enforces non-negativity.) There are at least several articles discussing modification of NNMF to do ICA. For example [6].
For the generated data in this blog post, FactICA is much faster than NNMF and produces better separation of the signals with every run. The data though is a typical representative for the problems ICA is made for. Another comparison with image de-noising, extending my previous blog post, will be published shortly.
ICA for mixed time series of city temperatures
Using Mathematica‘s function WeatherData we can get temperature time series for a small set of cities over a certain time grid. We can mix those time series into a multi-dimensional signal, MS, apply ICA to MS, and judge the extracted source signals with the original ones.