Handwritten digits recognition by matrix factorization

Introduction

This MathematicaVsR at GitHub project is for comparing Mathematica and R for the tasks of classifier creation, execution, and evaluation using the MNIST database of images of handwritten digits.

Here are the bases built with two different classifiers:

  • Singular Value Decomposition (SVD)

SVD-basis-for-5

  • Non-Negative Matrix Factorization (NNMF)

NNMF-basis-for-5

Here are the confusion matrices of the two classifiers:

  • SVD

SVD-confusion-matrix

  • NNMF

NNMF-confusion-matrix

The blog post "Classification of handwritten digits" (published 2013) has a related more elaborated discussion over a much smaller database of handwritten digits.

Concrete steps

The concrete steps taken in scripts and documents of this project follow.

  1. Ingest the binary data files into arrays that can be visualized as digit images.
  • We have two sets: 60,000 training images and 10,000 testing images.
  1. Make a linear vector space representation of the images by simple unfolding.

  2. For each digit find the corresponding representation matrix and factorize it.

  3. Store the matrix factorization results in a suitable data structure. (These results comprise the classifier training.)

  • One of the matrix factors is seen as a new basis.
  1. For a given test image (and its linear vector space representation) find the basis that approximates it best. The corresponding digit is the classifier prediction for the given test image.

  2. Evaluate the classifier(s) over all test images and compute accuracy, F-Scores, and other measures.

Scripts

There are scripts going through the steps listed above:

Documents

The following documents give expositions that are suitable for reading and following of steps and corresponding results.

Observations

Ingestion

I figured out first in R how to ingest the data in the binary files of the MNIST database. There were at least several online resources (blog posts, GitHub repositories) that discuss the MNIST binary files ingestion.

After that making the corresponding code in Mathematica was easy.

Classification results

Same in Mathematica and R for for SVD and NNMF. (As expected.)

NNMF

NNMF classifiers use the MathematicaForPrediction at GitHub implementations: NonNegativeMatrixFactorization.m and NonNegativeMatrixFactorization.R.

Parallel computations

Both Mathematica and R have relatively simple set-up of parallel computations.

Graphics

It was not very straightforward to come up in R with visualizations for MNIST images. The Mathematica visualization is much more flexible when it comes to plot labeling.

Going further

Comparison with other classifiers

Using Mathematica’s built-in classifiers it was easy to compare the SVD and NNMF classifiers with neural network ones and others. (The SVD and NNMF are much faster to built and they bring comparable precision.)

It would be nice to repeat that in R using one or several of the neural network classifiers provided by Google, Microsoft, H2O, Baidu, etc.

Classifier ensembles

Another possible extension is to use classifier ensembles and Receiver Operation Characteristic (ROC) to create better classifiers. (Both in Mathematica and R.)

Importance of variables

Using classifier agnostic importance of variables procedure we can figure out :

  • which NNMF basis vectors (images) are most important for the classification precision,

  • which image rows or columns are most important for each digit, or similarly

  • which image squares of a, say, 4×4 image grid are most important.