Introduction
This MathematicaVsR at GitHub project is for comparing Mathematica and R for the tasks of classifier creation, execution, and evaluation using the MNIST database of images of handwritten digits.
Here are the bases built with two different classifiers:
 Singular Value Decomposition (SVD)
 NonNegative Matrix Factorization (NNMF)
Here are the confusion matrices of the two classifiers:
The blog post "Classification of handwritten digits" (published 2013) has a related more elaborated discussion over a much smaller database of handwritten digits.
Concrete steps
The concrete steps taken in scripts and documents of this project follow.
 Ingest the binary data files into arrays that can be visualized as digit images.
 We have two sets: 60,000 training images and 10,000 testing images.
 Make a linear vector space representation of the images by simple unfolding.

For each digit find the corresponding representation matrix and factorize it.

Store the matrix factorization results in a suitable data structure. (These results comprise the classifier training.)
 One of the matrix factors is seen as a new basis.

For a given test image (and its linear vector space representation) find the basis that approximates it best. The corresponding digit is the classifier prediction for the given test image.

Evaluate the classifier(s) over all test images and compute accuracy, FScores, and other measures.
Scripts
There are scripts going through the steps listed above:
Documents
The following documents give expositions that are suitable for reading and following of steps and corresponding results.
Observations
Ingestion
I figured out first in R how to ingest the data in the binary files of the MNIST database. There were at least several online resources (blog posts, GitHub repositories) that discuss the MNIST binary files ingestion.
After that making the corresponding code in Mathematica was easy.
Classification results
Same in Mathematica and R for for SVD and NNMF. (As expected.)
NNMF
NNMF classifiers use the MathematicaForPrediction at GitHub implementations: NonNegativeMatrixFactorization.m and NonNegativeMatrixFactorization.R.
Parallel computations
Both Mathematica and R have relatively simple setup of parallel computations.
Graphics
It was not very straightforward to come up in R with visualizations for MNIST images. The Mathematica visualization is much more flexible when it comes to plot labeling.
Going further
Comparison with other classifiers
Using Mathematica’s builtin classifiers it was easy to compare the SVD and NNMF classifiers with neural network ones and others. (The SVD and NNMF are much faster to built and they bring comparable precision.)
It would be nice to repeat that in R using one or several of the neural network classifiers provided by Google, Microsoft, H2O, Baidu, etc.
Classifier ensembles
Another possible extension is to use classifier ensembles and Receiver Operation Characteristic (ROC) to create better classifiers. (Both in Mathematica and R.)
Importance of variables
Using classifier agnostic importance of variables procedure we can figure out :

which NNMF basis vectors (images) are most important for the classification precision,

which image rows or columns are most important for each digit, or similarly

which image squares of a, say, 4×4 image grid are most important.