RSparseMatrix for sparse matrices with named rows and columns

Introduction

Over the last few years I have made heavy use of R’s Matrix library, which implements sparse matrix objects and efficient computations with them. Sparse matrices from R’s Matrix library can be given row names and column names, which are assigned and retrieved with the functions rownames and colnames. I sometimes miss this in Mathematica, so I started a Mathematica package that implements similar functionality. The package, RSparseMatrix.m, is implemented purely in the Mathematica language (i.e. it does not use RLink). It can be downloaded, or loaded directly, from MathematicaForPrediction at GitHub:

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/Misc/RSparseMatrix.m"]

The package provides functions to create and operate on RSparseMatrix objects, which are basically SparseArray objects with row and column names. A major design decision is to restrict this functionality to two-dimensional sparse arrays and to lists of strings as row and column names. (Note that the package is not finished, and some functions ignore the row and column names.)

The package attempts to cover as much as possible of the functionality for sparse matrix objects provided by R’s Matrix library: sub-matrix extraction by row and column names, propagation of row and column names through dot products, row and column binding of sparse matrices, row and column sums, etc. This document has examples and tests for RSparseMatrix.m.

My participation in WTC 2015 with a talk comparing Mathematica and R was one of the main motivators for writing this blog post. Another is this Mathematica StackExchange discussion. (And a third is seeing the impressive movie “The Martian” tonight: such a display of the triumph of humans over space and nature, using technology and science in a creative way, made me want to discuss how to make some programming objects more convenient.)

Basic examples

Creation

rmat = MakeRSparseMatrix[
 {{1, 1} -> 1, {2, 2} -> 2, {4, 3} -> 3, {1, 4} -> 4, {3, 5} -> 2},
 "ColumnNames" -> {"a", "b", "c", "d", "e"},
 "RowNames" -> {"A", "B", "C", "D"},
 "DimensionNames" -> {"U", "V"}]

The function MatrixForm shows the RSparseMatrix objects with their row and column names:

rmat // MatrixForm
[Image: MatrixForm display of rmat with row and column names]

The RSparseMatrix objects can be created from SparseArray objects:
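The code for this step did not survive in the text, so here is a minimal sketch of what such a conversion might look like. The SparseArray call is plain Mathematica. The converter name ToRSparseMatrix and its options are assumptions modelled on MakeRSparseMatrix above, so check them against RSparseMatrix.m before relying on them.

```mathematica
(* Built-in sparse array with the same entries as rmat above;
   the dimensions {4, 5} are inferred from the largest indices. *)
smat = SparseArray[
   {{1, 1} -> 1, {2, 2} -> 2, {4, 3} -> 3, {1, 4} -> 4, {3, 5} -> 2}];

(* ASSUMPTION: ToRSparseMatrix is taken to be the package's converter
   for SparseArray objects, with the same option names as
   MakeRSparseMatrix. Verify against the package source. *)
rmat2 = ToRSparseMatrix[smat,
   "ColumnNames" -> {"a", "b", "c", "d", "e"},
   "RowNames" -> {"A", "B", "C", "D"},
   "DimensionNames" -> {"U", "V"}]
```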

Query functions

These functions can be used to retrieve the names of rows, columns, and dimensions. They correspond to R’s functions rownames, colnames, dimnames.

In[154]:= RowNames[rmat]
Out[154]= {"A", "B", "C", "D"}
In[155]:= ColumnNames[rmat]
Out[155]= {"a", "b", "c", "d", "e"}
In[156]:= DimensionNames[rmat]
Out[156]= {"U", "V"}

Functions that work on SparseArray

Since RSparseMatrix is based on SparseArray, we would of course expect the functions that work on SparseArray objects to work on RSparseMatrix objects too, e.g. Dimensions, ArrayRules, Transpose, Total, and others.

In[157]:= Dimensions[rmat]
Out[157]= {4, 5}
In[158]:= ArrayRules[rmat]
Out[158]= {{1, 1} -> 1, {1, 4} -> 4, {2, 2} -> 2, {3, 5} -> 2, {4, 3} -> 3, {_, _} -> 0}
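To make concrete what Transpose and Total compute on the underlying data, here is a small sketch that uses only built-in functions on a plain SparseArray with the same entries as rmat. The pairing of names with sums through AssociationThread is only an illustration of what named row and column sums mean; it is not how the package stores or returns them.

```mathematica
(* Plain SparseArray with the same entries as rmat *)
smat = SparseArray[
   {{1, 1} -> 1, {2, 2} -> 2, {4, 3} -> 3, {1, 4} -> 4, {3, 5} -> 2}];
rowNames = {"A", "B", "C", "D"};
colNames = {"a", "b", "c", "d", "e"};

(* Row sums: total over the second level, paired with the row names *)
AssociationThread[rowNames, Normal[Total[smat, {2}]]]
(* <|"A" -> 5, "B" -> 2, "C" -> 2, "D" -> 3|> *)

(* Column sums: total over the first level, paired with the column names *)
AssociationThread[colNames, Normal[Total[smat]]]
(* <|"a" -> 1, "b" -> 2, "c" -> 3, "d" -> 4, "e" -> 2|> *)

(* Transposing swaps the dimensions (and would swap the roles of the names) *)
Dimensions[Transpose[smat]]
(* {5, 4} *)
```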

Dot product

Row names and column names are respected for dot products if that leads to meaningful assignments. The examples below demonstrate a general principle:

When a matrix operation can be performed on the underlying sparse arrays but the row names or column names do not coincide, the names are dropped.

In the tables of examples below, the last rows show the heads of the results.
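One reading of the principle above can be sketched with built-in functions only. Everything here is hypothetical: a named matrix is modelled as an Association, and namedDot is an illustrative helper rather than the package's implementation. It keeps the outer names when the column names of the first factor match the row names of the second, and otherwise returns the bare sparse array.

```mathematica
(* Hypothetical model, not part of RSparseMatrix.m: a named matrix is an
   Association with the keys "Matrix", "RowNames" and "ColumnNames". *)
namedDot[a_Association, b_Association] :=
  Module[{prod = a["Matrix"] . b["Matrix"]},
   If[a["ColumnNames"] === b["RowNames"],
    (* inner names coincide: keep the outer names *)
    <|"Matrix" -> prod, "RowNames" -> a["RowNames"],
      "ColumnNames" -> b["ColumnNames"]|>,
    (* inner names differ: drop the names, return the sparse array *)
    prod]];

m1 = <|"Matrix" -> SparseArray[{{1, 0}, {0, 2}}],
   "RowNames" -> {"A", "B"}, "ColumnNames" -> {"x", "y"}|>;
m2 = <|"Matrix" -> SparseArray[{{1, 1}, {0, 1}}],
   "RowNames" -> {"x", "y"}, "ColumnNames" -> {"p", "q"}|>;
m3 = <|"Matrix" -> SparseArray[{{1, 1}, {0, 1}}],
   "RowNames" -> {"u", "v"}, "ColumnNames" -> {"p", "q"}|>;

Head[namedDot[m1, m2]]  (* Association: names are kept *)
Head[namedDot[m1, m3]]  (* SparseArray: names are dropped *)
```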

Matrix by vector

[Image: RSparseMatrix matrix-by-vector examples grid]

Matrix by matrix

[Image: RSparseMatrix matrix-by-matrix (SparseArray) examples grid]

[Image: RSparseMatrix matrix-by-matrix examples grid]

Part

A very useful feature is to have Part work with row and column names. The implementation of that additional functionality for Part is demonstrated below.

In the cases when the dimension drops, sparse arrays or numbers are returned. In R the operation “[” has the parameter “drop”: the expression “smat[1,,drop=F]” gives a sparse matrix, while the expression “smat[1,,drop=T]” gives a dense vector. The corresponding implementation would be to give Part an option “Drop->True|False”, but that does not seem like a good idea. Besides, we can easily emulate R’s “drop” option by using “{_?AtomQ}” inside Part, i.e. by wrapping a single index in a list.
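The two ideas in this section, looking parts up by name and keeping or dropping a dimension, can be illustrated on a plain SparseArray with built-in functions. The rowIndex and colIndex associations are hypothetical helpers for this sketch; the package handles names inside Part itself.

```mathematica
smat = SparseArray[
   {{1, 1} -> 1, {2, 2} -> 2, {4, 3} -> 3, {1, 4} -> 4, {3, 5} -> 2}];

(* Hypothetical lookup tables from names to positions *)
rowIndex = AssociationThread[{"A", "B", "C", "D"}, Range[4]];
colIndex = AssociationThread[{"a", "b", "c", "d", "e"}, Range[5]];

(* Sub-matrix by names: translate names to positions, then use Part *)
sub = smat[[Lookup[rowIndex, {"A", "C"}],
    Lookup[colIndex, {"a", "d", "e"}]]];
Normal[sub]
(* {{1, 4, 0}, {0, 0, 2}} *)

(* A bare index drops the row dimension, similar to drop=T in R *)
Dimensions[smat[[1, All]]]
(* {5} *)

(* A single index wrapped in a list keeps a 1-row matrix, like drop=F *)
Dimensions[smat[[{1}, All]]]
(* {1, 5} *)
```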

[Image: RSparseMatrix Part scenarios examples grid]

Neat example

Consider this incidence matrix that represents a bi-partite graph of relationships of actors starring in movies:

[Image: bi-partite incidence matrix for the movies-actors graph]

We can make an RSparseMatrix object from it with named rows and columns (rBiMat).

Here is the corresponding graph:

[Image: movies-actors graph]

If we want to see which actors have appeared in movies together with Orlando Bloom, we can do the following:

[Image: actors starring with Orlando Bloom]
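The computation above is shown only as an image, so here is a sketch of the same idea with built-in functions. The small incidence matrix below is a toy example, not the data from the post, and it assumes that rows are movies and columns are actors. With rBiMat the name lookup would presumably go through Part with column names instead of FirstPosition.

```mathematica
(* Toy data for illustration only; rows are movies, columns are actors *)
movies = {"Pirates of the Caribbean", "Troy", "Sweeney Todd"};
actors = {"Johnny Depp", "Orlando Bloom", "Keira Knightley",
   "Brad Pitt", "Helena Bonham Carter"};
biMat = SparseArray[{
    {1, 1} -> 1, {1, 2} -> 1, {1, 3} -> 1,
    {2, 2} -> 1, {2, 4} -> 1,
    {3, 1} -> 1, {3, 5} -> 1}, {3, 5}];

(* Column position of the actor of interest *)
col = First[FirstPosition[actors, "Orlando Bloom"]];

(* 0/1 vector marking the movies this actor appears in *)
inMovies = biMat[[All, col]];

(* For every actor, the number of movies shared with the chosen one *)
shared = Normal[inMovies . biMat];

(* Actors with at least one shared movie, excluding the actor himself *)
coStars = DeleteCases[Pick[actors, Positive[shared]], "Orlando Bloom"]
(* {"Johnny Depp", "Keira Knightley", "Brad Pitt"} *)
```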
