Enhancements of MosaicPlot

I have made the following enhancements to the function MosaicPlot, which I described and announced in my previous blog post:

1. Tooltips with precise contingency statistics.
2. If the last data column is numerical then MosaicPlot can use it as pre-computed contingency statistics.
3. Coloring of the rectangles according to a list of index->color rules.

The document “Mosaic plots for data visualization”, hosted in the MathematicaForPrediction project at GitHub, combines the information from this blog post and the previous one. The document also has Mathematica code examples of usage and a description of MosaicPlot's options.

Tooltips with precise contingency statistics

I already announced the tooltips functionality in my previous blog post: when the mouse hovers over a rectangle, MosaicPlot uses Tooltip to show a table with the exact co-occurrence (contingency) values. Here is an example:
Adult census income data sex-education-income colored mosaic plot with tooltips

Visualizing categorical columns + a numerical column

If the last data column is numerical then MosaicPlot can use it as pre-computed contingency statistics. This functionality is specified with the option "ExpandLastColumn"->True.

In order to explain the functionality we are going to use the following interpretation. If the last column of the data is numerical then we can treat the data as a contracted version of a longer list of records made only of the categorical columns. For example, consider the following table with observations of people’s hair and eye color:
Hair and eyes color number of observations

The table above can be considered as a contracted version of this table:
Hair and eyes color observations

Setting the option "ExpandLastColumn" to True gives a mosaic plot corresponding to that latter, observations-expanded table:
Hair and eyes color mosaic plot
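
The contracted/expanded interpretation can be sketched with built-in functions only. The data values below are made up for illustration, and the sketch covers only integer counts; it is not how MosaicPlot is implemented internally, just a way to see what "expanding the last column" means:

```mathematica
(* Hypothetical contracted data: {hair, eyes, number of observations} *)
contracted = {{"black", "brown", 3}, {"blond", "blue", 2}, {"red", "green", 1}};

(* Repeat each categorical record as many times as its count says *)
expanded = Flatten[ConstantArray[Most[#], Last[#]] & /@ contracted, 1];

(* Tallying the expanded records recovers the contracted table *)
Tally[expanded]

(* Assuming the MosaicPlot package is loaded, the contracted data would be plotted with: *)
(* MosaicPlot[contracted, "ExpandLastColumn" -> True] *)
```

Here expanded has six records, one per observation, and it is this longer list that the mosaic plot corresponds to.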

The last data column (which is numerical) does not need to be made of integers:
Hair and eyes color mosaic plot Mathematica code

Rectangle coloring

The rectangles can be colored using the option ColorRules which specifies how the colors of the rectangles are determined from the indices of the data columns.

More precisely, the value of the option ColorRules should be a list of rules, {i1->c1,i2->c2,…}, matching the form

{(_Integer->(_RGBColor|_GrayLevel))..} .

If coloring is specified for only one column index, the value of ColorRules can be of the form

{_Integer->{(_RGBColor|_GrayLevel)..}} .
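
A candidate value for ColorRules can be checked against the two forms above with MatchQ before passing it to MosaicPlot. The names perColumnForm and singleColumnForm are introduced here only for illustration:

```mathematica
(* The two accepted forms, copied from the text above *)
perColumnForm = {(_Integer -> (_RGBColor | _GrayLevel)) ..};
singleColumnForm = {_Integer -> {(_RGBColor | _GrayLevel) ..}};

(* One color per column index; Red evaluates to an RGBColor expression *)
MatchQ[{1 -> Red, 2 -> GrayLevel[0.5]}, perColumnForm]

(* Several colors for a single column index *)
MatchQ[{2 -> {Red, Blue, GrayLevel[0.8]}}, singleColumnForm]

(* A string is not a color specification, so this matches neither form *)
MatchQ[{1 -> "red"}, perColumnForm]
```

The first two checks give True and the last one gives False.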

The colors are used with Blend in order to color the rectangles according to the order of the unique values of the specified data columns.
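
A minimal sketch of how Blend can turn a short list of colors into one color per unique value follows. It assumes that the unique values are taken in sorted order and spaced evenly over the blend range; the post does not state the exact mapping MosaicPlot uses, so treat both choices as assumptions:

```mathematica
(* Hypothetical values of one categorical data column *)
values = {"blond", "black", "red", "brown", "black"};

(* Sorted unique values; the sketch assumes there are at least two *)
unique = Union[values];
n = Length[unique];

(* Colors given for this column, e.g. through ColorRules *)
colors = {Red, Blue};

(* Evenly spaced blend parameters from 0 to 1, one per unique value *)
colorOf = Thread[unique -> Table[Blend[colors, (k - 1)/(n - 1)], {k, n}]]
```

With this scheme the first unique value gets the first color, the last unique value gets the last color, and the values in between get intermediate blends.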

The default value for ColorRules is Automatic. When Automatic is given to ColorRules, MosaicPlot finds the data column with the largest number of unique values and colors them according to their order using ColorData[7,"ColorList"].
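
The Automatic behavior described above can be approximated as follows. The data is hypothetical, and the handling of ties between columns is an assumption of this sketch, not something the post specifies:

```mathematica
(* Hypothetical data with three categorical columns *)
data = {{"m", "hs", "low"}, {"f", "ba", "low"}, {"f", "ms", "high"}, {"m", "phd", "high"}};

(* Number of unique values in each column *)
uniqueCounts = Length[Union[#]] & /@ Transpose[data];

(* Index of the column with the most unique values *)
colorColumn = First[Ordering[uniqueCounts, -1]];

(* The color list named in the text *)
palette = ColorData[7, "ColorList"];
```

For this data uniqueCounts is {2, 4, 2}, so the second column is the one that would be colored.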

The grid of plots below shows mosaic plots of the same data with different values for the option ColorRules (given as plot labels).
Grid of mosaic plots for ColorRules values

