Pareto principle adherence examples

This post (document) is made to provide examples of the Pareto principle manifestation in different datasets.

The Pareto principle is an interesting law that manifests in many contexts. It is also known as "Pareto law", "the law of significant few", "the 80-20 rule".

For example:

  • "80% of the land is owned by 20% of the population",

  • "10% of all lakes contain 90% of all lake water."

For extensive discussion and studied examples see the Wikipedia entry "Pareto principle", [4].

It is a good idea to see for which parts of the analyzed data the Pareto principle manifests. Testing for the Pareto principle is usually simple. For example, assume that we have the GDP of all countries:

countries = CountryData["Countries"];
gdps = {CountryData[#, "Name"], CountryData[#, "GDP"]} & /@ countries;
gdps = DeleteCases[gdps, {_, _Missing}] /. Quantity[x_, _] :> x;

Grid[{RecordsSummary[gdps, {"country", "GDP"}]}, Alignment -> Top, Dividers -> All]


In order to test for the manifestation of the Pareto principle we have to (i) sort the GDP values in descending order, (ii) find the cumulative sums, (iii) normalize the obtained vector by the sum of all values, and (iv) plot the result. These steps are done with the following two commands:

t = Reverse@Sort@gdps[[All, 2]];
ListPlot[Accumulate[t]/Total[t], PlotRange -> All, GridLines -> {{0.2} Length[t], {0.8}}, Frame -> True]


In this document we are going to use the special function ParetoLawPlot defined in the next section and the package [1]. Most of the examples use data that is internally accessible within Mathematica. Several external data examples are considered.

See the package [1] for the function RecordsSummary. See the source file [2] for R functions that facilitate the plotting of Pareto principle graphs. See the package [3] for the outlier detection functions used below.


This simple function makes a list plot that would help assessing the manifestation of the Pareto principle. It takes the same options as ListPlot.

Options[ParetoLawPlot] = Options[ListPlot];
ParetoLawPlot[dataVec : {_?NumberQ ..}, opts : OptionsPattern[]] := ParetoLawPlot[{Tooltip[dataVec, 1]}, opts];
ParetoLawPlot[dataVecs : {{_?NumberQ ..} ..}, opts : OptionsPattern[]] := ParetoLawPlot[MapThread[Tooltip, {dataVecs, Range[Length[dataVecs]]}], opts];
ParetoLawPlot[dataVecs : {Tooltip[{_?NumberQ ..}, _] ..}, opts : OptionsPattern[]] :=
  Block[{t, mc = 0.5},
   t = Map[Tooltip[(Accumulate[#]/Total[#] &)[SortBy[#[[1]], -# &]], #[[2]]] &, dataVecs];
   ListPlot[t, opts, PlotRange -> All, GridLines -> {Length[t[[1, 1]]] Range[0.1, mc, 0.1], {0.8}}, Frame -> True, FrameTicks -> {{Automatic, Automatic}, {Automatic, Table[{Length[t[[1, 1]]] c, ToString[Round[100 c]] <> "%"}, {c, Range[0.1, mc, 0.1]}]}}]

This function is useful for coloring the outliers in the list plots.

ColorPlotOutliers[] := # /. {Point[ps_] :> {Point[ps], Red, Point[ps[[OutlierPosition[ps[[All, 2]]]]]]}} &;
ColorPlotOutliers[oid_] := # /. {Point[ps_] :> {Point[ps], Red, Point[ps[[OutlierPosition[ps[[All, 2]], oid]]]]}} &;

These definitions can be also obtained by loading the packages MathematicaForPredictionUtilities.m and OutlierIdentifiers.m; see [1,3].



Below we are going to use the metric system of units. (If preferred we can easily switch to the imperial system.)

$UnitSystem = "Metric";(*"Imperial"*)


We are going to consider a typical Pareto principle example — weatlh of income distribution.


This code find the Gross Domestic Product (GDP) of different countries:

gdps = {CountryData[#, "Name"], CountryData[#, "GDP"]} & /@CountryData["Countries"];
gdps = DeleteCases[gdps, {_, _Missing}] /. Quantity[x_, _] :> x;

The corresponding Pareto plot (note the default grid lines) shows that 10% of countries have 90% of the wealth:

ParetoLawPlot[gdps[[All, 2]], ImageSize -> 400]


Here is the log histogram of the GDP values.

Histogram[Log10@gdps[[All, 2]], 20, PlotRange -> All]


The following code shows the log plot of countries GDP values and the found outliers.

 DynamicModule[{data = Transpose[{Range[Length[gdps]], Sort[gdps[[All, 2]]]}], pos},
  pos = OutlierPosition[modFunc@data[[All, 2]], tb@*opar];
  If[Length[pos] > 0,
   ListLogPlot[{data, data[[pos]]}, PlotRange -> All, PlotTheme -> "Detailed", FrameLabel -> {"Index", "GDP"}, PlotLegends -> SwatchLegend[{"All", "Outliers"}]],
   ListLogPlot[{data}, PlotRange -> All, PlotTheme -> "Detailed", FrameLabel -> {"Index", "GDP"}, PlotLegends -> SwatchLegend[{"All", "Outliers"}]]
 {{opar, SPLUSQuartileIdentifierParameters, "outliers detector"}, {HampelIdentifierParameters, SPLUSQuartileIdentifierParameters}},
 {{tb, TopOutliers, "bottom|top"}, {BottomOutliers, TopOutliers}},
 {{modFunc, Identity, "data modifier function"}, {Identity, Log}}


This table gives the values for countries with highest GDP.

Block[{data = gdps[[OutlierPosition[gdps[[All, 2]], TopOutliers@*SPLUSQuartileIdentifierParameters]]]},
 Row[Riffle[#, " "]] &@Map[Grid[#, Dividers -> All, Alignment -> {Left, "."}] &, Partition[SortBy[data, -#[[-1]] &], Floor[Length[data]/3]]]



Similar data retrieval and plots can be made for countries populations.

pops = {CountryData[#, "Name"], CountryData[#, "Population"]} & /@CountryData["Countries"];
unit = QuantityUnit[pops[[All, 2]]][[1]];
pops = DeleteCases[pops, {_, _Missing}] /. Quantity[x_, _] :> x;

In the following Pareto plot we can see that 15% of countries have 80% of the total population:

ParetoLawPlot[pops[[All, 2]], PlotLabel -> Row[{"Population", ", ", unit}]]


Here are the countries with most people:

Block[{data = pops[[OutlierPosition[pops[[All, 2]], TopOutliers@*SPLUSQuartileIdentifierParameters]]]},
 Row[Riffle[#, " "]] &@Map[Grid[#, Dividers -> All, Alignment -> {Left, "."}] &, Partition[SortBy[data, -#[[-1]] &], Floor[Length[data]/3]]]



We can also see that the Pareto principle holds for the countries areas:

areas = {CountryData[#, "Name"], CountryData[#, "Area"]} & /@CountryData["Countries"];
areas = DeleteCases[areas, {_, _Missing}] /. Quantity[x_, _] :> x;
ParetoLawPlot[areas[[All, 2]]]


Block[{data = areas[[OutlierPosition[areas[[All, 2]], TopOutliers@*SPLUSQuartileIdentifierParameters]]]},
 Row[Riffle[#, " "]] &@Map[Grid[#, Dividers -> All, Alignment -> {Left, "."}] &, Partition[SortBy[data, -#[[-1]] &], Floor[Length[data]/3]]]


Time series-wise

An interesting diagram is to plot together the curves of GDP changes for different countries. We can see China and Poland have had rapid growth.

res = Table[
    (t = CountryData[countryName, {{"GDP"}, {1970, 2015}}];
     t = Reverse@Sort[t["Path"][[All, 2]] /. Quantity[x_, _] :> x];
     Tooltip[t, countryName])
    , {countryName, {"USA", "China", "Poland", "Germany", "France", "Denmark"}}];

ParetoLawPlot[res, PlotRange -> All, Joined -> True, PlotLegends -> res[[All, 2]]]



This dynamic interface can be used for a given country to compare (i) the GDP evolution in time and (ii) the corresponding Pareto plot.

 DynamicModule[{ts, t},
  ts = CountryData[countryName, {{"GDP"}, {1970, 2015}}];
  t = Reverse@Sort[ts["Path"][[All, 2]] /. Quantity[x_, _] :> x];
  Grid[{{"Date list plot of GDP values", "GDP Pareto plot"}, {DateListPlot[ts, ImageSize -> Medium],
     ParetoLawPlot[t, ImageSize -> Medium]}}]
 ], {countryName, {"USA", "China", "Poland", "Germany", "France", "Denmark"}}]


Country flag colors

The following code demonstrates that the colors of the pixels in country flags also adhere to the Pareto principle.

flags = CountryData[#, "Flag"] & /@ CountryData["Countries"];

flags[[1 ;; 12]]


ids = ImageData /@ flags;

pixels = Apply[Join, Flatten[ids, 1]];

ToBinFunc[x_] := Evaluate[Piecewise[MapIndexed[{#2[[1]], #1[[1]] < x <= #1[[2]]} &, Partition[Range[0, 1, 0.1], 2, 1]]]];

pixelsInt = Transpose@Table[Map[ToBinFunc, pixels[[All, i]]], {i, 1, 3}];

pixelsIntTally = SortBy[Tally[pixelsInt], -#[[-1]] &];

ParetoLawPlot[pixelsIntTally[[All, 2]]]



Loking at lengths in the tunnel data we can see the manifestation of an exaggerated Pareto principle.

tunnelLengths = TunnelData[All, {"Name", "Length"}];
tunnelLengths // Length

(* 1552 *)

t = Reverse[Sort[DeleteMissing[tunnelLengths[[All, -1]]] /. Quantity[x_, _] :> x]];



Here is the logarithmic histogram of the lengths:

Histogram[Log10@t, PlotRange -> All, PlotTheme -> "Detailed"]



The following code gathers the data and make the Pareto plots surface areas, volumes, and fish catch values for lakes. We can that the lakes volumes show exaggerated Pareto principle.

lakeAreas = LakeData[All, "SurfaceArea"];
lakeVolumes = LakeData[All, "Volume"];
lakeFishCatch = LakeData[All, "CommercialFishCatch"];

data = {lakeAreas, lakeVolumes, lakeFishCatch};
t = N@Map[DeleteMissing, data] /. Quantity[x_, _] :> x;

opts = {PlotRange -> All, ImageSize -> Medium}; MapThread[ParetoLawPlot[#1, PlotLabel -> Row[{#2, ", ", #3}], opts] &, {t, {"Lake area", "Lake volume", "Commercial fish catch"}, DeleteMissing[#][[1, 2]] & /@ data}]


City data

One of the examples given in [5] is that the city areas obey the Power Law. Since the Pareto principle is a kind of Power Law we can confirm that observation using Pareto principle plots.

The following grid of Pareto principle plots is for areas and population sizes of cities in selected states of USA.

res = Table[
    (cities = CityData[{All, stateName, "USA"}];
     t = Transpose@Outer[CityData, cities, {"Area", "Population"}];
     t = Map[DeleteMissing[#] /. Quantity[x_, _] :> x &, t, {1}];
     ParetoLawPlot[MapThread[Tooltip, {t, {"Area", "Population"}}], PlotLabel -> stateName, ImageSize -> 250])
    , {stateName, {"Alabama", "California", "Florida", "Georgia", "Illinois", "Iowa", "Kentucky", "Ohio", "Tennessee"}}];

Legended[Grid[ArrayReshape[res, {3, 3}]], SwatchLegend[Cases[res[[1]], _RGBColor, Infinity], {"Area", "Population"}]]


Movie ratings in MovieLens datasets

Looking into the MovieLens 20M dataset, [6], we can see that the Pareto princple holds for (1) most rated movies and (2) most active users. We can also see the manifestation of an exaggerated Pareto law — 90% of all ratings are for 10% of the movies.



The following plot taken from the blog post "PIN analysis", [7], shows that the four digit passwords people use adhere to the Pareto principle: the first 20% of (the unique) most frequently used passwords correspond to the 70% of all passwords use.




[1] Anton Antonov, "MathematicaForPrediction utilities", (2014), source code MathematicaForPrediction at GitHub,, package MathematicaForPredictionUtilities.m.

[2] Anton Antonov, Pareto principle functions in R, source code MathematicaForPrediction at GitHub,, source code file ParetoLawFunctions.R .

[3] Anton Antonov, Implementation of one dimensional outlier identifying algorithms in Mathematica, (2013), MathematicaForPrediction at GitHub, URL: .

[4] Wikipedia entry, "Pareto principle", URL: .

[5] Wikipedia entry, "Power law", URL: .

[6] GroupLens Research, MovieLens 20M Dataset, (2015).

[7] "PIN analysis", (2012), DataGenetics.


Handwritten digits recognition by matrix factorization


This MathematicaVsR at GitHub project is for comparing Mathematica and R for the tasks of classifier creation, execution, and evaluation using the MNIST database of images of handwritten digits.

Here are the bases built with two different classifiers:

  • Singular Value Decomposition (SVD)


  • Non-Negative Matrix Factorization (NNMF)


Here are the confusion matrices of the two classifiers:

  • SVD


  • NNMF


The blog post "Classification of handwritten digits" (published 2013) has a related more elaborated discussion over a much smaller database of handwritten digits.

Concrete steps

The concrete steps taken in scripts and documents of this project follow.

  1. Ingest the binary data files into arrays that can be visualized as digit images.
  • We have two sets: 60,000 training images and 10,000 testing images.
  1. Make a linear vector space representation of the images by simple unfolding.

  2. For each digit find the corresponding representation matrix and factorize it.

  3. Store the matrix factorization results in a suitable data structure. (These results comprise the classifier training.)

  • One of the matrix factors is seen as a new basis.
  1. For a given test image (and its linear vector space representation) find the basis that approximates it best. The corresponding digit is the classifier prediction for the given test image.

  2. Evaluate the classifier(s) over all test images and compute accuracy, F-Scores, and other measures.


There are scripts going through the steps listed above:


The following documents give expositions that are suitable for reading and following of steps and corresponding results.



I figured out first in R how to ingest the data in the binary files of the MNIST database. There were at least several online resources (blog posts, GitHub repositories) that discuss the MNIST binary files ingestion.

After that making the corresponding code in Mathematica was easy.

Classification results

Same in Mathematica and R for for SVD and NNMF. (As expected.)


NNMF classifiers use the MathematicaForPrediction at GitHub implementations: NonNegativeMatrixFactorization.m and NonNegativeMatrixFactorization.R.

Parallel computations

Both Mathematica and R have relatively simple set-up of parallel computations.


It was not very straightforward to come up in R with visualizations for MNIST images. The Mathematica visualization is much more flexible when it comes to plot labeling.

Going further

Comparison with other classifiers

Using Mathematica’s built-in classifiers it was easy to compare the SVD and NNMF classifiers with neural network ones and others. (The SVD and NNMF are much faster to built and they bring comparable precision.)

It would be nice to repeat that in R using one or several of the neural network classifiers provided by Google, Microsoft, H2O, Baidu, etc.

Classifier ensembles

Another possible extension is to use classifier ensembles and Receiver Operation Characteristic (ROC) to create better classifiers. (Both in Mathematica and R.)

Importance of variables

Using classifier agnostic importance of variables procedure we can figure out :

  • which NNMF basis vectors (images) are most important for the classification precision,

  • which image rows or columns are most important for each digit, or similarly

  • which image squares of a, say, 4×4 image grid are most important.