# GDP per capita analysis

Let us assume that using demographic and economic data of all countries we can find correlations with high GDP per capita. In this blog post with “high GDP per capita” I mean “GDP per capita larger than \$30,000.”

I used decision trees for these correlation experiments, and more specifically the implementation at the “MathematicaForPrediction” project at GitHub (https://github.com/antononcube/MathematicaForPrediction).

I built several decision trees and forests. Here is a sample of the training data:

And here it can be seen how it was labeled:

We have 176 countries labeled “low” (meaning with low GDP per capita) and 40 countries labeled “high”. (I used Mathematicas CountryData function.)

A great feature of decision trees is that they are easy to interpret — here is  a decision tree over the data discussed above:

Each non-leaf node of the tree has the format {impurity, splitting value, splitting variable, variable type, number of rows}. The leaf nodes are numbered, each leaf shows a label and how many countries adhere to predicate formed from the root to the leaf.

Following the edges from the root of the tree to Leaf 18 we can see that countries that have life expectancy higher that 79, birth rate fraction less than 0.014, median age higher than 33, and literacy fraction higher than 0.94 are with high GDP per capita. This rule holds for more than 50% of the countries with high GDP per capita.

Following the edges from the root of the tree to Leaf 0 we can see that countries that have life expectancy less than 74 and median age less than 33 are with low GDP per capita. This rule holds for more than 65% of the countries with low GDP per capita.

I made decision trees with more comprehensive sets of variables.  Here is a sample of the training data:

And here is the resulting decision tree: