Basic example of using ROC with Linear regression


This post is for using the package [2] that provides Mathematica implementations of Receiver Operating Characteristic (ROC) functions calculation and plotting. The ROC framework is used for analysis and tuning of binary classifiers, [3]. (The classifiers are assumed to classify into a positive/true label or a negative/false label. )

The function ROCFuntions gives access to the individual ROC functions through string arguments. Those ROC functions are applied to special objects, called ROC Association objects.

Each ROC Association object is an Association that has the following four keys: "TruePositive", "FalsePositive", "TrueNegative", and "FalseNegative" .

Given two lists of actual and predicted labels a ROC Association object can be made with the function ToROCAssociation .

For more definitions and example of ROC terminology and functions see [3].

Why Linear regression

I was asked in this discussion why Linear regression and not, say, Logistic regression.

Here is my answer:

1. I am trying to do a minimal and quick to execute example — the code of the post is included in the package ROCFunctions.m.

2. I am aware that there are better alternatives of LinearModelFit, but I plan to discuss those in the MathematicaVsR project: “Regression with ROC”. (As the name hints, it is not just about Linear regression.)

3. Another point is that although the Linear regression is not a good method for this classification, nevertheless using ROC it can be made to give better results than, say, the built-in “NeuralNetwork” method. See the last section of “Linear regression with”.

Minimal example

Note that here although we use both of the provided Titanic training and test data, the code is doing only training. The test data is used to find the best tuning parameter (threshold) through ROC analysis.

Get packages

These commands load the packages [1,2]:


Using Titanic data

Here is the summary of the Titanic data used below:

titanicData = (Flatten@*List) @@@ExampleData[{"MachineLearning", "Titanic"}, "Data"];
columnNames = (Flatten@*List) @@ExampleData[{"MachineLearning", "Titanic"}, "VariableDescriptions"];
RecordsSummary[titanicData, columnNames]


This variable dependence grid shows the relationships between the variables.

Magnify[#, 0.7] &@VariableDependenceGrid[titanicData, columnNames]


Get training and testing data

data = ExampleData[{"MachineLearning", "Titanic"}, "TrainingData"];
data = ((Flatten@*List) @@@ data)[[All, {1, 2, 3, -1}]];
trainingData = DeleteCases[data, {___, _Missing, ___}];

(* {732, 4} *)

data = ExampleData[{"MachineLearning", "Titanic"}, "TestData"];
data = ((Flatten@*List) @@@ data)[[All, {1, 2, 3, -1}]];
testData = DeleteCases[data, {___, _Missing, ___}];

(* {314, 4} *)

Replace categorical with numerical values

trainingData = trainingData /. {"survived" -> 1, "died" -> 0, "1st" -> 0, "2nd" -> 1, "3rd" -> 2, "male" -> 0, "female" -> 1};

testData = testData /. {"survived" -> 1, "died" -> 0, "1st" -> 1, "2nd" -> 2, "3rd" -> 3, "male" -> 0, "female" -> 1};

Do linear regression

lfm = LinearModelFit[{trainingData[[All, 1 ;; -2]], trainingData[[All, -1]]}]


Get the predicted values

modelValues = lfm @@@ testData[[All, 1 ;; -2]];

Histogram[modelValues, 20]




Obtain ROC associations over a set of parameter values

testLabels = testData[[All, -1]];

thRange = Range[0.1, 0.9, 0.025];
aROCs = Table[ToROCAssociation[{0, 1}, testLabels, Map[If[# > \[Theta], 1, 0] &, modelValues]], {\[Theta], thRange}];

Evaluate ROC functions for given ROC association

Through[ROCFunctions[{"PPV", "NPV", "TPR", "ACC", "SPC"}][aROCs[[3]]]]

(* {34/43, 19/37, 17/32, 197/314, 95/122} *)

Standard ROC plot

ROCPlot[thRange, aROCs, "PlotJoined" -> Automatic, "ROCPointCallouts" -> True, "ROCPointTooltips" -> True, GridLines -> Automatic]


Plot ROC functions wrt to parameter values

ListLinePlot[Map[Transpose[{thRange, #}] &, Transpose[Map[Through[ROCFunctions[{"PPV", "NPV", "TPR", "ACC", "SPC"}][#]] &, aROCs]]],
 Frame -> True, FrameLabel -> Map[Style[#, Larger] &, {"threshold, \[Theta]", "rate"}], PlotLegends -> Map[# <> ", " <> (ROCFunctions["FunctionInterpretations"][#]) &, {"PPV", "NPV", "TPR", "ACC", "SPC"}], GridLines -> Automatic]


Finding the intersection point of PPV and TPR

We want to find a point that provides balanced positive and negative labels success rates. One way to do this is to find the intersection point of the ROC functions PPV (positive predictive value) and TPR (true positive rate).

Examining the plot above we can come up with the initial condition for \(x\).

ppvFunc = Interpolation[Transpose@{thRange, ROCFunctions["PPV"] /@ aROCs}];
tprFunc = Interpolation[Transpose@{thRange, ROCFunctions["TPR"] /@ aROCs}];
FindRoot[ppvFunc[x] - tprFunc[x] == 0, {x, 0.2}]

(* {x -> 0.3} *)

Area under the ROC curve

The Area Under the ROC curve (AUROC) tells for a given range of the controlling parameter "what is the probability of the classifier to rank a randomly chosen positive instance higher than a randomly chosen negative instance, (assuming ‘positive’ ranks higher than ‘negative’)", [3,4]

Calculating AUROC is easy using the Trapezoidal quadrature formula:

 N@Total[Partition[Sort@Transpose[{ROCFunctions["FPR"] /@ aROCs, ROCFunctions["TPR"] /@ aROCs}], 2, 1] 
   /. {{x1_, y1_}, {x2_, y2_}} :> (x2 - x1) (y1 + (y2 - y1)/2)]

 (* 0.698685 *)

It is also implemented in [2]:


(* 0.698685 *)


[1] Anton Antonov, MathematicaForPrediction utilities, (2014), source code MathematicaForPrediction at GitHub, package MathematicaForPredictionUtilities.m.

[2] Anton Antonov, Receiver operating characteristic functions Mathematica package, (2016), source code MathematicaForPrediction at GitHub, package ROCFunctions.m .

[3] Wikipedia entry, Receiver operating characteristic. URL: .

[4] Tom Fawcett, An introduction to ROC analysis, (2006), Pattern Recognition Letters, 27, 861-874.

Finding local extrema in noisy data using Quantile Regression


This blog post (article) describes an algorithm for finding local extrema in noisy data using Quantile Regression. The problem formulation and a solution for it using polynomial model fitting (through LinearModelFit) were taken from Mathematica StackExchange — see “Finding Local Minima / Maxima in Noisy Data”, [1].

The proposed Quantile Regression algorithm is a version of the polynomial fitting solution (proposed by Leonid Shifrin in [1]) and has the following advantages: (i) requires less parameter tweaking, and (ii) more importantly it is more robust with very noisy and oscillating data. That robustness is achieved by using two regression fitted curves: one close to the local minima and another close to the local maxima, computed for low and high quantiles respectively. (Quantile Regression is uniquely able to do that.)

The code for this blog post is available at [6].

A complete version of this blog post is available as a PDF document here: Finding local extrema in noisy data using Quantile Regression.pdf.

The problem

Here is the problem formulation from [1].

Problem 1: We have a list of pairs of numbers representing measurements of a scalar variable on a regular time grid. The measurements have noise in them.

As a data example for this problem the author of the question in [1] has provided the following code:

temptimelist = Range[200]/10;
tempvaluelist = Sinc[#] &@temptimelist + RandomReal[{-1, 1}, 200]*0.02;
data1 = Transpose[{temptimelist, tempvaluelist}];
ListPlot[data1, PlotRange -> All, Frame -> True, ImageSize -> 500]


In this article we are going to consider a more general problem hinted in the discussion of the solutions in [1].

Problem 2: Assume that the data in Problem 1 is collected several times and the noise is present in both the measured values and the time of measurement. Also the data can be highly oscillatory in nature.

Consider this data generation as an example for Problem 2:

n = 1000;
xs = N@Rescale[Range[n], {1, n}, {0, 60}];
data2 = Flatten[
 Table[Transpose[{xs + 0.1 RandomReal[{-1, 1}, Length[xs]], 
 Map[5 Sinc[#] + Sin[#] + 4 Sin[1/4 #] &, xs] + 
 1.2 RandomVariate[SkewNormalDistribution[0, 1, 0.9], 
 Length[xs]]}], {10}], 1];
ListPlot[data2, PlotRange -> All, Frame -> True, ImageSize -> 500]


Note that this data has 10000 points and it is much larger than the data for Problem 1.

Extrema location approximation by model fitting followed by nearest neighbors search

Several solutions are given in [1]. A couple are using wavelets or Gaussian filtering for de-noising. The one we are going to focus on in this article is using polynomial fitting for extrema location approximation and then finding the actual data extrema by Nearest Neighbors (NN’s) search. It is provided by Leonid Shifrin.

Let us list the steps of that algorithm:

1. Fit a polynomial through the data (using LinearModelFit).
2. Find the local extrema of the fitted polynomial. (We will call them fit estimated extrema.)
3. Around each of the fit estimated extrema find the most extreme point in
the data by nearest neighbors search (by using Nearest).

We are going to refer to this algorithm as LMFFindExtrema. Its implementation is available here: QuantileRegressionForLocalExtrema.m, [6].

Here is the application of the algorithm to the data example of Problem 1:


(The continuous line shows the fitted curve.)

Here is the application of the algorithm to the data example of Problem 2:


It does not help to just increase the number of basis polynomials and the number of NN’s examined points:


We can also see that increasing the number of examined NN’s for each of the fit estimated extrema would make some the final result points to be “borrowed” from neighboring peaks in the data.

Experiments with fitting trigonometric basis functions showed very good fit to the data but the calculations of the fit estimated extrema were very slow.

Using Quantile Regression

It is trivial to rewrite the algorithm LFMFindExtrema to use Quantile Regression instead of linear model fitting of polynomials. We are going to use the Mathematica package [2] implementation described in [3,4]. The function QuantileRegression provided by [2] uses a B-spline basis [5] to find the quantile regression curves (also known as regression quantiles).

We are going to call this algorithm QRFindExtrema. Again its implementation is available here: QuantileRegressionForLocalExtrema.m, [6].

The algorithm QRFindExtrema has the following parameters: the data, number of B-spline knots, interpolation order, quantiles (corresponding to the curves to be fitted).

QRFindExtrema returns a list of regression quantile functions and a list of lists with extrema estimates.

More importantly though, QRFindExtrema can use two curves for finding the local extrema: one for local minima, and one for local maxima. (This feature is justified below.)

Here is the application of QRFindExtrema to the example data of Problem 1:


Here is application of QRFindExtrema to the data of Problem 2:


We can see that the regression quantile for 0.5 is too flat to get good judgment of the local extrema. We can get better results if we increase the number of knots for the B-spline basis built by QuantileRegression.


We can see that results look “almost right”, the horizontal locations of the peaks are apparently more-or-less correctly identified, but the result extrema are too close to the fitted curve. Just increasing the number of examined NN’s for the fit estimated extrema does not produce good results because points from neighboring peaks are being chosen as final extrema estimates.


In order to solve this problem we use two regression quantiles. For local minima we use a regression quantile for a low quantile number, say, 0.02; for the local maxima we use a regression quantile for a large quantile number, say, 0.98 .


The use of two (or more) curves to be fitted is an unique capability of Quantile Regression. With this algorithm feature by construction the lower regression quantile is close to the local minima and the higher regression quantile is close to the local maxima.

Also, since we find two regression quantile curves we can use two nearest neighbors finding functions: one with the points below the low regression quantile, and one with the points above the high regression quantile. The implementation in [6] takes an option specification for should the nearest neighbor functions for finding the extrema be constructed using all data points or just the outliers (the points outside of the found regression quantiles).

More examples

More timid noisy and oscillating data with around 10,000 points.



[1] Mathematica StackExchange discussion. “Finding Local Minima / Maxima in Noisy Data”,
URL: .

[2] Anton Antonov, Quantile regression Mathematica package, source code at GitHub,, package QuantileRegression.m, (2013).

[3] Anton Antonov, Quantile regression through linear programming, usage guide at GitHub,, in the directory “Documentation”, (2013).

[4] Anton Antonov, Quantile regression through linear programming, “Mathematica for prediction algorithms” blog at, 12/16/2013.

[5] Anton Antonov, Quantile regression with B-splines, “Mathematica for prediction algorithms” blog at, 1/1/2014.

[6] Anton Antonov, QuantileRegressionForLocalExtrema Mathematica package, source code at GitHub,, package QuantileRegressionForLocalExtrema.m, (2015).
The package is in the directory “Applications”.

Directional quantile envelopes


In this blog post I am going to discuss the computation of the so called directional quantile envelopes; see [5] for definitions, theorems, and concrete examples. The Mathematica package QuantileRegression.m, [1] has an experimental implementation, the function QuantileEnvelope.

This type of implementation investigation is a natural extension of the quantile regression implementations and utilization I have done before. (See the package [1] and the related guide [2]; blog posts [3,4].) I was looking for a 2D quantile regression technique that can be used as a tool for finding 2D outliers. From the exposition below it should be obvious that the technique can be generalized in higher dimensions.

The idea of directional quantile envelopes is conceptually simple (and straightforward to come up with). The calculation is also relatively simple: over a set of uniformly distributed directions we find the lines that separate the data according a quantile parameter q, 0<q<1, and with those lines we approximate the enveloping curve for data that corresponds to q.

The algorithm

Here is the full algorithm. (The figure below can be used as a visual aid).

Parameters: data as a matrix of two columns, a quantile parameter q, 0<q<1.
1. Standardize the data.
1.1. Choose a center of the data (e.g. means of the coordinates) and a scale (e.g. inter-quartile distances).
1.2. Transpose the data to the center and scale the obtained coordinates.
2. Create a set if uniform angles, A:={i*2*Pi/k}, 0<=i<=k-1 .
2.1. Denote with a[i]:=i*2*Pi/k, i∈[0,k-1].
3. Rotate the data for each angle, a[i]∈A.
4. Find the q-quantile of the y-coordinates of the data rotated at angle a. Denote that point with r[a,q].
5. Find the points y[a,q] corresponding to r[a,q] in the original coordinate system.
6. Find the quantile lines intersections
6.1. For each angle a∈A we denote with n[a] the vector corresponding to a.
6.2. For each a∈A we have a line l[a,q] with normal n[a] and passing through y[a,q].
6.3. Find the intersection points P[q] of the lines l[a,q].
6.3.1. Each point p[j]∈P[q], j∈[1,k] is an intersection of lines at consecutive angles.
6.3.2. In other words p[j]=l[a[j-1],q] ∩ l[a[j],q], j∈[1,k].
7. Apply the reverse of the transformation done at step 3 to P[q].
8. Connecting the consecutive points of P[q] gives the directional quantile envelope approximation.

Click this thumbnail for an image with better math notation:

Algorithm visual aid

Here is an example of an application of the described algorithm. The points in blue are the data points, the points in red are the points y[a,q], the red lines are the lines l[a,q], the points in black are the points P.


Remark 1: The function QuantileEnvelope of [1] follows the described algorithm. The rotations are done using matrix multiplication; the function Reduce is used to find the intersections between the lines.

Remark 2: A possible alternative algorithm is to use “the usual” quantile regression with polar coordinates. (The article [5] refers to and briefly discusses such an approach.)

Example data

In order to demonstrate the function QuantileEnvelope let us define a couple of data sets.

The first data set is generated with the function SkewNormalDistribution, so the data set name is sndData.

npoints = 12000;
data = {RandomReal[{0, 2 \[Pi]}, npoints],
RandomVariate[SkewNormalDistribution[0.6, 0.3, 4], npoints]};
data = MapThread[#2*{Cos[#1], Sin[#1]} &, data];
rmat = RotationMatrix[-\[Pi]/2.5].DiagonalMatrix[{2, 1}];
data = Transpose[rmat.Transpose[data]];
data = TranslationTransform[{-Norm[Abs[#]/3], 0}][#] & /@ data;
sndData = Standardize[data];

Here is a plot of sndData with grid lines derived from the quantiles of x and y coordinates of the data set.

ListPlot[sndData, PlotRange -> All, AspectRatio -> 1, PlotTheme -> "Detailed",
GridLines -> Map[Quantile[#, Range[0, 1, 1/19]] &, Transpose[sndData]],
ImageSize -> 700]


The second uses the function PoissonDistribution, so the data set name is pdData.

npoints = 12000;
data = RandomReal[NormalDistribution[12, 5.6], npoints];
data = Transpose[{data,
data + RandomVariate[PoissonDistribution[12], npoints] +
RandomReal[{-0.2, 0.2}, npoints]}];
data = Select[data, #[[2]] > 0 &];
data[[All, 2]] = Log[data[[All, 2]]];
pdData = data;

Here is a plot of pdData with grid lines derived from the quantiles of x and y coordinates of the data set.

ListPlot[pdData, PlotRange -> All, AspectRatio -> 1, PlotTheme -> "Detailed",
GridLines -> Map[Quantile[#, Range[0, 1, 1/19]] &, Transpose[pdData]],
ImageSize -> 700]


Examples of application (with QuantileEnvelope)

Let us demonstrate the function QuantileEnvelope. The function arguments are a data matrix, a quantile specification, and number of directions. The quantile specification can a number or a list of numbers. The function returns the intersection points of the quantile lines at different directions.

Here is an application to the dataset sndData over 8 different quantiles over 60 different uniformly spread directions.

In[630]:= qs = {0.7, 0.75, 0.8, 0.85, 0.90, 0.95, 0.98, .99} ;
AbsoluteTiming[qsPoints = QuantileEnvelope[sndData, qs, 60];]
Out[631]= {0.150104, Null}

The result is a list of 8 lists of 60 points.

In[632]:= Dimensions[qsPoints]
Out[632]= {8, 60, 2}

Let us plot the data and the direction quantile envelopes.

Block[{data = sndData},
Show[{ListPlot[data, AspectRatio -> 1, PlotTheme -> {"Detailed"},
GridLines ->
Map[Quantile[#, Range[0, 1, 1/(20 - 1)]] &, Transpose[data]],
PlotLegends ->
Blend[{Red, Orange},
Rescale[#1, {Min[qs], Max[qs]}, {0, 1}]] & /@ qs, qs]],
Graphics[{PointSize[0.005], Thickness[0.0025],
MapThread[{Blend[{Red, Orange},
Rescale[#1, {Min[qs], Max[qs]}, {0, 1}]],
Tooltip[Line[Append[#2, #2[[1]]]], #1], Point[#2]} &, {qs,
}, ImageSize -> 700]]


Let us similar calculations for the dataset pdData.

In[647]:= qs = {0.7, 0.75, 0.8, 0.85, 0.90, 0.95, 0.98, 0.99} ;
AbsoluteTiming[qsPoints = QuantileEnvelope[pdData, qs, 60];]
Out[648]= {0.154227, Null}

In[649]:= Dimensions[qsPoints]
Out[649]= {8, 60, 2}

Here is the plot of the dataset pdData and directional quantile envelopes.

Some more quantile curves

Here are couple of more plots of the quantile lines at different directions with a larger number of directions.

Lines for quantile 0.95 in 40 directions:

Lines for quantiles 0.95 and 0.8 in 40 directions:


[1] Anton Antonov, Quantile regression Mathematica package, source code at GitHub,, package QuantileRegression.m, (2013).

[2] Anton Antonov, Quantile regression through linear programming, usage guide at GitHub,, in the directory “Documentation”, (2013).

[3] Anton Antonov, Quantile regression through linear programming, “Mathematica for prediction algorithms” blog at, 12/16/2013.

[4] Anton Antonov, Quantile regression with B-splines, “Mathematica for prediction algorithms” blog at, 1/1/2014.

[5] Linglong Kong, Ivan Mizera, “Quantile tomography: using quantiles with multivariate data”, 2013, arXiv:0805.0056v2 [stat.ME] URL: .

Quantile regression through linear programming

We can say that least squares linear regression corresponds to finding the mean of a single distribution. Similarly, quantile regression corresponds to finding quantiles of a single distribution. With quantile regression we obtain curves — “regression quantiles” — that together with the least squares regression curve would give a more complete picture of the distributions (the Y’s) corresponding to a set of X’s.

The Quantile Regression Problem (QRP) is formulated as a minimization of the sum of absolute differences. Further, QRP is re-formulated as a linear programming problem, which allows for efficient computation. To compute quantiles other than the median the so called “tilted” function is used. (See this Wikipedia entry.)

For a complete, interesting, and colorful introduction and justification to quantile regression I refer to this article:

Roger Koenker, Gilbert Bassett Jr., “Regression Quantiles”, Econometrica, Vol. 46, No. 1. (Jan., 1978), pp. 33-50. JSTOR URL: .

I recently implemented and uploaded a Mathematica package for computing regression quantiles using Mathematica’s function LinearProgramming — see the package QuantileRegression.m provided by the MathematicaForPrediction project at GitHub. Also see this guide for using the function QuantileRegressionFit provided by the package. The guide has some theoretical explanations and shows how the quantile regression problem can be formulated as a linear programming problem.

Below are presented several examples of regression quantiles computed over different data sets with the function QuantileRegressionFit provided by the package. The signature of QuantileRegressionFit is very similar to that of Mathematica‘s function Fit. Here is the definition statement:

QuantileRegressionFit::usage = “QuantileRegressionFit[data,funs,var,qs] finds the regression quantiles corresponding to the quantiles qs for a list of data as linear combinations of the functions funs of the variable var.”

QuantileRegressionFit has a Method option and can use both Minimize and LinearProgramming, but the computations with Minimize are quite slow for larger data sets. (The implementation with Minimize is included for didactic purposes.) All QuantileRegressionFit results presented below are done with the linear programming implementation.

First let us consider data generated by adding skewed noise over a logarithmic curve:

Logarithmic curve with skewed noise

Pretending that we don’t know how the data is generated, just by looking at the plot, we might conjecture that the model for the data is

Y = b0 + b1 X + b2 X^(1/2) + b3 log(X) .

Using this model we find regression quantiles corresponding to 0.05, 0.25, 0.5, 0.75, 0.95. For comparison we also find the least squares fit of the model. Here is a plot that shows the regression quantiles for that data together with the least squares fit of the model functions:

Regression quantiles for a logarithmic curve with skewed noise

We can check do the regression quantiles partition the data in the expected way. Here is a table that shows the fraction of the data points with Y’s greater or equal than the corresponding regression quantile:

Logarithmic curve with skewed noise q-table

Here is a plot of the with the regression quantiles of a more interesting set of data:

Regeression quantiles for sin noise over a parabola

Here is the table for the data partition tests:

Sin noise over a parabola q-table

I further made some performance profiling tests. First we need to choose a family or several families of test data. Also, since Mathematica’s function LinearProgramming has several methods it is a good idea to test with all of them. Here I am going to show results only with one family of data and two LinearProgramming methods. The data family is the skewed noise over a logarithmic curve used as an example above. The first LinearProgramming method is Mathematica‘s  (default) “InteriorPoint”, the second method is “CLP” that uses the built-in COIN-OR CLP optimizer. I run the profiling tests using one quantile {0.5} and five quantiles {0.05, 0.25, 0.5, 0.75, 0.95}, which are shown in blue and red respectively. I also run tests with different number of model functions {1, x, x^(1/2), log(x)} and {1, x, log(x)} but there was no significant difference in the timings (less than 2%).

Performance profile InteriorPoint method

Performance profile CLP method

It is interesting to note that the average ratio of the timings with 1 vs. 5 quantiles is 0.38 for “InteriorPoint” and 0.5 for “CLP”.

In another post I am going to show examples of the robustness of regression quantiles when the data has outliers.

Update: Because I implemented calculation of regression quantiles with B-splines I had to rename the function QuantileRegression into QuantileRegressionFit. I made the corresponding changes in the blog post. (Except the profile graphics captions.)