Calibration in Machine Learning

Ritesh Ranjan · Published in Analytics Vidhya · Sep 14, 2019

In this blog we will learn what calibration is, and why and when we should use it.


We calibrate our model when the probability estimate of a data point belonging to a class is very important.

Calibration is the comparison of the actual output of a system with its expected output. Now let me put this in the perspective of machine learning.

In calibration we try to adjust our model so that the distribution and behavior of the predicted probabilities match the distribution and behavior of the probabilities observed in the training data.

Suppose we have a small sample of data points. Say we have 10 points in the sample, and out of these 10 points 7 belong to the positive class and 3 belong to the negative class. Then the fraction of positive points is 0.7 (the observed probability). This means that there is a 70% probability that a point in this sample will get a positive class label. Now we take the average of the probability estimates (predicted probabilities) given by the model. We expect this average to be around 0.7. If this value is far from 0.7, then our model is not calibrated. In the ideal case, the calibration plot of a perfectly calibrated model lies exactly on the diagonal; we don’t get this plot in practice.

Now, to draw a calibration plot, the following steps are followed (a code sketch appears after the list).

  1. Create a data set with two columns: the actual label and the predicted probability given by the model.
  2. Sort this data set in ascending order of the predicted probability.
  3. Divide the data set into bins of some fixed size. If the data set is large, keep the bin size large, and vice versa.
  4. In each bin, calculate the fraction of actual positives and the average of the probabilities predicted by the model.
  5. Plot a graph with the fraction of positives on the y-axis and the average predicted probability on the x-axis.
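Here is a minimal sketch of these steps, assuming y_true and y_prob are NumPy arrays holding the actual labels and the predicted probabilities (both names are illustrative):

```python
import numpy as np

def calibration_points(y_true, y_prob, n_bins=10):
    """Compute (average predicted probability, fraction of positives) per bin."""
    order = np.argsort(y_prob)                   # step 2: sort by predicted probability
    y_true, y_prob = y_true[order], y_prob[order]
    bins = np.array_split(np.arange(len(y_prob)), n_bins)   # step 3: fixed-size bins
    avg_pred = [y_prob[b].mean() for b in bins]  # step 4: average predicted probability
    frac_pos = [y_true[b].mean() for b in bins]  # step 4: fraction of actual positives
    return np.array(avg_pred), np.array(frac_pos)
```

Plotting frac_pos against avg_pred, together with the diagonal y = x for reference, gives the calibration plot.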


There are two methods of calibration.

  1. Sigmoid/Platt’s: In this technique we use a slight variation of the sigmoid function to fit the distribution of predicted probabilities to the distribution of probabilities observed in the training data. In effect, we perform logistic regression on the output of the model with respect to the actual label. The mathematical function used is shown below:
P(y = 1 | f) = 1 / (1 + exp(A·f + B))

Modified sigmoid function used in Platt’s calibration, where f is the model’s output score.

Now, to perform calibration, we have to learn the values of the two parameters A and B. Here we use gradient descent to learn them, but you can use any optimization algorithm of your choice. The mathematical formulation is shown below.

With p_i = 1 / (1 + exp(A·f_i + B)) and target t_i, we minimize the negative log-likelihood

L = −Σ_i [ t_i · log(p_i) + (1 − t_i) · log(1 − p_i) ]

and the gradient descent updates for the parameters A and B are

A ← A − η · Σ_i (t_i − p_i) · f_i
B ← B − η · Σ_i (t_i − p_i)

where η is the learning rate.

Coding Sigmoid Calibration in Python:

A simple implementation of sigmoid calibration is sketched below.
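This is a minimal sketch of the gradient descent procedure above, assuming scores and labels are NumPy arrays of raw model outputs and 0/1 targets; the function and variable names are illustrative:

```python
import numpy as np

def fit_platt(scores, labels, lr=0.01, n_iter=5000):
    """Learn Platt-scaling parameters A and B by gradient descent on the NLL."""
    A, B = 0.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(A * scores + B))   # calibrated probabilities
        grad_A = np.mean((labels - p) * scores)    # dL/dA (averaged for stability)
        grad_B = np.mean(labels - p)               # dL/dB
        A -= lr * grad_A
        B -= lr * grad_B
    return A, B

def platt_probability(scores, A, B):
    """Map raw scores to calibrated probabilities using the learned A and B."""
    return 1.0 / (1.0 + np.exp(A * scores + B))
```

In Platt’s original formulation the 0/1 targets are smoothed to (N₊ + 1)/(N₊ + 2) and 1/(N₋ + 2) to reduce overfitting; that refinement is omitted here for brevity.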

2. Isotonic: In this technique we use a piecewise-constant, non-decreasing function instead of the sigmoid function. Here we fit something like linear regression, but in a piecewise fashion. Given a model with f_i as the predicted value and y_i as the actual target value, we make the following assumption:

y_i = m(f_i) + ε_i

where m is an isotonic (monotonically non-decreasing) function and ε_i is noise. Now, given the training set {f_i, y_i}, we try to find the isotonic function m̂ by minimizing the squared error:

m̂ = argmin_z Σ_i (y_i − z(f_i))²

The algorithm used to solve this problem is called the pair-adjacent violators (PAV) algorithm. The pseudo-code of the algorithm is given below.

Pair-adjacent violators algorithm
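Here is a minimal Python sketch of the idea, following the standard stack-based formulation (the function name pav and the block representation are illustrative choices): walk through the target values sorted by model score, and pool adjacent groups whenever their means violate the non-decreasing order.

```python
import numpy as np

def pav(y):
    """Pair-adjacent violators: pool adjacent values until non-decreasing.

    y: target values sorted by the model's predicted score.
    Returns the isotonic (non-decreasing) fit, one value per input point.
    """
    blocks = []  # each block is [sum of values, count]
    for value in y:
        blocks.append([value, 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand the pooled blocks back to per-point fitted values.
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return np.array(fit)
```

For example, pav(np.array([0, 1, 0, 1, 1])) pools the violating middle pair and returns [0, 0.5, 0.5, 1, 1].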

Applying calibration to a real data set:

We can apply calibration just by using the CalibratedClassifierCV class available in the sklearn library in Python. For sigmoid calibration, pass method='sigmoid' while creating the object of this class, and for isotonic calibration, pass method='isotonic'.

Now, before applying calibration, we have to diagnose calibration by plotting a reliability diagram of the actual (observed) probability against the probability predicted by the model on the test data set. In sklearn we use the calibration_curve method.

In this blog I will perform calibration on an SVM model using the Amazon Fine Food Reviews data set. The link to the data set is below.

Data set : https://www.kaggle.com/snap/amazon-fine-food-reviews

First, let’s diagnose calibration. Let’s look at the code.

Code to plot reliability diagram to diagnose calibration
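A minimal sketch of this diagnostic, assuming clf is the trained SVM and X_test, y_test are the held-out split (these names, and the min-max scaling of the decision scores, are assumptions):

```python
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

# Raw SVM decision scores, min-max scaled to [0, 1] so they can be
# read as (uncalibrated) probability estimates.
scores = clf.decision_function(X_test)
probs = (scores - scores.min()) / (scores.max() - scores.min())

frac_pos, mean_pred = calibration_curve(y_test, probs, n_bins=10)

plt.plot(mean_pred, frac_pos, marker='o', label='SVM (uncalibrated)')
plt.plot([0, 1], [0, 1], linestyle='--', label='Perfectly calibrated')
plt.xlabel('Average predicted probability')
plt.ylabel('Fraction of positives')
plt.legend()
plt.show()
```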
Reliability diagram without calibration

Now let’s use sigmoid calibration and see the result.

Code for sigmoid calibration
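A minimal sketch, assuming X_train, y_train, X_test, y_test are the train/test splits and a linear SVM as the base model (both assumptions):

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.svm import LinearSVC

# Wrap the uncalibrated SVM; method='sigmoid' fits Platt scaling on held-out folds.
sig_clf = CalibratedClassifierCV(LinearSVC(), method='sigmoid', cv=3)
sig_clf.fit(X_train, y_train)

probs = sig_clf.predict_proba(X_test)[:, 1]
frac_pos, mean_pred = calibration_curve(y_test, probs, n_bins=10)
```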
Reliability diagram after sigmoid calibration.

Now let’s use isotonic calibration and see the result.

Code for isotonic calibration
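The isotonic version changes only the method argument (same assumed names as above):

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.svm import LinearSVC

# method='isotonic' fits a non-decreasing step function via PAV instead of a sigmoid.
iso_clf = CalibratedClassifierCV(LinearSVC(), method='isotonic', cv=3)
iso_clf.fit(X_train, y_train)

probs = iso_clf.predict_proba(X_test)[:, 1]
frac_pos, mean_pred = calibration_curve(y_test, probs, n_bins=10)
```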
Reliability diagram after isotonic calibration.

As we can see, there is a clear improvement in the probabilities predicted by the model after applying calibration. We can also see that isotonic calibration performed better at bringing the probability values closer to the diagonal line.


That is all for now. If you have any questions, write them in the comments.
