Jigsaw Unintended Bias Toxic Comment Classification

Ritesh Ranjan
Published in Analytics Vidhya · 5 min read · Sep 27, 2019

In this blog I will summarize the approach I took, along with ideas I picked up from other notebooks on Kaggle, to solve a problem posted on Kaggle by the Conversation AI team, a research initiative founded by Jigsaw and Google.

Taken from Kaggle

Problem Statement:

The objective was to detect toxic comments and reduce the unintended bias of the model.

Background:

Before this competition, Jigsaw had hosted another competition the previous year with the objective of detecting toxic comments. The model built there had mistakenly learned to associate toxicity with certain identity words (gay, black, homosexual, etc.), so it would classify sentences such as “I am a gay woman” as toxic. The objective of this competition, therefore, was to reduce this bias towards identity words.

Dataset Overview:

The data set provided by Jigsaw was large, with about 1.8 million comments. It contains 45 columns, but only a few of them are of use to us. The “comment_text” column contains the comment, and the “target” column shows how toxic a comment is; this is the value our model should predict at test time. A value ≥ 0.5 means the comment is toxic (positive class).

The other columns of interest were the nine identity columns (subgroups): ‘male’, ‘female’, ‘black’, ‘christian’, ‘jewish’, ‘muslim’, ‘white’, ‘psychiatric_or_mental_illness’ and ‘homosexual_gay_or_lesbian’. We will use these columns to improve our model’s performance, as you will see later in this blog.

The data set is highly imbalanced, with around 8% of the points belonging to class 1 (toxic comments) and 92% belonging to class 0 (non-toxic comments).
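As a quick sanity check, you can verify this class balance directly from the training file. A minimal sketch, assuming the Kaggle train.csv layout:

```python
import pandas as pd

train = pd.read_csv("train.csv")               # Kaggle training file
toxic = train["target"] >= 0.5                 # the competition's toxicity threshold
print(f"toxic fraction: {toxic.mean():.2%}")   # roughly 8%
```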

Evaluation Metric:

For this competition we were given a custom metric designed to account for the unintended bias of the model. The metric is a weighted combination of several sub-metrics, each of which I define below.

Overall AUC: This is the ROC AUC calculated over the entire test data set.

Bias AUC: To measure the bias of the model, we also calculate ROC AUC on the following three subsets of the test set, each capturing a different aspect of unintended bias (see the sketch after this list).

  1. Subgroup AUC: We calculate the AUC restricted to the comments that mention each subgroup (identity column). A low value means the model gets confused whenever a comment contains identity words.
  2. Background Positive & Subgroup Negative (BPSN): Here we calculate the AUC on the toxic comments that do not mention the identity together with the non-toxic comments that do. A low value means the model confuses non-toxic comments that mention the identity with toxic comments that do not.
  3. Background Negative & Subgroup Positive (BNSP): Here we calculate the AUC on the toxic comments that mention the identity together with the non-toxic comments that do not. A low value means the model confuses toxic comments that mention the identity with non-toxic comments that do not.
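Here is a minimal sketch of how these three restricted AUCs can be computed with scikit-learn, assuming boolean arrays y_true (toxic or not) and in_subgroup (mentions the identity or not), and a score array y_pred:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auc(y_true, y_pred, in_subgroup):
    # AUC restricted to comments that mention the identity
    return roc_auc_score(y_true[in_subgroup], y_pred[in_subgroup])

def bpsn_auc(y_true, y_pred, in_subgroup):
    # background positive (toxic, no identity) + subgroup negative (non-toxic, identity)
    mask = (y_true & ~in_subgroup) | (~y_true & in_subgroup)
    return roc_auc_score(y_true[mask], y_pred[mask])

def bnsp_auc(y_true, y_pred, in_subgroup):
    # background negative (non-toxic, no identity) + subgroup positive (toxic, identity)
    mask = (~y_true & ~in_subgroup) | (y_true & in_subgroup)
    return roc_auc_score(y_true[mask], y_pred[mask])
```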

For each of these three bias AUCs, we then combine the per-subgroup values using the generalized mean (power mean) below.

Mp(ms) = ((1/N) · Σs ms^p)^(1/p), where ms is the bias AUC for subgroup s and N is the number of subgroups. (Taken from Kaggle.)

For this competition p was set to -5. This value of p was chosen so that the metric heavily penalizes poor performance on any identity subgroup. To see why, consider the example below.

Let’s take four subgroups with AUCs of 0.95, 0.98, 0.97 and 0.70. The simple mean is 0.90, which gives the false impression that our model is performing well, but the power mean with p = -5 gives 0.84. So the metric reports a low score if even one subgroup performs poorly.
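A quick sketch that reproduces this example:

```python
import numpy as np

def power_mean(values, p=-5.0):
    # generalized (power) mean: ((1/N) * sum(v^p)) ** (1/p)
    values = np.asarray(values, dtype=np.float64)
    return np.mean(values ** p) ** (1.0 / p)

aucs = [0.95, 0.98, 0.97, 0.70]
print(round(float(np.mean(aucs)), 2))  # 0.90 — the simple mean hides the weak subgroup
print(round(power_mean(aucs), 2))      # 0.84 — p = -5 penalizes it heavily
```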

Now we will see how we calculate the final metric. Below is the formula.

score = w0 · AUCoverall + Σa wa · Mp(ms,a), where a runs over the three bias AUCs and all four weights were set to 0.25. (Taken from Kaggle.)
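Putting it together, a sketch of the final score that reuses power_mean from above, assuming you have already computed the overall AUC and the three per-subgroup bias AUC lists:

```python
def final_score(overall_auc, subgroup_aucs, bpsn_aucs, bnsp_aucs, w=0.25, p=-5.0):
    # weighted sum of the overall AUC and the three power-mean bias AUCs
    return (w * overall_auc
            + w * power_mean(subgroup_aucs, p)
            + w * power_mean(bpsn_aucs, p)
            + w * power_mean(bnsp_aucs, p))
```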

Data Cleaning: We removed all stop words and special characters from the text, except that we kept masked abusive words such as fu*k, di*k, pu**y, etc., since they carry a strong toxicity signal.
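A minimal cleaning sketch along these lines; the stop-word list here is a toy subset, and keeping ‘*’ in the character whitelist is one way to preserve the censored profanity:

```python
import re

STOPWORDS = {"the", "a", "an", "and", "or", "is", "are", "to", "of", "in"}  # toy subset

def clean(text: str) -> str:
    # keep letters, digits and '*' so tokens like "fu*k" or "pu**y" survive
    text = re.sub(r"[^a-z0-9*\s]", " ", text.lower())
    return " ".join(t for t in text.split() if t not in STOPWORDS)

print(clean("This is a di*k move!"))  # -> "this di*k move"
```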

Approaches:

  1. Using Classical Machine Learning: In our first approach we vectorized the data using TF-IDF and trained a logistic regression model, which scored 0.89 on the private leaderboard (see the sketch after this list). We then tried other models such as naive Bayes, but our score did not improve, so we did not keep them in our solution. We also tried adding character n-grams to the TF-IDF features, but that did not improve the score either.
  2. Using Deep Learning: After the machine learning models, we tried deep learning to improve our score, using crawl 300d and GloVe 100d as the embeddings for all of the deep learning models. We started with LSTMs, but our score hovered around 0.50 to 0.51, which showed the model was performing no better than random. (If you want to learn how LSTMs work, I suggest reading Christopher Olah’s blog.) We then trained various architectures and variations of bi-LSTMs; the architecture that gave the best score of 0.93309 is explained below.
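For reference, a minimal sketch of the classical baseline from item 1; train_texts, train_labels (target ≥ 0.5) and test_texts are assumed to be prepared already, and the vectorizer settings are assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

baseline = make_pipeline(
    TfidfVectorizer(max_features=100_000, ngram_range=(1, 2)),  # assumed settings
    LogisticRegression(solver="liblinear"),
)
baseline.fit(train_texts, train_labels)
test_probs = baseline.predict_proba(test_texts)[:, 1]  # toxicity scores
```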

The first layer is an embedding layer, followed by a spatial dropout of 0.2. Then come two stacked bi-LSTM layers with 128 units each. We apply an attention layer and a global max pooling layer separately to the bi-LSTM outputs, concatenate the two, and finish with two fully connected (dense) layers with skip connections on top.

Final architecture

You can build the skip connections with keras.layers.add, by adding a layer’s input to the output of a dense layer that takes that same input. This helps gradients flow when the network is deep. A sketch of the whole architecture follows.
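Below is a minimal Keras sketch of this architecture. The sequence length, vocabulary size and dense width are assumptions, and the attention branch is a simple additive attention (a Dense scoring layer with a softmax over timesteps), which may differ in detail from the exact layer in the notebook:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

MAX_LEN, VOCAB_SIZE, EMBED_DIM, LSTM_UNITS = 220, 100_000, 300, 128  # assumed sizes

def build_model():
    inp = layers.Input(shape=(MAX_LEN,))
    x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inp)  # load crawl/GloVe weights here
    x = layers.SpatialDropout1D(0.2)(x)
    x = layers.Bidirectional(layers.LSTM(LSTM_UNITS, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(LSTM_UNITS, return_sequences=True))(x)

    # simple additive attention: score each timestep, softmax, weighted sum
    scores = layers.Dense(1, activation="tanh")(x)
    weights = layers.Softmax(axis=1)(scores)
    attention = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([x, weights])

    pooled = layers.GlobalMaxPooling1D()(x)
    hidden = layers.concatenate([attention, pooled])  # 2 * (2 * LSTM_UNITS) = 512 dims

    # two dense layers with additive skip connections (keras.layers.add)
    hidden = layers.add([hidden, layers.Dense(512, activation="relu")(hidden)])
    hidden = layers.add([hidden, layers.Dense(512, activation="relu")(hidden)])
    out = layers.Dense(1, activation="sigmoid")(hidden)
    return Model(inp, out)

model = build_model()
model.compile(optimizer="adam", loss="binary_crossentropy")
```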

The most important thing that improved our score was assigning a weight to each data point. Here is how I assigned the weights: every data point starts with a weight of 0.25; if it belongs to any of the nine identity subgroups, it gets an additional 0.25; and if it belongs to the BPSN or BNSP group, it gets a further 0.25. During training, the per-example loss is multiplied by this weight. Before using these weights I was getting a score of 0.92; after using them the score improved to 0.93309.
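A sketch of this weighting scheme with NumPy, reusing the train data frame and compiled model from the earlier sketches; the exact group masks are my reading of the scheme (public kernels used very similar weights), and X_train/y_train are assumed to be the tokenized inputs and targets:

```python
import numpy as np

IDENTITY_COLS = ["male", "female", "black", "christian", "jewish", "muslim",
                 "white", "psychiatric_or_mental_illness", "homosexual_gay_or_lesbian"]

toxic = train["target"].values >= 0.5
has_identity = (train[IDENTITY_COLS].fillna(0).values >= 0.5).any(axis=1)

weights = np.full(len(train), 0.25)        # base weight for every comment
weights += 0.25 * has_identity             # mentions an identity subgroup
weights += 0.25 * (toxic & ~has_identity)  # BPSN side: toxic, no identity mention
weights += 0.25 * (~toxic & has_identity)  # BNSP side: non-toxic, mentions identity

# Keras multiplies the per-example loss by these weights during training:
model.fit(X_train, y_train, sample_weight=weights, batch_size=512, epochs=2)
```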

You can check my GitHub repository for full details and a better understanding: https://github.com/riteshranjan110/Jigsaw-Unintended-Bias-Toxic-Comment-Classification/blob/master/Jigsaw4.ipynb

Ritesh Ranjan

Machine Learning Enthusiast. Writer in Towards Data Science, Analytics Vidhya, and AI In Plain English. LinkedIn: https://www.linkedin.com/in/riteshranjan11055/