If you work with binary classifiers, you are familiar with the problem of choosing a cutoff value. The classifier predicts positives and negatives, but under the covers it produces a probability score with an implicit 0.50 threshold. Since most real-life data is imbalanced, 0.50 is rarely the right value.
This activity of finding the right cutoff value, and choosing the desired accuracy metric, can be a hassle, so I developed a tool to help me deal with it. In this article, I’ll show how I use the tool for a typical problem involving credit approval.
When training a binary classifier, we generally look at the “receiver operating characteristic” or ROC curve. This is a plot of the true positive rate versus the false positive rate across all choices of the cutoff value. A nice, plump ROC curve means that the model is fit for purpose, but you still have to choose the cutoff value.
In this example, we have an ROC with “area under the curve” of 0.76. This is a good score, but the default 0.50 threshold happens to lie where the curve runs into the lower left corner. Using the slider on my ROC tool, I can run this point up and down the curve, maximizing whichever accuracy metric I choose.
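In practice, scikit-learn’s `roc_curve` and `roc_auc_score` compute these quantities directly. As a minimal sketch of what they do, here is a numpy version with made-up labels and scores:

```python
import numpy as np

def roc_points(y_true, y_score):
    """TPR and FPR at each candidate cutoff (scores sorted descending)."""
    order = np.argsort(-np.asarray(y_score))
    y = np.asarray(y_true)[order]
    tps = np.cumsum(y)          # true positives accumulated as the cutoff drops
    fps = np.cumsum(1 - y)      # false positives accumulated likewise
    tpr = tps / y.sum()
    fpr = fps / (len(y) - y.sum())
    return fpr, tpr

def auc(fpr, tpr):
    # Trapezoidal area under the ROC curve, anchored at the (0, 0) corner
    fpr = np.concatenate(([0.0], fpr))
    tpr = np.concatenate(([0.0], tpr))
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

# Toy data: perfectly separated scores give an AUC of 1.0
y_true  = [0, 0, 1, 1]
y_score = [0.1, 0.2, 0.8, 0.9]
fpr, tpr = roc_points(y_true, y_score)
print(auc(fpr, tpr))
```

This rank-based view also explains why AUC is insensitive to the cutoff itself: it summarizes the whole curve, which is exactly why choosing an operating point remains a separate step.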
To do this, I have the classifier write its predicted probability scores to a file, along with the actual labels (y_pred, y_val), and then I read that file into the tool. If you’re using Scikit, you’ll want predict_proba for this.
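A minimal sketch of that file roundtrip, using the stdlib `csv` module. The scores below are made up; with a fitted scikit-learn classifier `clf`, you would get them from `clf.predict_proba(X_val)[:, 1]`, where column 1 holds the positive-class probability:

```python
import csv

# Stand-ins for real model output; with scikit-learn this would be
# y_pred = clf.predict_proba(X_val)[:, 1]
y_pred = [0.07, 0.12, 0.55, 0.03]
y_val  = [0, 1, 1, 0]

# Write one (score, actual) pair per row for the tool to read
with open("scores.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["y_pred", "y_val"])
    writer.writerows(zip(y_pred, y_val))

# Read the file back the way the tool would
with open("scores.csv") as f:
    rows = list(csv.DictReader(f))
scores  = [float(r["y_pred"]) for r in rows]
actuals = [int(r["y_val"]) for r in rows]
```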
In this case, the best balanced accuracy is achieved when the cutoff value is 0.11. We need balanced accuracy because our exploratory data analysis showed that the data is nine to one imbalanced in favor of negative cases.
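Under the hood, sliding the cutoff and watching the metric amounts to a search like the following sketch: sweep every observed score as a candidate cutoff and keep the one that maximizes balanced accuracy (scikit-learn’s `balanced_accuracy_score` computes the same metric). The data here is made up, mimicking the 9:1 imbalance from the example:

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of the true positive rate and true negative rate."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    return (tpr + tnr) / 2

def best_cutoff(y_true, y_score):
    """Try every observed score as a cutoff; return the best one."""
    y_score = np.asarray(y_score)
    cutoffs = np.unique(y_score)
    balanced = [balanced_accuracy(y_true, y_score >= c) for c in cutoffs]
    i = int(np.argmax(balanced))
    return float(cutoffs[i]), float(balanced[i])

# Made-up data: nine negatives, one positive (9:1 imbalance)
y_true  = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
y_score = [0.05, 0.06, 0.04, 0.08, 0.09, 0.02, 0.70, 0.03, 0.05, 0.15]
cutoff, ba = best_cutoff(y_true, y_score)
```

Because balanced accuracy averages the per-class rates, the lone positive counts as much as the nine negatives, which is what pulls the optimal cutoff well below 0.50.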
In the context of our credit problem, negatives are people who don’t default on their loan. Our classifier could present 90% naïve “accuracy” simply by calling every case a negative. We would confidently approve loans for everyone, and then encounter a 10% default rate.
The tool displays other popular accuracy metrics like precision, recall, and F1 score. By the way, notice that the true positive rate (TPR) and the false negative rate (FNR) sum to one, because together they account for all the actual positive cases. The same goes for the negative rates. The TPR is also known as “sensitivity.”
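A quick sketch of those rates makes the identity concrete (the labels and predictions below are toy values):

```python
import numpy as np

def rates(y_true, y_pred):
    """TPR, FNR, TNR, FPR from labels and hard predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos = y_true == 1
    neg = ~pos
    tpr = np.mean(y_pred[pos] == 1)   # sensitivity / recall
    fnr = np.mean(y_pred[pos] == 0)   # misses among actual positives
    tnr = np.mean(y_pred[neg] == 0)   # specificity
    fpr = np.mean(y_pred[neg] == 1)   # false alarms among actual negatives
    return tpr, fnr, tnr, fpr

tpr, fnr, tnr, fpr = rates([1, 1, 1, 0, 0], [1, 0, 1, 0, 1])
# TPR and FNR partition the actual positives, so tpr + fnr == 1;
# TNR and FPR partition the actual negatives the same way.
```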
Balanced accuracy is often all you need, but we can take it a step further and ask what the gain or loss is from each decision. The tool accepts these dollar figures, with the false outcomes marked in red.
For example, let’s say that a false negative costs us $10,000 in collections and recovery charges, while a true negative earns us $7,500 in interest. True positives and false positives both count as zero, because in either case we decline the loan.
We can see that our maximum expected value of $2,170 is achieved when the cutoff value is reduced to 0.08. This is below the optimum for balanced accuracy; in effect, the cutoff is now weighted more heavily toward avoiding costly false negatives.
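The expected-value search can be sketched the same way as the balanced-accuracy sweep. The dollar figures below are the ones from the example, but the labels and scores are made up, so the optimal cutoff and payoff here are illustrative rather than the $2,170 from the article’s dataset:

```python
import numpy as np

# Per-decision dollar values from the example; predicted positives
# (TP and FP) are declined loans, so they contribute nothing
VALUE = {"TN": 7500.0, "FN": -10000.0, "TP": 0.0, "FP": 0.0}

def expected_value(y_true, y_score, cutoff):
    """Mean payoff per applicant when approving everyone below the cutoff."""
    y_true = np.asarray(y_true)
    pred_pos = np.asarray(y_score) >= cutoff
    tn = np.sum(~pred_pos & (y_true == 0))   # approved, repaid
    fn = np.sum(~pred_pos & (y_true == 1))   # approved, defaulted
    return (tn * VALUE["TN"] + fn * VALUE["FN"]) / len(y_true)

def best_value_cutoff(y_true, y_score):
    """Sweep observed scores and keep the cutoff with the best payoff."""
    y_score = np.asarray(y_score)
    cutoffs = np.unique(y_score)
    evs = [expected_value(y_true, y_score, c) for c in cutoffs]
    i = int(np.argmax(evs))
    return float(cutoffs[i]), float(evs[i])

# Made-up data: nine non-defaulters, one defaulter
y_true  = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
y_score = [0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.30, 0.25]
cutoff, ev = best_value_cutoff(y_true, y_score)
```

Because a false negative costs more than a true negative earns, the payoff-optimal cutoff sits at or below the accuracy-optimal one, matching the 0.08 versus 0.11 shift in the article.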
I hope you enjoy using the tool. Remember, it’s best practice to tune the cutoff on your training or validation data, and then commit to a single value for your final test set.