Choosing the Cutoff Value

If you work with binary classifiers, you are familiar with the problem of choosing a cutoff value.  While the classifier predicts positives and negatives, under the covers it produces a probability score with an implicit 0.50 threshold.  Since most real-life data is imbalanced, 0.50 is rarely the right value.

This activity of finding the right cutoff value, and choosing the desired accuracy metric, can be a hassle, so I developed a tool to help me deal with it.  In this article, I'll show how I use the tool for a typical problem involving credit approval.

When training a binary classifier, we generally look at the “receiver operating characteristic,” or ROC curve.  This is a plot of the true positive rate versus the false positive rate for every choice of the cutoff value.  A nice, plump ROC curve means that the model is fit for purpose, but you still have to choose the cutoff value.

In this example, we have an ROC with “area under the curve” of 0.76.  This is a good score, but the default 0.50 threshold happens to lie where the curve runs into the lower left corner.  Using the slider on my ROC tool, I can run this point up and down the curve, maximizing whichever accuracy metric I choose. 

To do this, I have the classifier write a list of its predicted probability scores into a file, along with the actuals (y_pred, y_val), and then I read that file into the tool.  If you’re using scikit-learn, you’ll want predict_proba for this.
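Generating that file might look like the following sketch.  The dataset, model, and filename are placeholders of mine, not the tool's; the 90/10 class weighting mirrors the imbalance discussed below.

```python
# Sketch: dump validation-set probability scores alongside actuals,
# assuming a scikit-learn classifier (toy data and model for illustration).
import csv
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Made-up 90/10 imbalanced dataset, standing in for the credit data.
X, y = make_classification(n_samples=500, weights=[0.9], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_val)[:, 1]  # probability of the positive class

with open("scores.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["y_pred", "y_val"])
    for s, actual in zip(scores, y_val):
        writer.writerow([f"{s:.6f}", actual])
```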

In this case, the best balanced accuracy is achieved when the cutoff value is 0.11.  We need balanced accuracy because our exploratory data analysis showed that the data is nine to one imbalanced in favor of negative cases. 
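The sweep behind that number can be sketched in a few lines of plain Python.  The helper names are mine, and the toy data below is illustrative, not the credit data:

```python
# Sketch of the cutoff sweep: compute balanced accuracy at each candidate
# threshold and keep the best one.
def balanced_accuracy(y_true, y_score, cutoff):
    pred = [1 if s >= cutoff else 0 for s in y_score]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return (tp / pos + tn / neg) / 2  # mean of TPR and TNR

def best_cutoff(y_true, y_score):
    # Each distinct score is a candidate threshold.
    return max(sorted(set(y_score)),
               key=lambda c: balanced_accuracy(y_true, y_score, c))
```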

For problems like this, balanced accuracy is usually sufficient, but we can take it a step further and ask what is the gain or loss from each decision.  

In the context of our credit problem, negatives are people who don’t default on their loan.  Our classifier could present 90% naïve “accuracy” simply by calling every case a negative.  We would confidently approve loans for everyone, and then encounter a 10% default rate. 

The tool displays other popular accuracy metrics like precision, recall, and F1 score.  By the way, notice that the true positive rate (TPR) and the false negative rate (FNR) add to unity because these are all the positive cases. The same goes for negatives.  The TPR is also known as “sensitivity.”
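Those identities are easy to verify from a confusion matrix.  A tiny sketch, with made-up counts matching the nine-to-one imbalance:

```python
# Made-up confusion-matrix counts for a 9:1 imbalanced problem.
tp, fn = 45, 55     # the 100 positive cases split between TP and FN
tn, fp = 810, 90    # the 900 negative cases split between TN and FP

tpr = tp / (tp + fn)   # true positive rate, a.k.a. sensitivity or recall
fnr = fn / (tp + fn)   # false negative rate
tnr = tn / (tn + fp)   # true negative rate, a.k.a. specificity
fpr = fp / (tn + fp)   # false positive rate

assert abs(tpr + fnr - 1.0) < 1e-12   # the positives are fully accounted for
assert abs(tnr + fpr - 1.0) < 1e-12   # likewise the negatives
```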

The tool accepts a gain or loss figure for each of the four outcomes, with the false ones marked in red.

For example, let’s say that a false negative costs us $10,000 in collections and recovery charges, while a true negative means we earn $7,500 in interest.  True positives and false positives will both count as zero, because we declined them.

We can see that our maximum expected value of $2,170 is achieved when the cutoff value is reduced to 0.08, below the optimum for balanced accuracy.  In effect, this is accuracy weighted more heavily toward avoiding false negatives.
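The dollar sweep works the same way as the accuracy sweep: attach a value to each outcome and average over the cases.  A sketch using the article's dollar figures (the score data in the test is made up):

```python
# Sketch of the expected-value calculation at a given cutoff.
GAIN_TN = 7_500     # interest earned on a good loan we approved
LOSS_FN = -10_000   # collections and recovery cost on a default we approved
# TP and FP are worth $0, because we declined those applicants.

def expected_value(y_true, y_score, cutoff):
    total = 0
    for y, s in zip(y_true, y_score):
        approved = s < cutoff   # below the cutoff = predicted negative = approve
        if approved:
            total += GAIN_TN if y == 0 else LOSS_FN
    return total / len(y_true)
```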

I hope you enjoy using the tool.  Remember, it’s best practice to do all this with your training or validation dataset, and then commit to a cutoff value for your final test.

Penetration Chart with Bokeh

I have been honing my charting skills lately, because Bokeh is so amazing, and looking for practical applications (outside my stock trading hobby).  Here’s one I found recently.  This chart explores the timeless question, “are product sales off because the dealer isn’t supportive, or are vehicle sales off, too?”

I am thinking of protection products, but the same question could be asked of finance contracts or, indeed, anywhere you need to consider “penetration.”  That is, the percentage of vehicle sales that are also sales of your product.

Are product sales off because the dealer isn’t supportive, or are vehicle sales off, too?

In this chart, we consider year over year change in contracts relative to the change in vehicle sales for a collection of dealers.  Bubble size indicates the size of each dealership in sales volume.  We’ll get to bubble color in a minute.  Also, note the horizontal and vertical zero lines.

The dealers in the lower left quadrant have an excuse.  Riverside, for example, is down 30% in product sales.  When you call them, though, they’ll counter that they’re having a bad year.  Volume is also down, albeit only 11%.

The dealers in the lower right quadrant have no such excuse.  Downtown, for example, is also off 30% but on much improved vehicle sales.  So, we can infer that penetration has declined, and color them a darker shade of red.  Similarly, although contracts are up at National, they should be up more considering the good year they’re having.  So, orange.

O’Malley is green because, while contracts are off a bit, vehicle sales are worse.  O’Malley is doing the right thing and ramping up products to compensate for weak sales.  What the chart shows on the X and Y axes is straightforward enough, but it shrewdly assigns colors according to the change in penetration.

Bokeh is a visualization library many Python programmers reach for instead of Matplotlib or R’s plotting packages.  The color scheme here comes from running its red, yellow, green “linear color mapper” diagonally across the chart from lower right to upper left.  Dealers where penetration is unchanged from last year are yellow, like College and Bellevue.
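The quantity driving the color is the year-over-year change in penetration, which you can compute before handing the data to the mapper.  A minimal sketch with made-up dealer figures (the names echo the chart, the numbers do not):

```python
# Change in penetration = (contracts/vehicles) this year vs. last year.
dealers = {
    # name: (contracts_last, contracts_now, vehicles_last, vehicles_now)
    "Riverside": (100, 70, 900, 800),    # sales and volume both down
    "Downtown":  (100, 70, 800, 1000),   # sales down on improved volume
    "O'Malley":  (100, 90, 1000, 700),   # sales off a bit, volume worse
}

def penetration_change(c0, c1, v0, v1):
    # Relative change in the contracts-per-vehicle ratio, year over year.
    return (c1 / v1) / (c0 / v0) - 1.0

for name, (c0, c1, v0, v1) in dealers.items():
    print(name, penetration_change(c0, c1, v0, v1))
```

In the real chart, something like Bokeh's LinearColorMapper over a red-yellow-green palette would then turn this value into each bubble's fill color.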

Speculation on Fractal Programming Language

We flew east out of Panama City, and I looked down on the faceted green hills of the Cordillera de San Blas, perhaps for the last time.  In the sky were statistically similar puffs of white cumulus cloud.  Naturally, I was thinking of fractals.

I had spent the past few days coding technical analysis indicators in Python, which reminded me of coding the same in C#.  This, in turn, reminded me that although the TA community talks a lot about geometric repetition, we have yet to invent a single fractal indicator, much less a trading strategy.

I write my trading strategies in C# on the MultiCharts platform.  Its procedures for time series data look a lot like the vector-oriented syntax of Python.  Here is how to do Bollinger bands in each:

  • MultiCharts (C#): StandardDeviationCustom(length, devs)
  • pandas (Python): df[price].rolling(length).std() * devs
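Strictly, both one-liners give only the band half-width.  A fuller sketch of Bollinger bands in pandas (the function name and defaults are mine):

```python
# Bollinger bands: a rolling mean plus/minus a multiple of the rolling
# standard deviation over the same window.
import pandas as pd

def bollinger(price: pd.Series, length: int = 20, devs: float = 2.0):
    mid = price.rolling(length).mean()
    width = price.rolling(length).std() * devs   # the one-liner above
    return mid - width, mid, mid + width
```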

I have to admit not having much intuition about vector operations.  Matrices and summations just look like for loops to me – clearly an obstacle to the proper appreciation of Python.  I have worked with SAS and SYSTAT, though, so Python at the command prompt seems natural.

What I noticed with the Python exercise is that the classic TA indicators were designed with an iterative mindset, reflecting the programming languages of the day – Sapir’s theory of linguistic relativity, again – and so I imagine that the reason we have no fractal indicators is that our languages can’t express them.

Here are some basic things I would expect from a fractal-oriented programming language:

  • Create a dataset from a generator function
  • Derive fractal metrics, like the Hausdorff dimension
  • Compare two datasets for statistical similarity
  • Compare a dataset to subsets of itself
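To make the second bullet concrete, here is a rough sketch of box-counting dimension, a computable stand-in for the Hausdorff dimension, in plain Python:

```python
# Box-counting estimate of fractal dimension for a 2-D point set:
# count the grid boxes the set touches at several scales, then take the
# slope of log(count) against log(1/eps).
import math

def box_count(points, eps):
    # Number of eps-sized grid boxes the point set touches.
    return len({(math.floor(x / eps), math.floor(y / eps)) for x, y in points})

def box_dimension(points, scales=(4, 8, 16, 32, 64)):
    xs = [math.log(s) for s in scales]                       # log(1/eps)
    ys = [math.log(box_count(points, 1.0 / s)) for s in scales]
    # Least-squares slope of ys against xs.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
```

A densely sampled straight line should come out with dimension near 1, while genuinely fractal sets land at fractional values.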

Admittedly, I have only a cursory notion of how this would work.  That’s why this piece has “speculation” in the title.  Meanwhile, I will continue plugging away in C# and Python.