Choosing the Cutoff Value

If you work with binary classifiers, then you are familiar with the problem of choosing a cutoff value.  While the classifier predicts positives and negatives, under the covers it’s working from a probability score with an implicit 0.50 threshold.  Since most real-life data is imbalanced, 0.50 is rarely the right value.

This activity of finding the right cutoff value, and choosing the desired accuracy metric, can be a hassle, so I developed a tool to help me deal with it.  In this article, I’ll show how I use the tool for a typical problem involving credit approval.

When training a binary classifier, we generally look at the “receiver operating characteristic” or ROC curve.  This is a plot of true positives versus false positives for all choices of the cutoff value. A nice, plump ROC curve means that the model is fit for purpose, but you still have to choose the cutoff value.

In this example, we have an ROC with “area under the curve” of 0.76.  This is a good score, but the default 0.50 threshold happens to lie where the curve runs into the lower left corner.  Using the slider on my ROC tool, I can run this point up and down the curve, maximizing whichever accuracy metric I choose. 

To do this, I have the classifier write a list of its predicted probability scores into a file, along with the actuals (y_pred, y_val) and then I read that file into the tool.  If you’re using Scikit, you’ll want predict_proba for this.
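Here is a minimal sketch of that step, assuming a fitted Scikit classifier clf and a validation set X_val, y_val (the file name is my own choice):

  import pandas as pd

  # Take column 1 of predict_proba: the probability of the positive class
  probs = clf.predict_proba(X_val)[:, 1]
  pd.DataFrame({"y_pred": probs, "y_val": y_val}).to_csv("scores.csv", index=False)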

In this case, the best balanced accuracy is achieved when the cutoff value is 0.11.  We need balanced accuracy because our exploratory data analysis showed that the data is nine to one imbalanced in favor of negative cases. 
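If you want to reproduce the sweep without the tool, here is a sketch that reads the scores file from above and tries every cutoff (the grid of cutoff values is my assumption):

  import numpy as np
  import pandas as pd
  from sklearn.metrics import balanced_accuracy_score

  df = pd.read_csv("scores.csv")
  cutoffs = np.linspace(0.01, 0.99, 99)  # every cutoff in steps of 0.01
  scores = [balanced_accuracy_score(df.y_val, df.y_pred >= c) for c in cutoffs]
  best = cutoffs[int(np.argmax(scores))]
  print(f"best cutoff {best:.2f}, balanced accuracy {max(scores):.3f}")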

In the context of our credit problem, negatives are people who don’t default on their loan.  Our classifier could present 90% naïve “accuracy” simply by calling every case a negative.  We would confidently approve loans for everyone, and then encounter a 10% default rate. 

The tool displays other popular accuracy metrics like precision, recall, and F1 score.  By the way, notice that the true positive rate (TPR) and the false negative rate (FNR) add to unity, because together they account for all of the actual positive cases.  The same goes for the negative rates.  The TPR is also known as “sensitivity.”
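You can check those identities directly from the confusion matrix.  A sketch, continuing with the same scores file and the 0.11 cutoff:

  from sklearn.metrics import confusion_matrix

  tn, fp, fn, tp = confusion_matrix(df.y_val, df.y_pred >= 0.11).ravel()
  tpr, fnr = tp / (tp + fn), fn / (tp + fn)  # the actual positives, split two ways
  tnr, fpr = tn / (tn + fp), fp / (tn + fp)  # likewise for the actual negatives
  assert abs(tpr + fnr - 1) < 1e-9 and abs(tnr + fpr - 1) < 1e-9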

For problems like this, balanced accuracy is usually sufficient, but we can take it a step further and ask what the gain or loss is from each decision.  The tool accepts these figures, with the false outcomes marked in red.

For example, let’s say that a false negative costs us $10,000 in collections and recovery charges, while a true negative means we earn $7,500 in interest.  True positives and false positives will both count as zero, because we declined them.

We can see that our maximum expected value of $2,170 is achieved when the cutoff value is reduced to 0.08.  This is below the optimum for balanced accuracy.  In effect, it is accuracy weighted more heavily toward avoiding false negatives.
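Here is how that calculation looks in code, continuing from the snippets above.  I am assuming the tool reports expected value as an average per decision:

  # Payoffs per outcome: approvals earn or lose money, declines count as zero
  values = {"tn": 7_500, "fn": -10_000, "tp": 0, "fp": 0}

  def expected_value(cutoff):
      tn, fp, fn, tp = confusion_matrix(df.y_val, df.y_pred >= cutoff).ravel()
      return (tn * values["tn"] + fn * values["fn"]) / (tn + fp + fn + tp)

  best = max(cutoffs, key=expected_value)
  print(f"best cutoff {best:.2f}, expected value ${expected_value(best):,.0f}")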

I hope you enjoy using the tool.  Remember, it’s best practice to do all this with your training or validation dataset, and then commit to a cutoff value for your final test.

Predicting Loan Defaults with AI

I have some time on my hands, so I decided to experiment with some of the new AI-assisted code generators.  I wanted something relevant to F&I, and I found this exercise on Coursera.  The “F” side of F&I gets all the attention, but there is plenty of opportunity for AI to rate insurance risk and mechanical breakdown risk.

Note that we are using AI to generate an AI model.  For the Coursera exercise, linear regression is sufficient, but I chose to use neural networks here because they are undeniably machine learning.  See my earlier post on this, “What Is Real AI?”

Today, we’ll look at three popular AI assistants: ChatGPT, GPT-Engineer, and GitHub Copilot.  These are all based on the famous OpenAI large language model, just packaged a little differently.

To start, I worked the problem myself, running several different models.  The goal is to predict the probability of a given loan going bad, based on seventeen variables including credit score, term, and debt-to-income.  Once I was satisfied with my solution, I turned the problem over to my robot friend, ChatGPT.

ChatGPT

Using the chat window requires you to cut and paste code over to your IDE, so it doesn’t really feel like a code generator.  On the other hand, it’s conversational, so it can tell you its assumptions and you can give clarifying instructions.  Here is the prompt I used:

We need to write a Python script to predict loan defaults using a neural network model, and based on some input data. To start, read the input data from a CSV file and create a data frame. Some of the columns have numeric data and some have categorical data. The last column is the dependent variable. It has values of either zero or one. Next, prepare the data for use in a neural network by running it through an appropriate pipeline. Split off twenty percent of the rows, randomly, for use as a test set. Finally, train the neural network using the remaining eighty percent of the rows. We want to know the probability of a loan default. Test the neural network by comparing its predictions for the test set, and report your results by plotting the ROC curve.

Ordinarily, this would be more interactive, but I wanted to write all the requirements into a single prompt for demonstration purposes, and because that’s the approach preferred by GPT-Engineer.

ChatGPT solved the problem nicely, matching my best score of 0.76.  It plotted its own ROC chart, below, instead of using the Scikit method, even though it did use Scikit for the pipelines.
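For reference, the Scikit method is a one-liner.  A sketch, assuming the test labels y_test and the probabilities y_prob from predict_proba:

  from sklearn.metrics import RocCurveDisplay

  RocCurveDisplay.from_predictions(y_test, y_prob)  # plots the ROC, labels the AUC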

ChatGPT did several things differently from me, mostly simplifications, and it made some mistakes.  For example, it forgot that its “create neural network” function didn’t include compilation.  The call to compile() can be in or out of the function, so it’s easy to misplace.
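The defensive pattern is to keep compile() inside the builder, so a caller can never get an uncompiled model.  A sketch of the idea in Keras (my code, not ChatGPT’s):

  from tensorflow import keras

  def create_neural_network(n_features):
      model = keras.Sequential([
          keras.Input(shape=(n_features,)),
          keras.layers.Dense(32, activation="relu"),
          keras.layers.Dense(1, activation="sigmoid"),  # probability of default
      ])
      # Compiling here means every caller gets a trainable model
      model.compile(optimizer="adam", loss="binary_crossentropy",
                    metrics=[keras.metrics.AUC()])
      return model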

The creepiest thing ever?  ChatGPT chose random_state = 42, which is completely arbitrary – and the same number I use!  Like, how did it know?

GPT-Engineer

GPT-Engineer uses the OpenAI API, for which you have to buy tokens.  I think of it as a wrapper for the ChatGPT dialogue above.  GPT-Engineer also runs outside of the IDE.  You invoke it from the command line and pass a file containing the prompt.

The special charm of GPT-Engineer is that it can ask clarifying questions.  Here are the questions it had about my prompt:

  1. The specific format and structure of the input CSV file.
  2. The exact columns that contain numeric and categorical data.
  3. The type of neural network model to be used.
  4. The specific pipeline to be used for preparing the data.
  5. The method for splitting the data into training and test sets.
  6. The method for training the neural network.
  7. The method for testing the neural network and comparing its predictions.
  8. The method for reporting results and plotting the ROC curve.

Some of these questions are reasonable and some are not.  ChatGPT, on its own, figured out how to handle the file, the pipeline, and the structure of the neural net.  In fact, it was kind of fun comparing notes with ChatGPT like, “oh yeah, well, I’m using batch norms.”

I suspect that “code and train a neural net” is asking too much.  GPT-Engineer crushes routine tasks, as my man Arjan demonstrates here.

GPT Engineer is another nail in the coffin of software developers. In this video, I’ll show you how it works. The tool is crazy powerful. Just specify what you want it to build, and then, well, it just builds it.

GitHub Copilot

Microsoft did a nice job of integrating Copilot into Visual Studio and several other IDEs.  You install an extension, and subscribe to the service for $10 a month.  Microsoft has a big ($13 billion) investment in OpenAI, and they own GitHub.  This means an LLM trained not only on human language, but on a giant repository of source code.

Microsoft advertises Copilot as “pair programming,” like having a buddy look over your shoulder, and it works the same way autocomplete works for text.  It can also define a function based on an inline comment like, “read a file from disk and create a dataframe.”
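For example, typing the comment below prompts it to suggest a body.  This completion is just illustrative (my toy example, not Copilot’s verbatim output):

  import pandas as pd

  # read a file from disk and create a dataframe
  def load_data(path: str) -> pd.DataFrame:
      return pd.read_csv(path)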

Copilot didn’t really suit me.  I wanted to see how an AI would code differently from me, as ChatGPT had, but Copilot kept serving up my own code from GitHub.  Also, it kept wanting to define functions where it should have just written the line, like pd.read_csv("test.csv").

Conclusion

As I said at the top, part of the fun is having an AI program write an AI program – although, in this case, any decent predictive model would suffice.  These assistants are all driven by a large language model (LLM), which is itself a huge neural network.  So, here we have a large, general neural network helping me to produce a small, tailored one.

What does all this mean for the industry?  Well, for one thing, it is starting to look bad for software developers.  Arjan suggests it will take out the junior engineers first, but I’m not so sure.

Researchers have long feared that the resources required to build and train foundation models would mean a Big Tech oligopoly.  Technologically, there have been good results in the other direction, with small open-source models.  Commercially, however, this is a race between Microsoft and Google.

Microsoft is also introducing other Copilots, and researchers are hard at work on natural language prompting for all computer tasks.  So, the same way I can prompt GPT-Engineer to write some code, you’ll be able to have an AI do whatever you were planning to do on Excel or Tableau.

Biweekly Payment Magic

A while back, I did some foundational work for a leading biweekly payment service.  That is, the math part, which I will reprise here.  Biweekly works best in a climate of high interest rates and, unfortunately, soon after this project, the Federal Reserve dropped its reference rate to zero.  The rate did not move persistently above 2% again until recently, and biweekly is once again looking good.

The featured chart shows a scenario first constructed by my erstwhile partner Phil Battista.  I call it the “magic trick” because the customer in this scenario has financed an extra $3,250 with no change to the term, APR, or payment.  Before presenting the trick, here are some basics about biweekly.

Biweekly Payment Plan Basics

In Canada, the banks offer loans with native biweekly payment schedules, and dealers feature them in their advertising.  Here in the States, you have to use a service.  The service collects payments biweekly via direct debit and remits them to the lender on a schedule that accelerates the amortization.

Here is an example.  According to recent Cox data, the average price of a new car is now above $49,500 with an APR of 7.0% and a 72-month term.  By the way, this survey does not include luxury brands, and some people are financing up to 84 months.

Below, I have modeled this “average” loan, showing monthly versus biweekly payment schedules.  The model shows the amortization only, omitting whatever fees the biweekly service may charge.  You can see that the loan is paid off seven months early.
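If you want to check the arithmetic, here is a simplified model of the two schedules.  I am assuming interest accrues once per biweekly period, which is close enough for illustration:

  principal, apr, term = 49_500, 0.07, 72

  # Standard monthly payment from the amortization formula
  r = apr / 12
  pmt = principal * r / (1 - (1 + r) ** -term)  # about $844

  # Native biweekly payment over the same six years, for comparison
  r_bw = apr / 26
  bw = principal * r_bw / (1 - (1 + r_bw) ** -(term * 26 // 12))  # about $389

  # The service instead debits half the monthly payment every two weeks
  bal, n = principal, 0
  while bal > 0:
      bal = bal * (1 + r_bw) - pmt / 2
      n += 1

  print(f"monthly ${pmt:,.2f}, native biweekly ${bw:,.2f}, half-monthly ${pmt / 2:,.2f}")
  print(f"payoff after {n} debits, about {n / 26 * 12:.0f} months")

Under these assumptions, the payoff lands around month 65 or 66, consistent with the chart above.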

If you’re using longer terms to fit customers into payments, biweekly will shorten the trade cycle a bit.  Also, credit-challenged buyers may be better off with direct debit synched to their paychecks.

Nostalgia Alert: coding for the U.S. Equity project was originally done in C# by my son, Paul, who would have been around fourteen at the time.  We were making an OO model to include all loan and lease instruments as subclasses.  Coding for this article was done by me, in Python, which is 10X easier.

The Magic Trick

If you compare the two charts above, you can see graphically how Phil’s trick works.  Instead of starting your biweekly loan at the same amount and having it end earlier, you start it higher and aim to end on the same date.

The trick works because half the monthly payment is higher than a native biweekly payment would be – by $33 in this example.  The customer makes the equivalent of thirteen monthly payments per year, and the bank loses a little bit of interest income.  Here are the steps:

  1. Increase the amount financed, which will increase the monthly payment.
  2. Increase the term until the monthly payment comes back down to where it was.
  3. Use the biweekly program to bring the term back down to where it was.

Congratulations, you can now finance more product with the same monthly payment.  I covered the concept for menu systems in Six Month Term Bump.  To do goal seeking, as I’ve shown here, you will need some Python (or a precocious teenager).
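Here is a sketch of that goal seek, using the same simplified accrual as the earlier snippet.  The $3,250 bump is from Phil’s scenario; the other figures are my assumptions, so the results are illustrative:

  import math

  principal, apr, term = 49_500, 0.07, 72
  r_mo, r_bw = apr / 12, apr / 26
  pmt = principal * r_mo / (1 - (1 + r_mo) ** -term)

  # Step 1: increase the amount financed
  bumped = principal + 3_250

  # Step 2: solve the amortization formula for the term that restores the payment
  new_term = -math.log(1 - bumped * r_mo / pmt) / math.log(1 + r_mo)

  # Step 3: pay half the monthly payment biweekly and count debits to payoff
  bal, n = bumped, 0
  while bal > 0:
      bal = bal * (1 + r_bw) - pmt / 2
      n += 1

  print(f"new term {new_term:.0f} months, biweekly payoff about {n / 26 * 12:.0f} months")

Under these assumptions, the bumped loan needs about 78 months at the same payment, and the biweekly schedule pulls the payoff back to roughly the original date.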

Moving to Powersports

Back in 2020, I contacted all the leading F&I administrators, pitching my plan for AI-priced service contracts.  As soon as the conversation touched on VIN decoding, they would invariably stop and ask me if I could get VIN data for powersports.  This turned out to be a trend.

Having been in automotive for many years, I was a little sniffy about powersports – although I had worked with Ducati, Harley, and RumbleOn during my tenure at Safe-Guard.  What I knew then was that powersports had only one DMS (Lightspeed), one menu system (Maxim), and no – there was no good VIN service.

When you’re in the powersports industry, you’re selling fun.

At $34 billion, powersports is dwarfed by the mighty auto market, but it has higher margins and better growth.  According to published financials, gross profit is around 20% for auto retail and 30% for powersports.  I expect that the 3% CAGR will perk up as the ecosystem improves, which is the topic of today’s post.

In automotive, we have a rich software ecosystem.  In powersports, not so much.  The ecosystem is complicated by a wide array of vehicles, from jet skis to snowmobiles, with the attendant challenges in standardizing processes and vehicle identification.

The Powersports Market

There are roughly 17,000 car dealers in America, compared to 7,000 motorcycle dealers.  From a dealer’s perspective, powersports means less competition and higher margins, according to Mercer Capital – and it is terra nova for software vendors, as well.  Public auto group Sonic took Mercer’s advice, recently acquiring 13 powersports dealerships.

Here is another explainer, this one from SEMA, on the market structure of ATVs, UTVs, and motorcycles.  I am including it basically for this great quote from dealer consultant Rob Greenwald.  “When you’re in the powersports industry, you’re selling fun,” he said. “We sell lifestyle.”

Unlike buying a car, a powersports purchase is discretionary.  This means it’s more susceptible to economic downturns, but it’s also more fun.  People enjoy visiting the dealership, and that changes the technology model.

Digital retail, for example, is still important – but not to reduce time in the dealership.  It’s so that we don’t have to pull you out of that RZR to sign papers.

Crossover Software Vendors

A few of the website providers I wrote about are also active in powersports, like Dealer Inspire and Fox.  However, neither of these seems to have its digital retail solution in play.  One DR vendor that I recognize from auto is Joydrive, which made a strong entrance by partnering with Polaris and Octane.

Octane is the leading finance source in powersports, but there is a new entrant from the auto space, RouteOne founder Toyota Financial.  TFS is now the private label consumer and wholesale finance source for Bass Pro.

Another crossover vendor is Darwin, which, after dominating the auto space, moved first into motorcycles – challenging Maxim’s lock on Harley-Davidson – and now into other powersports.  Speaking of menu selling, F&I providers here are Galt, Safe-Guard, and Protective.

Movement Toward Powersports

What I encountered in 2020 seems to have been a general movement toward powersports.  Lured by big groups like Bass Pro with its 170 locations, Marine Max (125), and RumbleOn (60), software vendors are extending into powersports.

There sure are a lot of motorcycles at this car show.

They will go where the dealers are and, as I walked the NADA show in Dallas, I had to smile at the untapped demand.  “Drop your business card and win this Harley,” offered one vendor.

“There sure are a lot of motorcycles at this car show,” I remarked.  And then there was the Kawasaki booth, enlisting car dealers looking to diversify – for fun and profit.