Data Lakes Explained

Last month, I wrote an explainer on AI and it was well-received, so here is one on data lakes.  If you already know the concepts, you may still find this framing helpful in client discussions.  Our audience this time is the CFO, or maybe the CMO, and our motivation is that their analytical needs are not well-served by the transactional database.

Transactional Processing with a Relational Database

The data that runs your business – most of it, anyway – is probably stored in a relational database like Microsoft’s venerable SQL Server.  Without going into details about the “relational” structure, the key is that this database is optimized for the daily operations of the business.

New policies are booked, premiums collected, and claims paid.  These are transactions that add, change, or delete records.  There are also “read only” operations, like producing invoices, but the database is designed primarily for transaction processing.

A well-designed transactional database will resist anomalies

A well-designed transactional database will resist anomalies, like a line item with no invoice, or two sales of the same item.  The database designer will have used a technique called normalization, breaking the data up into smallish tables with relationships that enforce integrity.

Think of how your chart of accounts is organized.  Everything you need to account for is broken down to the lowest relevant level, and then rolled up for reporting.  Every journal entry hits two accounts, debit and credit, so that they’re kept in balance.  Your meticulously normalized database is kind of like that.

When a customer places an order, a row is added to the Order table.  You don’t need to open the Customer table unless there’s a change to the customer.  Built around these normalized tables is the machinery of indexes, clusters, and triggers, which support speed and integrity.
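The idea is easy to see in miniature. Here is a hedged sketch using Python's built-in SQLite, with hypothetical Customer and Order tables (not anyone's production schema): the foreign-key relationship rejects an order for a customer who doesn't exist, which is exactly the kind of anomaly resistance described above.

```python
import sqlite3

# In-memory database; SQLite enforces foreign keys only when asked.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")

con.execute("""
    CREATE TABLE Customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )""")
con.execute("""
    CREATE TABLE "Order" (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES Customer(customer_id),
        total       REAL NOT NULL
    )""")

con.execute("INSERT INTO Customer VALUES (1, 'Acme Motors')")
con.execute('INSERT INTO "Order" VALUES (100, 1, 249.99)')   # valid order

# An order pointing at a nonexistent customer is rejected outright.
try:
    con.execute('INSERT INTO "Order" VALUES (101, 42, 99.00)')
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Notice that adding the valid order never touched the Customer table, just as the text says.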

Pro Tip: Take time to confirm that the transactional database is stable and supporting the business satisfactorily.  You don’t want to start building pipelines and then discover there’s a problem with your data source.

Analytical Processing with a Data Warehouse

Transaction processing involves adding and changing data, with carefully limited scope.  Analytical processing, by contrast, is mostly reading data – not changing it – and holistic in scope.  To support this, the data must be copied into a separate database and denormalized.

Let’s say you want to know whether Dent protection sells better as a standalone product, or as part of a bundle – corrected for the number of dealers who don’t offer the bundle, and segmented by the vehicle’s make and price range.

You could run this query against the transactional database, but it would be difficult.  The query is complicated enough without having to piece together data from multiple tables.  The normalization which served so well for transaction processing is now an obstacle.

Confession: I am a normalization bigot.  I bought C.J. Date’s textbook, read the original papers in the ACM journal, and even coded Bernstein’s algorithm.  To me, organized data is normalized data, and de-normalizing is like leaving your clothes on the floor.

So, this is a good guide to denormalization.  Everything we learned not to do in relational databases – wide tables, nested data, repeating groups – is useful here.
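To make the contrast concrete, here is a toy sketch (invented table and field names) of the same sale in both shapes: normalized rows that must be joined, versus one wide, denormalized row with everything an analyst needs in one place.

```python
# Normalized source rows, as they might come from the transactional database.
customers = {1: {"name": "Acme Motors", "state": "TX"}}
products = {"DENT": {"name": "Dent Protection", "bundle": False}}
orders = [
    {"order_id": 100, "customer_id": 1, "product_id": "DENT", "amount": 249.99},
]

# Denormalized: one wide row per sale.  Customer and product attributes are
# copied into every row -- redundant, but ready for analytical queries.
wide = [
    {
        "order_id": o["order_id"],
        "amount": o["amount"],
        "customer_name": customers[o["customer_id"]]["name"],
        "customer_state": customers[o["customer_id"]]["state"],
        "product_name": products[o["product_id"]]["name"],
        "sold_in_bundle": products[o["product_id"]]["bundle"],
    }
    for o in orders
]
print(wide[0])
```

Segmenting standalone versus bundled sales is now a single pass over one table, with no joins.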

Analytical data is stored in cubes, stars, snowflakes, hearts, and clovers

Analytical work requires not only a new database design, but a new database system.  Out goes SQL Server, and in come BigQuery, Redshift, and Snowflake.  You may hear the buzzword OLAP, which means “online analytical processing.”  The term was coined for marketing purposes, to describe the new category of software.

Analytical data is stored in cubes, stars, snowflakes, hearts, and clovers (see sidebar).  Just kidding about the hearts and clovers.  Also, while your transactional database may be running SQL Server “on premise,” the analytical database will almost certainly be on a cloud service from Amazon, Microsoft, or Google.

To be honest, not everyone needs an OLAP database.  As CIO for BMW Financial Services, I did not recommend one because our analytical workload was small, at the time, and could be served adequately without a lot of new gear and expensive consultants.  Since then, I have gone over to the side of the consultants.

Sidebar: What’s an OLAP Cube?

In the early days of analytical processing, software vendors thought it would be a good idea to use a multidimensional data structure called a hypercube. Think of a typical spreadsheet, with rows representing an income statement and one column for each month. That’s two dimensions. Now, add a stack of spreadsheets, one for each region. That makes three dimensions, like a cube. I put myself through grad school working at Comshare, one of the first OLAP software vendors. It supported seven dimensions. That’s a hypercube. Nowadays, there are better data structures, and this leads to some confusion. Older analysts may assume that if they’re doing OLAP, then they must be using a cube. They may use the term “OLAP cube” to mean any analytical database, even though cubes have largely been replaced by newer structures.
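The spreadsheet-of-spreadsheets picture can be sketched in a few lines of Python. This is only an illustration of the data structure, not how any OLAP engine is actually implemented: a dictionary keyed by (account, month, region) tuples, with slicing by holding dimensions fixed.

```python
# A toy three-dimensional cube: (account, month, region) -> amount.
cube = {
    ("Revenue", "Jan", "East"): 1200.0,
    ("Revenue", "Jan", "West"):  900.0,
    ("Revenue", "Feb", "East"): 1400.0,
    ("Expense", "Jan", "East"):  700.0,
}

def slice_sum(account=None, month=None, region=None):
    """Sum over the cube, holding any supplied dimensions fixed."""
    return sum(
        v for (a, m, r), v in cube.items()
        if account in (None, a) and month in (None, m) and region in (None, r)
    )

print(slice_sum(account="Revenue"))           # total revenue: 3500.0
print(slice_sum(month="Jan", region="East"))  # one slice: 1900.0
```

Add four more key positions to the tuple and you have Comshare's seven-dimension hypercube.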

Pooling Data in a Data Lake

You can think of the data lake as a way station between the transactional database and the data warehouse.  We want to collect all the data into a common repository before loading it into the data warehouse.

Why not simply extract, transform, and load data straight from the transactional database?  Well, we could, but it would be brittle.  Any change on either side would require an update to the pipeline.  The data lake decouples the OLTP and OLAP data stores.

The data lake serves the very important function of storing all the data, in whatever format, whether or not it’s amenable to organization.  The term’s originator, James Dixon, wanted to suggest a large volume of data with no preconceived organization.

The key thing is to collect all the data in one place, and think about organization later.  This calls for an “object data store,” like Google Cloud Storage.  GCP and AWS both use “buckets.”  You get the idea – this is where you leave your clothes on the floor.

Most of your data will indeed be structured data coming from the transactional database, and on its way into the OLAP database – but not all of it.  Here are some real-life examples I have encountered:

    • Logs of API traffic. Details of who is using our ecommerce API, including copies of the payload for each request and response.
    • Text snippets. A file of the several paragraphs that make our standard Texas contract different from the one in Wisconsin, so that we can produce new contracts automatically.  Same goes for product copy on the web site.
    • Telephone metadata. A list of timestamps, durations, phone numbers, and extensions for all calls in the call center, both inbound and outbound.

These examples are better served by special-purpose databases like Hadoop, Bigtable, and Mongo.  It’s best to take stock of all the data your analysts might need, broadly speaking, and start collecting it before you go too far with designing the OLAP database.
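To illustrate the "clothes on the floor" approach, here is a local stand-in for a cloud bucket (a temp directory plays the bucket; the prefixes and payloads are invented): heterogeneous raw objects land under one prefix, stored as-is, with no schema imposed.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for a cloud bucket: one landing area, raw objects, no schema.
lake = Path(tempfile.mkdtemp()) / "data-lake" / "raw"

objects = {
    "api-logs/2024-06-01.jsonl": json.dumps({"endpoint": "/quote", "status": 200}),
    "contracts/tx-variant.txt": "The following paragraphs apply in Texas only...",
    "call-metadata/calls.csv": "timestamp,duration,extension\n2024-06-01T09:00,180,412",
}

for key, payload in objects.items():
    path = lake / key
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(payload)  # stored verbatim; organization comes later

print(sorted(p.relative_to(lake).as_posix() for p in lake.rglob("*") if p.is_file()))
```

The pipeline into the OLAP database then reads from this common landing zone instead of poking at the transactional database directly.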

The Power of Experience

I have been rereading Gary Klein’s landmark book on decision-making, Sources of Power.  Klein’s genius was something other sciences take for granted: field work.  Klein and his team spent years studying how experts make high-stakes decisions in real life.  This is truly “what they don’t teach you in business school.”

The short version is that formal methods for decision making are rarely used in real-life conditions.  Indeed, the people studied by Klein were not even conscious of making decisions.  They just knew what to do.  When a surgeon must make a snap decision, with someone’s life on the line, there’s no time for a weighted-factor analysis.

Most research on decision-making bleaches out the importance of prior experience

Klein points out that most psychology research, in an effort to produce controlled conditions, bleaches out the importance of prior experience.  If you do all your research in a laboratory, then you will only learn how people make decisions in a laboratory – not in combat, say, or a forest fire.

Like that of his better-known colleagues Kahneman and Tversky, much of Klein’s research was funded by military organizations.  They would like their gunners and squadron leaders not to make fatal blunders under fire.  His subjects also included doctors, firefighters, and nuclear power plant operators.

The power of experience seems obvious enough, but Klein figured out exactly how it works, in a framework called the Recognition-Primed Decision Model.  This consists of using imagination plus experience to generate possible courses of action, and then conducting mental simulations to predict the likely results.

Sources of Power

Various “sources of power” follow from the model:

  • Expert Intuition
  • Mental Simulation
  • Finding Leverage Points
  • Detecting Anomalies
  • Reasoning by Analogy
  • Anticipating Intentions

What we think of as intuition is really expert recognition.  One firefighter recounted a narrow escape because he’d had a “premonition” the building he was working in was about to collapse.  This might have been a warning from God – or it might have been the million subtle cues he was unconsciously observing.

This may seem like a different realm from business, where we have ample time to make decision trees, compute expected values, perform cost-benefit analyses, and – there’s always time for one more Big Four consulting study.  This is an illusion, however.  Whether they know it or not, managers are under constant pressure to make decisions and take action faster than their competitors.

A good plan, executed right now, beats a perfect plan executed next week.

My mentor at AutoNation, Kevin Westfall, had a plaque in his office with this quote from General George S. Patton, “a good plan, executed right now, is far better than a perfect plan executed next week.”  Kevin and I had both arrived from our previous employer with some impatience over their decision protocols.

Recognition-Primed Decision Making

In an area that could easily devolve into pop psychology, I was impressed by Klein’s scientific rigor.  Every study is cross-checked, blind, double-blind, sanitized, etc.  Every result is turned into a training program, and then the trainees are tested.  In one project, his team redesigned the user interface for a computerized weapons system, making its operators 20% more effective.

Since experience is so powerful, Klein takes up the question of how best to gain it.  That is, what are the key lessons from the old-timers in various domains?  In the infantry, this might mean knowing how fast your squad can move over terrain, what their best range is for engagement, and being able to gauge those distances by eye.

The cornerstone of the book is the RPD framework, and then Klein spends a chapter on each “source of power,” plus his research methods and training programs.  If that sounds like too much psychology for you, skip the text and just read the case studies.  They’re amazing.

GPS Trackers and OBD Ports

While I was working at Safe-Guard, in 2018, we adopted and co-branded a GPS tracker from ZAZ.  Shortly thereafter, we learned that our crosstown rival, EasyCare, was backing another such product, called SAVY.

Of the two, SAVY had more consumer-friendly features in their mobile app, which I feel is decisive.  This point of strategic positioning is the focus of today’s post.

Neither hookup did well, demonstrating that providers of “paper” F&I products are ill-equipped to deploy hardware.  I took the installer training, just for grins, with Hector Delgado.  So, at least I have a useful skill to fall back on.

I also consulted briefly for LoJack, in 2012, helping them sort out issues around preloading – issues they solved, ultimately, by selling the brand to Spireon.  Old-timers will recall LoJack used to work on radio.  It’s GPS now, like all the others.  “New LoJack by Spireon” is, in fact, old Spireon plus the stronger brand name.  The field today consists of:

    • Ikon
    • LoJack
    • Recover
    • SAVY
    • ZAZ

The model for all of these is that the dealer installs the tracking devices and uses them for lot management, and then sells them through to customers as theft protection.  They’re often sold as a nonnegotiable “preload,” which makes sense from the dealer’s perspective because it would cost another $50 of technician time to remove the device if the customer doesn’t want it.  You can see how the consumer appeal of the mobile app figures into our story.

If the device is drawing power from an OBD port, it can report the vehicle’s battery condition along with its location.  There’s a lot more you can do with OBD data, but manufacturers can be prickly about connecting to those other pins.  The typical device consists of the GPS chip, a cell modem, and an accelerometer.  You may have noticed that your iPhone also includes these parts, but not the OBD plug.

Speaking of those other pins, subprime lenders and BHPH dealers can wire the device to do starter interrupt – the OBD-powered devices, that is.  The Recover device I saw at NADA is battery powered.  The argument for a battery-powered device is that it’s easier to install.  The opposing arguments are battery life, especially if you are selling it through, and the advanced capabilities available only through the OBD port.

Connected Car Features

This brings me to the consumer features:

    • Service reminders
    • Teen driving
    • Driver performance
    • OBD health scan
    • Dealer inventory
    • Service scheduling
    • Credit application
    • Trip history
    • Recall notification
    • Digital glovebox

The astute reader will note that many of these features also aid the dealer in customer retention.  On the other hand, dealer-friendly features don’t mean a thing if the customer doesn’t use the app.  So, preloading can work against you if F&I fails to upsell the device properly.

Also, as mentioned above, your iPhone can support most of these functions on its own.  I run Life360, which adds “insurance referral” to the driver performance feature.  The advantage to the dealer-installed device is that it’s physically attached to the vehicle.  By the way, you can buy a home OBD scanner for $30 at Walmart.

The dealer-installed GPS tracker is an amalgam of all these capabilities.  The key to success is exploiting them creatively and packaging them in ways that appeal to the consumer.

What is Accuracy?

Suppose you have tested positive for a rare and fatal disease, and your doctor tells you the test is 90% accurate.  Is it time to put your affairs in order?  Fortunately, no.  “Accuracy” means different things to different people, and it’s surprisingly easy to misinterpret.

What the 90% means to your doctor is that if ten people have the disease, then the test will detect nine of them.  This is the test’s “sensitivity”: the number of true positives divided by the number of actual positives, 9/10 = 90%.  Sensitivity is important because you want to detect as many cases as possible, for early treatment.

On the other hand, like Paul Samuelson’s joke about the stock market having predicted nine of the last five recessions, sensitivity doesn’t tell you anything about the rate of false positives.  

If you’re into machine learning, you probably noticed that sensitivity is the same as “recall.”  Data scientists use several different measures of accuracy.  For starters, we have precision, recall, naïve accuracy, and F1 score.

There are many good posts on how to measure accuracy (here’s one) but few that place it in the Bayesian context of medical testing.  My plan for this article is to briefly review the standard accuracy metrics, introduce some notation, and then connect them to the inference calculations.

Accuracy Metrics for Machine Learning

First, here is the standard “confusion matrix” for binary classification.  It shows how test results fall into four categories: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).  Total actual positives and negatives are P and N, while total predicted are P̂ and N̂.

                Actual +    Actual −
Predicted +        TP          FP       (total P̂)
Predicted −        FN          TN       (total N̂)
Total               P           N

These are not only definitions, they’re numbers that express probabilities, like the sensitivity formula: sensitivity = TP / P.  This notation will come in handy later.  The standard definition of accuracy is simply the number of cases which were labeled correctly – true positives and true negatives – divided by the total population: accuracy = (TP + TN) / (P + N).

Unfortunately, this simple formula breaks down when the data is imbalanced.  I care about this because I work with insurance data, which is notoriously imbalanced.  The same goes for rare diseases, like HIV infection – which afflicts roughly 0.4% of people in the U.S.  Doctors use a metric called “specificity”: specificity = TN / (TN + FP).

The FP term in the denominator penalizes the model for false positives.  You can think of specificity as “recall for negatives.”  Doctors want a test with high sensitivity for screening, and then a more specific test for confirmation.  A good explainer from a medical perspective is here.

In a machine learning context, you want to optimize something called “balanced accuracy.”  This is the average of sensitivity and specificity.  For more on imbalanced data and machine learning, see my earlier post.
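All of these metrics fall out of the four confusion-matrix cells. Here is a minimal sketch (invented numbers, chosen to be imbalanced) that computes every metric discussed so far:

```python
def metrics(tp, fp, fn, tn):
    """Standard accuracy metrics from the four confusion-matrix cells."""
    p, n = tp + fn, tn + fp                 # actual positives and negatives
    return {
        "sensitivity": tp / p,              # a.k.a. recall
        "specificity": tn / n,              # "recall for negatives"
        "precision":   tp / (tp + fp),
        "accuracy":    (tp + tn) / (p + n), # naive accuracy
        "balanced":    (tp / p + tn / n) / 2,
    }

# Imbalanced data: 100 actual positives out of 1,000 cases.
m = metrics(tp=90, fp=30, fn=10, tn=870)
print(m)
```

Note how naive accuracy (0.96) flatters the model here; balanced accuracy is the fairer summary on imbalanced data.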

Bayes Theorem and Medical Testing

Bayes’ Theorem is a slick way to express a conditional probability in terms of its converse.  It allows us to convert “is this true given the evidence?” into “what would be the evidence if this were true?”

This kind of reasoning is obviously important for interpreting medical test results, and most people are bad at it.  I’m one of them.  I can never apply Bayesian reasoning without first sketching a Venn diagram of the sets involved.

In this diagram, A is the set of people who have the disease and B is the set of people who have tested positive.  U is the universe of people that we’ve tested.  We have to make this stipulation because, in real life, you can’t test everyone.

We might assume that the base rate of disease in the wide world is A/U, but we only know about the people we’ve tested.  They may be self-selecting to take the test because they have risk factors, and this would lead us to overestimate the base rate.

Even within our tidy, tested universe, we can only estimate A by means of our imperfect test.  This is where some probability math comes in handy.  The true positives, people who tested positive and in fact have the disease, are the intersection of sets A and B.  Here they are, using conditional probability: P(A ∩ B) = P(B|A) × P(A).

That is, the probability of testing positive if you’re sick, P(B|A), times the base probability of being sick, P(A).  Again, though, P(A) can be found only through inference – and medical surveillance.  Take a moment and think about how you would obtain these statistics in real life.

Mostly, you are going to watch the people who tested positive, set B, to see which ones develop symptoms.  The Bayesian framework gives you four variables to play with – five, counting the intersection set itself – so you can solve for P(A) in terms of the other ones: P(A ∩ B) = P(A|B) × P(B), and therefore P(A) = P(A|B) × P(B) / P(B|A).

That is, the probability of being sick if you’ve tested positive, P(A|B), times the probability of testing positive, P(B).  We know P(B) because we know how many people we’ve tested, U, and how many were positive.  Now that we’re in a position to solve for P(A) let’s bring back the other notation.

Accuracy Metrics and Bayes Theorem

Machine learning people use the accuracy metrics from the first section, above, while statistics people use the probability calculations from this second section.  I think it’s useful, especially given imbalanced medical (or insurance) data, to combine the two.

Now, we can rewrite the two conditional probability calculations, above, in terms of accuracy.  Set A = P, set B = P̂, and the various metrics describe how they overlap: P(B|A) = TP / P, which is sensitivity, and P(A|B) = TP / P̂, which is precision.

Giving our sick group as: P = precision × P̂ / sensitivity.

Finally, since you’re still worried about your positive test result … let’s assume the disease has a base rate of 1% – two and a half times as prevalent as HIV.  Recall that we never said what the test’s specificity was.  Since the test has good sensitivity, 90%, let’s say that specificity is weak, only 50%.

Out of 1,000 people tested, you are among the 504 who tested positive: 9 true positives plus 495 false positives.  Your probability of being one of the nine true positives is P(A|B).  This is the test’s precision, which works out to 9 / 504 ≈ 1.8%.
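The arithmetic behind that reassuring number, using the figures given above (1,000 tested, 1% base rate, 90% sensitivity, 50% specificity):

```python
# Worked example: counts implied by the stated rates.
tested = 1_000
sick = round(0.01 * tested)                 # 1% base rate -> 10 people
tp = round(0.90 * sick)                     # sensitivity 90% -> 9 detected
fp = round((1 - 0.50) * (tested - sick))    # specificity 50% -> 495 false positives
positives = tp + fp                         # 504 people get bad news

precision = tp / positives                  # P(A|B): sick, given a positive test
print(f"{positives} positives, precision = {precision:.1%}")
```

A 90% "accurate" test, applied to a rare disease, leaves a positive patient with under a 2% chance of actually being sick. That is the whole point of putting the accuracy metrics into their Bayesian context.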