Tag: AI

What is “Real” AI?

Clients ask me this all the time.  They want to know if a proposed new system has the real stuff, or if it’s snake oil.  It’s a tough question, because the answer is complicated.  Even if I dictate some challenge questions, their discussion with the sales rep is likely to be inconclusive.

The bottom line is that we want to use historical data to make predictions.  Here are some things we might want to predict:

  • Is this customer going to buy a car today? (Yes/No)
  • Which protection product is he going to buy? (Choice)
  • What will be my loss ratio? (Number)

In Predictive Selling for F&I, I discussed some ways to predict product sales.  The classic example is to look at LTV and predict whether the customer will want GAP.  High LTV, more likely.  Low LTV, less likely.  With historical data and a little math, you can write a formula to determine the GAP-sale probability.
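That "little math" can be as simple as a logistic curve. Here is a minimal sketch, with invented intercept and slope values; a real model would estimate them from historical deal data:

```python
import math

def gap_probability(ltv, intercept=-6.0, slope=5.0):
    """Probability of a GAP sale as a function of loan-to-value.
    The intercept and slope are made up for illustration; in
    practice you would fit them to historical deals."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * ltv)))
```

With these toy coefficients, an LTV of 1.3 scores a higher probability than an LTV of 0.8, which is exactly the behavior described above: high LTV, more likely.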

What is predictive analytics?

If you’re using statistics and one variable, that’s not AI, but it is a handy predictive model just the same.  What if you’re using a bunch of variables, as with linear regression?  Regression is powerful, but it is still an analytical method.

The technical meaning of analytical is that you can solve the problem directly using math, instead of another approach like iteration or heuristics.  Back when I was designing “payment rollback” for MenuVantage, I proved it was possible to algebraically reverse our payment formulas – possible, but not practical.  It made more sense to run the calculations forward, and use iteration to solve the problem.

You can do simple linear regression on a calculator.  In fact, they made us do this in business school.  If you don’t believe me – HP prints the formulas on the back of their HP-12C calculator.  So, while you can make a damned good predictive model using linear regression, it’s still not AI.  It’s predictive analytics.
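Those back-of-the-calculator formulas are short enough to write out.  Here is the same closed-form (analytic) solution as a Python sketch:

```python
def fit_line(xs, ys):
    """Closed-form simple linear regression: the slope is the
    covariance of x and y over the variance of x, and the
    intercept follows from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

No iteration, no training loop – just arithmetic on the data, which is what makes it analytics rather than machine learning.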

By the way, “analytics” is a singular noun, like “physics.”  No one ever says “physics are fun.”  Take that, spellcheck!

What is machine learning?

The distinctive feature of AI is that the system generates a predictive model that is not reachable through analysis.  It will trundle through your historical data using iteration to determine, say, the factor weights in a neural network, or the split values in a decision tree.
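As a toy illustration of that trundling, here is a brute-force search for a single split value – the kind of thing a tree learner does at every node.  The data and the one-split "stump" are invented for the example:

```python
def best_split(values, labels):
    """Try every candidate split value and keep the one with the
    fewest misclassifications -- iteration through the data,
    not algebra."""
    candidates = sorted(set(values))
    best, best_err = None, float("inf")
    for c in candidates:
        # predict 1 for values >= c, 0 otherwise
        err = sum((1 if v >= c else 0) != y for v, y in zip(values, labels))
        if err < best_err:
            best, best_err = c, err
    return best, best_err
```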

“Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed.” – Arthur Samuel

The model improves with exposure to more data (and tuning), hence the name machine learning.  This is very powerful, and it will serve as our working definition of “real” AI.

AI is an umbrella term that includes Machine Learning but also algorithms, like expert systems, that don’t learn from experience.  Analytics includes statistical methods that may make good predictions, but these also do not learn.  There is nothing wrong with these techniques.

Here are some challenge questions:

  • What does your model predict?
  • What variables does it use?
  • What is the predictive model?
  • How accurate is it?

A funny thing I learned reading forums like KDnuggets is that kids today learn neural nets first, and then they learn about linear regression as the special case that can be solved analytically.

What is a neural network?

Yes, the theory is based on how neurons behave in the brain.  Image recognition, in particular, owes a lot to the ventral pathway of the visual cortex.  Researchers take this very seriously, and continue to draw inspiration from the brain.  So, this is great if your client happens to be a neuroscientist.

My client is more likely to be a technology leader, so I will explain neural nets by analogy with linear regression.  Linear regression takes a bunch of “X” variables and establishes a linear relationship among them, to predict the value of a single dependent “Y” variable.  Schematically, that looks like this:

Now suppose that instead of one linear equation, you use regression to predict eight intermediate “Z” variables, and then feed those into another linear model that predicts the original “Y.” Every link in the network has a factor weight, just as in linear regression.

Apart from some finer points (like nonlinear activation functions), you can think of a neural net as a stack of interlaced regression models.

You may recall that linear regression works by using partial derivatives to find the minimum of an error function parametrized by the regression coefficients.  Well, that’s exactly what the neural network training process does!
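To see the connection, here is a sketch that fits a one-variable linear model by gradient descent – following the partial derivatives of the error function downhill – instead of solving it analytically.  The data is made up; the point is that the iterative process lands on the same coefficients the closed-form regression formulas would give:

```python
# Gradient descent on the squared-error surface of a one-variable
# linear model -- the same machinery, writ small, that trains a
# neural network.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 5.0, 7.2, 8.9]   # roughly y = 2x + 1

w, b, lr = 0.0, 0.0, 0.01
for _ in range(20000):
    # partial derivatives of mean squared error w.r.t. w and b
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * dw
    b -= lr * db
```

For this data the analytic least-squares answer is a slope of 1.96 and an intercept of 1.15, and the loop converges to the same values.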

What is deep learning?

This brings us to one final buzzword, Deep Learning.  The more layers in the stack, the smarter the neural net.  With residual “skip” connections, there’s little danger of overdoing it, because the model can learn to bypass redundant layers.  The popular image recognition model ResNet-152 has – you guessed it – 152 layers.

So, it’s deep.  It also sounds cool, as if the model is learning “deeply” which, technically, I suppose it is.  This is not relevant for our purposes, so ignore it unless it affects accuracy.

Caution: Learning Curve Ahead

In last week’s episode, I warned that dealer groups proceeding aggressively into Digital Retail may suffer for it.  This has gotten some pushback.  Regular readers know that I have been a staunch proponent of Online F&I for many years.  Indeed, my work at PEN and F&I Express has done much to advance the cause. 

I gave this warning in the spirit of full disclosure, and to manage expectations.  Now I am in the awkward position of having to press my charge against a technology which I actually support.  If that sounds complicated, consider this:

Luddites – Veteran F&I Director Justin Gasman, quoted recently in Wards, says that F&I will never be totally digital.  “People who say that are from tech companies,” he quipped.  I call this the Luddite position but, in fairness, I am one of the tech guys he’s referring to.

Boosters – Cox Automotive regularly produces surveys with findings like: 63% of customers would be “more likely” to buy F&I products if they could learn about them online.  Coming from an opinion poll, this is mere boosterism. 

Realists – My position is somewhere between these extremes, hence the warning.  I was addressing the Big Six dealer groups, who are regularly ranked on F&I performance.  I do not want to be the consultant telling Mike Jackson to go all in, and then have to explain why he has slipped out of first place.

If you go to a dealer and say, “Hey, look, we’ve got this great solution, but the profitability is only half of what you had before,” that’s really going to slow down adoption.

Automotive News interviewed some realists last year, and they all share my cautious optimism.  The quote above is from Safe-Guard’s David Pryor.  The consensus goes something like this:

  1. Present F&I products online, early in the process, and include pricing.
  2. Use an API to select the right coverage, and AI to make recommendations.
  3. Experiment with (A/B test) various digital media.
  4. Integrate DR with your instore process, training, and metrics.

Roadster’s COVID-19 Dealer Impact Study found that dealers who already had Digital Retail saw improved gross, while the COVID adopters did not.  “Not a magic bullet,” it says, instead emphasizing the improved efficiency.  Other realists, as here, had the same experience.

Digital Retail is like any other new process.  There is risk, reward, and a learning curve.  That’s not too complicated.

DR and Public Dealer Groups

In today’s post, subtitled, “the good, the bad, and the ugly,” we look at where the Big Six public dealer groups stand on Digital Retail.  Some of them get it, some of them don’t, and others have missed the point.

“Once they start the process online, customers tend to buy a car at a much higher rate than … walking into our showroom” – Daryl Kenningham, Group 1

It’s not essential to spin up a distinct site, though many have taken this approach.  It’s a clever way to get in the same space as Carvana.  Thus, we have new brands Driveway, Clicklane, and Acceleride.  For example, you can enter Group 1’s DR process from either Acceleride or the Group 1 site. 

  • Penske – Penske started experimenting with DR way back in 2015, with something called Preferred Purchase.  Today, it’s still called Preferred Purchase, but it’s the DDC Accelerate system.
  • Group 1 – GP1 recently (2019) launched a Roadster implementation called Acceleride.  It is now selling more than 1,000 units per month, including new cars.  This is the top initiative in their investor deck, clearly showing management attention.
  • Asbury – Asbury was also an early adopter, starting with Drive (2016) and now their own Clicklane offering.  By my count, this is their third experiment – exactly what you want to see with digital transformation.
  • Lithia – Lithia has a branded DR site called Driveway which, unfortunately, requires users to create an account before entering the process.  As I wrote in Design Concepts for Online Car Buying, you don’t create an account until the customer is ready to save a deal.
  • AutoNation – AutoNation has made strategic investments in DR vendors like Vroom, and launched its own AutoNation Express in 2014.  As with Driveway, step one is a lead form.
  • Sonic – Sonic announced a plan to use Darwin but, alas, there is still no sign of DR on either the Sonic or EchoPark site.  Maybe the new eCommerce team will fix that. 

I can understand why new-car dealers might want to start with a lead form.  New cars are commodities, and vulnerable to price shopping.  This is where used-car dealers CarMax and Carvana have an advantage.  Otherwise, DR requires a strong commitment to price transparency.

Digital Retail is synergistic with modern sales practices, like one-touch and hybrid teams.  Sonic is the leader here, and has the highest used-car ratio, so you would expect them to have an edge.

Finally, it’s hard to sell protection products online.  Groups with growing DR penetration are likely to see reduced PVR.  This has long been a knock against Carvana.  Experts agree that the solution here is an AI-based “recommender.” 

AI-Based VSC Risk Rating

I have been working on a startup that will use artificial intelligence to rate vehicle service contracts.  For a VSC provider, increased accuracy means sharper pricing and, potentially, lower reserves.  Outsourcing this work to a specialist bureau means reduced costs, too.  Our business model is already used for risk rating consumer credit, and the technology is already used for risk rating auto insurance.

In this article, I present an example using auto insurance data.  If you would like to see how our approach works with VSC data, please get in touch.  We are currently seeking VSC providers for our pilot program.

The French MTPL dataset is often cited in the AI literature.  It gives the claims history for roughly 600,000 policies.  Of these, I used 90% for training and set aside 10% for testing.  So, the results shown here are not just “curve fitting,” but predictions against new data.

The Gini Coefficient

The challenge with insurance data is that most policies never have a claim.  This is known as the imbalanced data problem.  If you’re training an AI classifier, it can achieve 95% accuracy simply by predicting “no claim” every time.  You will want to use an objective function that heavily penalizes false negatives, and you may also want to oversample the “with claim” cases. 
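One blunt remedy is to oversample: randomly duplicate the rare “with claim” rows until they carry more weight in training.  A minimal sketch follows – the 50/50 target ratio is just an example, and class weights in the objective function are the other common option:

```python
import random

def oversample(rows, labels, target_ratio=0.5, seed=42):
    """Randomly duplicate positive ('with claim') rows until they
    make up target_ratio of the training set."""
    rng = random.Random(seed)
    positives = [r for r, y in zip(rows, labels) if y == 1]
    negatives = [r for r, y in zip(rows, labels) if y == 0]
    # how many positives we need for the desired class balance
    need = int(target_ratio * len(negatives) / (1 - target_ratio))
    boosted = [rng.choice(positives) for _ in range(need)]
    data = [(r, 1) for r in boosted] + [(r, 0) for r in negatives]
    rng.shuffle(data)
    return data
```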

The dashed line in the chart above represents cumulative actual claims, sorted in order of increasing severity.  This is called the Lorenz curve.  You can see that it’s zero all the way across and then, at the 95% mark, the claims kick in. 

The blue line is the Lorenz curve for the predictive model.  A good fit here would be a deep concavity that hugs the dashed line.  That would mean the model is estimating low where the actuals are low (zero) and then progressively higher where the claims kick in.

The Gini index is a measure of the Lorenz curve’s concavity.  This 0.30 is pretty good.  The team that won the Allstate Challenge did it with 0.20.  The downside to Gini is that it only tests the model’s ability to rank relative risks, not absolute ones.  I have seen models with Gini above 0.40 that were still way off on actual dollars.
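For the curious, the metric fits in a few lines: sort the actual claims by the model’s predicted risk, build the Lorenz curve, and measure how far it sags below the diagonal.  This is a simplified sketch, not the exact normalization every paper uses:

```python
import numpy as np

def gini(actual, predicted):
    """Toy Gini index: rank policies by predicted risk, accumulate
    actual claims into a Lorenz curve, and average the gap between
    the diagonal and the curve (times two)."""
    order = np.argsort(predicted)                 # least risky first
    lorenz = np.cumsum(actual[order]) / actual.sum()
    n = len(actual)
    return 2 * (np.arange(1, n + 1) / n - lorenz).mean()
```

A model that ranks risks well scores near the maximum achievable for the data; a model that ranks them backwards goes negative.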

Mean Absolute Error

The key metric, to my way of thinking, is being able to predict the total claims liability.  This automatically gives you the mean, and Gini characterizes the distribution.  I like MAE because it represents actual dollars, and it’s not pulled astray by outliers the way mean squared error is.

Here, you see that the model overestimates by 1.2%.

You may be wondering why MAE is so high, when we are within $1.00 on the average claim.  That’s because all of the no-claim people were estimated at an average of $72.50, and they’re 95% of the test set.  The average estimate for the group that turned out to have claims (remember, this is out-of-sample data) was $130.70. 
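The outlier behavior is easy to demonstrate.  With one large claim error in an otherwise tame set of residuals (invented numbers), the squared-error metric balloons while MAE stays in plain-dollar territory:

```python
import numpy as np

errors = np.array([10.0, -5.0, 8.0, -12.0, 2000.0])  # one outlier residual

mae = np.abs(errors).mean()             # stays near the typical error
rmse = np.sqrt((errors ** 2).mean())    # dragged upward by the outlier
```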

Neural Networks

For claim severity, I trained a small neural net, including my own custom layers for scaling and encoding.  I really like TensorFlow for this, because it saves the trained encoders as part of the model.  You want to use a small neural net with a small dataset, because a bigger one can simply memorize the training data, and not be predictive at all.
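The idea of saving the encoders with the model can be sketched without TensorFlow.  Here is a toy scaling layer in the spirit of Keras’s Normalization layer: it learns its constants from the training data in an adapt() step and then carries them wherever the model goes.  The class name and API here are my own invention for illustration:

```python
import numpy as np

class ScalingLayer:
    """Standardizes inputs using statistics learned from the
    training data.  Because mean and std are stored on the layer,
    a saved model needs no separate preprocessing pipeline."""
    def adapt(self, data):
        self.mean = data.mean(axis=0)
        self.std = data.std(axis=0) + 1e-8   # avoid divide-by-zero
    def __call__(self, x):
        return (x - self.mean) / self.std
```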

This dataset has only nine features and, in fairness, a linear model would fit it just as well.  My code repo is now filled with neural nets, random forests, and two-stage combo models.  What this means for our startup is that we don’t have to hire a platoon of actuaries.  We can get by with a few data scientists using AI as a “force multiplier.”

Earlier this century, I played a key role in moving the industry to electronic origination.  At the time, it was clear that the API approach would liberate VSC pricing from the confines of printed rate cards and broad risk classes.  Each rate quote could be tailored to the individual vehicle.

As I said earlier, our approach is current, proved, and working elsewhere.  It’s just not being used in the VSC industry … yet.

Analytics for Menu Presentation

Last week, I presented a single-column format for menu selling on an iPhone, with the glib recommendation to let analytics determine the sort order.  Today, I will expand on that.  Our task is to sort the list of products in descending order of their relevance to the current deal, which includes vehicle data, consumer preferences, and financing terms.

This sorting task is the same whether we are flipping through web pages or scrolling down the mobile display.  The framework I present here is generalized and abstract, making the task better suited to automation, but ignoring the specific F&I knowledge we all take for granted.  I’ll come back to that later.

For now, let’s assume we have six products to present, called “Product One,” and so on, and four questions that will drive the sorting.  Assume these are the usual questions, like, “how long do you plan on keeping the car?”

That answer will be in months or years, and the next one might be in miles, but we are going to place them all on a common scale from zero to one (I warned you this would be abstract).  Think of using a slider control for each input, where the labels can be anything but the range is always 0.0 to 1.0.

Next, assign four weights to each product, representing how relevant each question is for that product.  The weights do not have to be zero to one, but I recommend keeping them all around the same starting magnitude, say 1 to 5.  Weights can also be negative.

For example, if there’s a question about loan-to-value, that’s important for GAP.  High LTV will correlate positively with GAP sales.  If you word that question the other way, the correlation will still be strong, but negative.  So, now you have a decision matrix that looks something like this:

Yes, we are doing weighted factor analysis.  Let’s say that, for a given deal, the answers to our four questions are, in order:

[0.3, 0.7, 0.1, 1.0]

To rank the products for this deal we simply multiply the decision matrix by the deal vector.  I have previously confessed my weak vector math skills, but Python has an elegant way to do this.
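Here is that elegant way, with a hypothetical weight matrix chosen to match the discussion – six products, four questions.  In practice the weights come from F&I judgment or from training data:

```python
import numpy as np

# Hypothetical decision matrix: rows are Products One..Six,
# columns are Questions One..Four.
weights = np.array([
    [2, 1, 1,  1],   # Product One
    [1, 1, 1,  5],   # Product Two  (keys on Question Four)
    [1, 2, 2,  1],   # Product Three
    [1, 5, 1,  1],   # Product Four (keys on Question Two)
    [3, 1, 1, -1],   # Product Five
    [1, 1, 3,  2],   # Product Six
])
deal = np.array([0.3, 0.7, 0.1, 1.0])   # this deal's answers

scores = weights @ deal                  # one matrix-vector product
ranking = np.argsort(scores)[::-1]       # product indices, best first
```

NumPy’s @ operator does the whole ranking in one matrix-vector product.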

Product Two ranks first, because of its affinity for high-scoring Question Four.  Product Four takes second place, thanks to the customer’s response to Question Two – whatever that may be.  By now, you may have noticed that this is the setup for machine learning.

If you are blessed with “big data,” you can use it to train this system.  In a machine learning context, you may have hundreds of data points per deal.  In addition to deal data and interview questions, you can use clickstream data, DMS data, contact history, driving patterns (?) and social media.

If not, you will have to use your F&I savvy to set the weights, and then adjust them every thousand deals by manually running the numbers.

For example, we ask “how long will you keep the car?” because we know when the OEM warranty expires.  Given make, model, and ten thousand training deals, an AI will dope out this relationship on its own.  We can do it manually by setting one year past the warranty as 0.1, two years as 0.2, and so on.  We can also set a variable indicating how complete the manufacturer’s coverage is.

Same story with GAP.  Give the machine a loan amount and a selling price, and it will “discover” the correlation with GAP sales.  If setting the weights manually, set one for LTV and then calculate the ratio for each deal.
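Those two manual encodings fit in a few lines.  The three-year warranty default is a hypothetical; real coverage terms vary by make:

```python
def keep_length_feature(years_kept, warranty_years=3):
    """Manual encoding from the text: one year past the factory
    warranty maps to 0.1, two years to 0.2, and so on, capped at 1.0."""
    past = max(0, years_kept - warranty_years)
    return min(1.0, past * 0.1)

def ltv_feature(loan_amount, selling_price):
    """Loan-to-value ratio, the classic GAP signal."""
    return loan_amount / selling_price
```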

Lease-end protection, obviously, we only want to present on a lease deal.  But we don’t want it to crowd out, say, wearables.  So, weight it appropriately on the other factors, but give it big negative weights for cash and finance deals.

I hope this gives some clarity to the analytics approach.  In a consumer context, there is no F&I manager to carefully craft a presentation, so some kind of automation is required.

Cox Automotive Double Play

It is time to break out your game board once again and play “link the subsidiaries.”  I heard this one recently from a Cox person at a conference.  I don’t know if they have it in production yet, but it sure sounds good.

If you authorize vAuto to source new inventory as it sees fit, then it can connect to Manheim and automatically place the orders.  As soon as the gavel goes down, Dealer.com can pick up images and data from Manheim and immediately begin merchandising the vehicle.  Cox also owns the logistics company that hauls the vehicle, so they can report when it will arrive on the lot.

So, you could conceivably have a customer walk in to buy a vehicle that is arriving today, with the entire sourcing cycle untouched by human hands.  In fact, this sounds a little like what I described in Cox Automotive Home Game.  No mention (yet) of the COXML message format.

Update:  Details here from Mark O’Neil.  The chain goes: vAuto, Stockwave, Manheim, NextGear, and then Dealer.com.