Weighted Factors for Product Selection

Every so often, I will write up a standard quantitative procedure, usually because someone has asked me about it.  For instance, see Pay Plan Math, What Is Accuracy, and Know Your Time Series.  Today, it’s weighted-factor analysis for product selection.  At a high level, this procedure is:

  1. Gather your requirements and selection criteria
  2. Quantify how important each criterion is
  3. Grade the vendor responses
  4. Compute numerical scores

Gather Requirements and Criteria

First, through interviews and maybe some direct observation, discover why we need the product.  In my business, this is generally a software product, but it could be anything.  Next, determine the requirements and selection criteria.

Selection criteria are the features we will evaluate to decide which product is the best fit, whereas requirements are features the product must have to even be considered.  Don’t make the mistake of thinking requirements are just extra-special criteria.

If you’re looking to buy a car, and gas mileage is on the list, then a hybrid will score well on that criterion.  If you’re only looking to buy a hybrid, then that’s the category, and you’re not looking at gas cars at all.

The purpose of requirements is to define the category of product we’re looking for. If you’re writing an RFP, the criteria are what the vendors respond to, and the requirements determine which vendors get the RFP.  When in doubt, send them the RFP anyway and let the vendor figure it out.

For example, if I am selecting cybersecurity software, I might want endpoint protection (EPP), endpoint detection and response (EDR), managed detection and response (MDR), or even a security operations center (SOC).  These all address the same problem, but they’re not the same product.

Quantify Importance of the Criteria

In the chart, I show criteria scored on a scale of 1 to 5, which is typical. Then, for the sake of example, I norm these to a total score of 100.  This is probably overkill, but it’s fun to have 100 as a baseline.  Later, we’ll do the same with the final score.  Clients love simple numbers.

One way to explore the criteria is to do a forced ranking from most important to least important.  This is not amenable to quantitative methods, but it’s a good way to get started.  Spend an hour in front of the whiteboard while the client staff fight it out over the ranking, then let them each do the 1 to 5, and average their responses.

Another way is to give each participant 100 points that they can allocate as desired across the criteria.  This is the most accurate, in terms of understanding tradeoffs, and it makes the math easy.
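The averaging-and-norming mechanics can be sketched in a few lines of Python. The criteria and the 1-to-5 ratings below are invented for illustration:

```python
# Average each committee member's 1-5 rating, then norm the weights to 100.
ratings = {
    "gas_mileage": [5, 4, 5],   # one rating per committee member
    "cargo_space": [3, 3, 2],
    "safety":      [5, 5, 4],
    "price":       [2, 3, 3],
}

avg = {c: sum(r) / len(r) for c, r in ratings.items()}
total = sum(avg.values())
weights = {c: round(100 * a / total, 1) for c, a in avg.items()}

print(weights)   # normed weights, summing to ~100
```

The 100-point allocation method skips the norming step entirely: just average each participant’s allocations.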

I like to keep the cost analysis separate from the features.  It is possible to turn the price proposal into another row among the criteria, but no one really thinks this way.  What you’re shooting for is, “this one scored 84 out of a hundred, and it’s $100,000 more than the one that scored 74,” with traceability back to the features that account for the difference.

Grade the Vendor Responses

Maybe you’ve sent an RFP and are now grading the proposals, or maybe you’re doing your own research. Using an RFP is handy because you can include the criteria and let the vendors tell you how they propose to meet them. In either case, you (and the committee) are responsible for assigning a number to indicate how well the product meets each criterion.

Here again, the 1 to 5 scale is popular and easy to use.  Obviously, grades supported by numbers are best.  For gas mileage, you can assign 1, 2, 3, 4, and 5 to specific ranges of MPG.  Something like “vendor support” can be tied to a service-level agreement in hours or minutes.
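The MPG example can be captured in a tiny lookup. The cut points here are hypothetical; use whatever ranges fit your market:

```python
# Tie a 1-5 grade to measurable ranges of MPG.
def mpg_grade(mpg: float) -> int:
    cut_points = [20, 28, 35, 45]   # upper bounds for grades 1 through 4
    for grade, bound in enumerate(cut_points, start=1):
        if mpg < bound:
            return grade
    return 5                        # 45 MPG and up

print(mpg_grade(31))   # -> 3
print(mpg_grade(52))   # -> 5
```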

Compute the Final Score

This is called weighted-factor analysis because each product is scored according to its criteria grades, and the criteria have different weights.  It’s just like computing a weighted average.  Since we’ve normed the weights to 100 and we’re using a 5-point grading scale, we divide the totals by five to produce a score out of 100.  You can present this as a percentage if you want.
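As a sketch, with invented weights and grades: the weights are normed to 100 and the grades run 1 to 5, so the weighted total tops out at 500, and dividing by 5 yields a score out of 100.

```python
# Weighted-factor score for one product. All numbers are illustrative.
weights = {"gas_mileage": 32, "cargo_space": 18, "safety": 32, "price": 18}
grades  = {"gas_mileage": 4,  "cargo_space": 3,  "safety": 5,  "price": 2}

weighted_total = sum(weights[c] * grades[c] for c in weights)  # max 500
score = weighted_total / 5                                     # out of 100
print(score)   # -> 75.6
```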

In our carefully contrived example, vendor #3 comes out on top even though they had the lowest raw score, because they scored well on the criteria that mattered most.

When data scientists say that “our precision exceeds our accuracy,” this is what they mean. Do not take this fundamentally subjective numerical score out to two decimal places.  The point of this procedure is not so much to generate a number, but to make the variables explicit.

The idea is that the sum of many small decisions will be more accurate than one big one, particularly if there is consensus among the participants. Everyone on the committee should be able to say why the chosen product scored ten points better than the runner-up.

Also, to be a little bit pragmatic: now everyone has their fingerprints on the decision.  No one can complain that they weren’t consulted, or question how the decision was made.

Funny aside:  One of my first consulting projects was the selection of a networking vendor for Ford Credit. We did the full procedure: interviews, requirements, criteria, an RFP, a selection committee, bidder conferences, sealed bids, etc. Digital Equipment (DEC) won. Remember them? And then some big shot from the Glass House swooped in and gave the contract to IBM. What about our fancy RFP project? Well, it was “defective” because it failed to produce IBM as the winner. There was a saying in those days, “no one gets fired for buying IBM.” It was seen as the safe choice – and the only choice for risk-averse executives.

Sharpening the Saw

I was first acquainted with Continuing Professional Education (CPE) while working for a Big Six auditor in Detroit. Our CPAs were always worried about getting their CPE credits done on time. In those days, I was also working with some trade groups trying to create a certification program for computer professionals.

The Certified Data Processor (CDP) program never took off but, nonetheless, my profession became a hotbed of certifications for everything from tech support to Solutions Architect. Like other hiring managers, I take these “certs” seriously when recruiting.

My own tastes in CPE are rather eclectic. In this post, I’ll share some of my experiences so maybe if you’re new to this – or you’re an educator – it will give you some inspiration.

Superstar Lecturers on Video

I have written about Coursera before, so I won’t tell the whole MOOC story again, except to note that Coursera, Khan, and Udemy are now joined by edX, which is sponsored by Harvard and MIT so, you know – higher education will never be the same.

I have taken about a dozen courses through Coursera, including a “specialization” in deep neural networks. This one featured superstar AI educator Andrew Ng. As of this writing, Andrew is estimated to have taught eight million students.

This phenomenon is not new. Top professors often leave their colleges to go solo. I remember seeing Dr. Michael Hammer, inventor of Business Process Reengineering, lecture a packed conference hall in Washington, DC. As online education progresses, I predict the Pareto effect will set in, and schools will compete for a small number of superstar teachers.

It’s All About the Textbooks

Andrew’s classes were brilliant, but left precious little reference material. One lecture might have ten slides, and then you’re left to your own notes. So, concurrently with the Deep Learning course, I read Aurélien Géron’s book, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. In a way, it was good to have a book that was not the official textbook, because it gave me a different perspective.

I signed up for Stanford’s “Statistical Learning with Python,” strictly because of the textbook. It’s the classic from Hastie and Tibshirani (also superstar teachers) and it was just updated to use Python. I started going through the book on my own, and then discovered the class.

You can read a textbook on your own, but I don’t recommend it. I went through Barabasi’s book on Network Science, working exercises from each chapter in C# and Python. This was okay because I already had a background in graph theory. For the stats class, I often needed the lectures to clarify difficult points from the reading.

The lack of reference materials is especially frustrating if you take certification classes from vendors like Google and Amazon. You get a lot of advertising and a lot of video content, which is impossible to refer back to. Some of these classes are pretty chaotic, too, in terms of syllabus planning. Here again, I recommend buying a companion book.

Practical Coding Exercises

Coursera has a system where you can work Python notebooks inside their training environment. I think this is a clever way to keep their IP locked up, but I was always terrified it would time out and I’d have to start the exercise over again.

The edX class I’ve just completed simply leaves the code on GitHub. This is the most normal thing, from a developer’s perspective, and then students can use their favorite notebook. I copied all the labs into Google Drive and worked them in Google Colab.

Forums and Engagement

Discussion forums are a challenge. You’re never going to have the cohesion and topicality of a college course, because everyone is learning at their own pace. On the other hand, there are massive numbers of students encountering the material continuously, so this can be turned to advantage.

Stack Overflow is a popular online forum where you can find useful coding help – even if your question was last answered ten years ago. In fact, many of Andrew’s coding assignments are discussed there. Over time, I believe an in-class forum could accumulate a critical mass of answers just like Stack Overflow.

On the other hand, expectations are different. If you’re paying for a class, and you’re stuck on the homework, you want an answer right now – especially if you suspect there’s a bug in one of the assignments. So, that’s down to the teaching assistant, or a staff of teaching assistants, or maybe ChatGPT.

Graded Tests and Certification

Since this is an ongoing hobby of mine, I looked into getting an online master’s degree, and decided against it for the reasons given above. If it’s just going to be slideware and a no-name teacher, I can do better on my own. What I’d really like to do is go on picking best-in-class courses and somehow stitch them into a degree program.

Remember back in high school AP calculus when you learned about Lagrange multipliers? – Daniela Witten

Grading performance for online coursework is still an open problem. For professional certification, you take the class and then sit for an independently proctored exam. If you have the job experience, you may not need a class at all – and I think this hints at a solution.

The professional certification classes are not up to university standards, but people tolerate them because they need the cert. What the MOOCs could use is a more general (and recognized) system of testing. Instead of certifying that you can use a tool to do a job, like running Salesforce, there should be tests to show that you know something, like English Lit.

I am alluding, of course, to the College Board’s Advanced Placement (AP) program, which already offers college credit for passing their exams. This could become the certification regime for university-aligned MOOCs like edX, and then they could organize degree programs around curricula supported by the exams.

Inventory Management for Powersports

One of the things I enjoyed about my sojourn in powersports was comparing practices with automotive retail.  The most intriguing is a convergence of facts that suggest inventory management should be centralized:

  • Multiple new franchises in the same rooftop.
  • Many units arrive in a crate and require assembly.
  • Limited space, in the showroom and in the shop.
  • Inconsistent VIN decoding.

I’ll explain each of these, showing how powersports is different from automotive (and more like Dick’s Sporting Goods) when it comes to inventory management. Then, I’ll briefly describe how such a setup might operate.

Powersports is Different from Automotive

Unlike an auto group, where stores are segregated by their OEM franchise, stores in a powersports group have much the same make/model mix. My local dealer sells Kawasaki, Polaris, Can-Am, and Yamaha – as they all do.

This means it’s possible to consolidate intake for the group, and allocate distribution based on real-time results. My favorite allocation model is not, “you’ve sold all your Razors and Mavericks, so you get more.” Stores must equally share the slow movers, too, so a bundled restock model is better.

Powersports stores are often small and crowded. They’re much more exciting than car stores, bursting with vehicles and accessories, with the colors and signage of multiple manufacturers. Keeping extra inventory offsite, in cheaper space, makes good sense.

Service is also space constrained, which means that building new units must often compete for bays (and techs) with repair and maintenance for customer vehicles. New build can be delegated to the warehouse, along with recon and custom build. Centralizing this work allows more efficient scheduling.

Centralizing intake also means cleaner model data for planning and analysis. In automotive, we take for granted that we always know the model and trim. In powersports, not so much. If you have multiple people receiving inventory in multiple stores, there can be a lot of variability. 

Distribution Center Operation

Let’s follow some inventory through the distro, and highlight why this is a good idea. We start with new unit intake, where we have a central point to reconcile orders, schedule new unit build, and deal with freight damage. It’s also physically easier to handle freight trucks at a warehouse.

This is a central point to enter units into your store-level DMS. If you’re running an enterprise inventory system, which I recommend, enter units into it in parallel. Better yet, enter them there first and push to the DMS. The inventory system can prioritize build requests, track which stores are getting which units, and notify them.

Over in the shop, we are of course building new units but also accessorizing and building customized units, which may be from new or used inventory. There is good margin to be made here. The shop also centralizes recon work for trade units, which are backhauled from the delivery runs.

This is a control point for deciding whether the recon pencils out versus sending the unit to auction. Here again, the inventory system, operating “above” the store-level DMS, helps route trade units back to their stores. It should interface with your logistics system.

An Expensive Proposition

So, it’s a good idea. On the other hand, let’s be honest about the costs:

  • Cost of renting and operating the warehouse.
  • Cost of running the logistics operation.
  • Opportunity cost of inventory sitting in the warehouse.

The cost-benefit analysis comes down to how many stores are in the group, and how close they are to the warehouse. The rent can be offset against the cost of floorspace in a retail zone. Ditto for the opportunity cost, if this is inventory you were going to keep in the stores anyway. Also, we assume that the group is doing some kind of centralized order consolidation.

As for logistics, I’ve had good luck with Samsara. You probably have some trucks operating already, picking up used units, service units, or just redistributing inventory. Hell, if you want to go full digital retail, you can offer home delivery out of the warehouse – although this is not recommended. That’s another subtle way powersports is different. 

Choosing the Cutoff Value

If you work with binary classifiers, then you are familiar with the problem of choosing a cutoff value.  While the classifier will predict positives and negatives, under the covers it’s a probability score with an implicit 0.50 threshold.  Since most real-life data is imbalanced, 0.50 will not be the right value.

This activity of finding the right cutoff value, and choosing the desired accuracy metric, can be a hassle, so I developed a tool to help me deal with it. In this article, I’ll show how I use the tool for a typical problem involving credit approval.

When training a binary classifier, we generally look at the “receiver operating characteristic” or ROC curve.  This is a plot of true positives versus false positives for all choices of the cutoff value. A nice, plump ROC curve means that the model is fit for purpose, but you still have to choose the cutoff value.
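Under the hood, an ROC curve is just the (false positive rate, true positive rate) pair evaluated at every cutoff. A minimal pure-Python sketch, on toy scores and labels (a real curve would use the model’s validation output):

```python
# Trace an ROC curve by evaluating (FPR, TPR) at every observed score.
scores = [0.05, 0.20, 0.35, 0.08, 0.60, 0.12]
labels = [0,    1,    1,    0,    1,    0]

def roc_point(cutoff):
    tp = sum(s >= cutoff and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= cutoff and y == 0 for s, y in zip(scores, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    return fp / neg, tp / pos   # (FPR, TPR)

# Sweep cutoffs from high to low, so the curve runs corner to corner.
curve = [roc_point(c) for c in sorted(set(scores), reverse=True)]
print(curve)
```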

In this example, we have an ROC with “area under the curve” of 0.76.  This is a good score, but the default 0.50 threshold happens to lie where the curve runs into the lower left corner.  Using the slider on my ROC tool, I can run this point up and down the curve, maximizing whichever accuracy metric I choose. 

To do this, I have the classifier write a list of its predicted probability scores into a file, along with the actuals (y_pred, y_val) and then I read that file into the tool.  If you’re using Scikit, you’ll want predict_proba for this.
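The hand-off file can be as simple as a two-column CSV. A sketch, with placeholder values standing in for what predict_proba would return (in Scikit, something like model.predict_proba(X_val)[:, 1] for the positive class):

```python
# Write one row per case: predicted probability plus actual outcome.
import csv

y_prob = [0.05, 0.62, 0.11, 0.30]   # placeholder probability scores
y_val  = [0, 1, 0, 1]               # placeholder actuals

with open("scores.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["y_pred", "y_val"])
    writer.writerows(zip(y_prob, y_val))
```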

In this case, the best balanced accuracy is achieved when the cutoff value is 0.11.  We need balanced accuracy because our exploratory data analysis showed that the data is nine to one imbalanced in favor of negative cases. 
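The sweep itself is straightforward: try every observed score as the cutoff and keep the one that maximizes balanced accuracy. A pure-Python sketch on a tiny invented sample, contrived so the optimum lands at 0.11 to echo the example:

```python
# Balanced accuracy = (TPR + TNR) / 2, evaluated at a given cutoff.
def balanced_accuracy(scores, labels, cutoff):
    tp = sum(s >= cutoff and y == 1 for s, y in zip(scores, labels))
    fn = sum(s < cutoff and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < cutoff and y == 0 for s, y in zip(scores, labels))
    fp = sum(s >= cutoff and y == 0 for s, y in zip(scores, labels))
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return (tpr + tnr) / 2

scores = [0.05, 0.07, 0.08, 0.09, 0.10, 0.12, 0.11, 0.20, 0.35, 0.60]
labels = [0,    0,    0,    0,    0,    0,    1,    1,    1,    1]

# Sweep every observed score as a candidate cutoff.
best = max((balanced_accuracy(scores, labels, c), c) for c in sorted(set(scores)))
print(best)   # (best balanced accuracy, best cutoff)
```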

In the context of our credit problem, negatives are people who don’t default on their loan.  Our classifier could present 90% naïve “accuracy” simply by calling every case a negative.  We would confidently approve loans for everyone, and then encounter a 10% default rate. 

The tool displays other popular accuracy metrics like precision, recall, and F1 score.  By the way, notice that the true positive rate (TPR) and the false negative rate (FNR) add to unity because these are all the positive cases. The same goes for negatives.  The TPR is also known as “sensitivity.”

For problems like this, balanced accuracy is usually sufficient, but we can take it a step further and ask what the gain or loss is from each decision.  The tool accepts these figures, with the false outcomes marked in red.

For example, let’s say that a false negative costs us $10,000 in collections and recovery charges, while a true negative means we earn $7,500 in interest.  True positives and false positives will both count as zero, because we declined them.

We can see that our maximum expected value of $2,170 is achieved when the cutoff value is reduced to 0.08.  This is below the optimum for balanced accuracy.  It is accuracy weighted more heavily to avoid false negatives.
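The same sweep works for expected value. A sketch with the $10,000 and $7,500 figures from the example, where approving means the score falls below the cutoff; the toy scores and labels are invented, so the result won’t reproduce the article’s 0.08 cutoff or $2,170 figure, but the mechanics are the same:

```python
# Expected value per decision: approved defaulters (FN) cost money, approved
# good loans (TN) earn interest, and declined cases (TP, FP) count as zero.
FN_COST, TN_GAIN = -10_000, 7_500

def expected_value(scores, labels, cutoff):
    fn = sum(s < cutoff and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < cutoff and y == 0 for s, y in zip(scores, labels))
    return (fn * FN_COST + tn * TN_GAIN) / len(scores)   # average per case

scores = [0.05, 0.07, 0.08, 0.09, 0.10, 0.12, 0.11, 0.20, 0.35, 0.60]
labels = [0,    0,    0,    0,    0,    0,    1,    1,    1,    1]

best = max((expected_value(scores, labels, c), c) for c in sorted(set(scores)))
print(best)   # (best expected value per case, best cutoff)
```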

I hope you enjoy using the tool.  Remember, it’s best practice to do all this with your training or validation dataset, and then commit to a cutoff value for your final test.