Choose the Right AI Tool

The AI landscape has changed a bit since I wrote What Is Real AI? back in 2021. The advent of GenAI has enabled a new wave of dubious AI sales pitches. Here’s one that crossed my desk recently:

We’ve identified some key GenAI opportunities at PermaPlate … forecast revenue and claims across [products] and adjust staffing monthly.

This sounds like a good idea, except – it’s not a GenAI application. It’s a standard forecasting exercise that everybody does already. If I did want to switch to a learning model – even a deep neural net, which is architecturally similar to an LLM – it would still not be GenAI.

The thing to remember is that GenAI “generates” things, like blog posts and deepfakes. My favorite learning models, going back to AI-Based Risk Rating, are all quantitative in nature. Here again, there are plenty of good statistical methods. Even if you prefer to use a learning model, you may not choose a neural net.

Neural nets have a problem with explainability. They’re basically a black box. That’s why credit bureaus, which must be able to explain their ratings, use a two-step approach. They use AI for exploration and feature engineering, but then they put the features into a more-transparent logistic regression model.
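Here is a minimal sketch of that two-step pattern, using scikit-learn on synthetic data (the dataset, model choices, and the "top 5 features" cutoff are all illustrative, not from any real credit model):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a credit dataset
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=0)

# Step 1: use a learning model only to discover which features matter
explorer = GradientBoostingClassifier(random_state=0).fit(X, y)
top = np.argsort(explorer.feature_importances_)[-5:]  # keep the 5 strongest

# Step 2: fit a transparent logistic regression on those features
scorer = LogisticRegression(max_iter=1000).fit(X[:, top], y)

# The coefficients are the explanation a credit bureau can report
for feature, coef in zip(top, scorer.coef_[0]):
    print(f"feature {feature}: coefficient {coef:+.3f}")
```

The point is that the black box never touches the final score; it only nominates the inputs, and the logistic regression's coefficients carry the explanation.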

I might consider GenAI in a forecasting application, to deal with unstructured data. On the other hand, I would ask why the data is unstructured. We did this exercise as a POC here at PermaPlate. I wrote a little program that would read a service contract, and then answer natural-language queries.

Which coverage did the customer select and does it include roadside assistance?
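A POC like that can be surprisingly small. Here is a sketch of the shape mine took, assuming the `openai` Python client and an `OPENAI_API_KEY` in the environment (the model name and prompt wording are illustrative):

```python
from typing import List, Dict

def build_messages(contract_text: str, question: str) -> List[Dict[str, str]]:
    """Frame the contract as context and the query as the user turn."""
    return [
        {"role": "system",
         "content": "Answer strictly from this service contract:\n\n" + contract_text},
        {"role": "user", "content": question},
    ]

def ask_contract(contract_text: str, question: str) -> str:
    """Answer a natural-language question from the text of a service contract."""
    from openai import OpenAI  # assumes the openai package is installed
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model works here
        messages=build_messages(contract_text, question),
    )
    return response.choices[0].message.content
```

Usage would be something like `ask_contract(open("contract.txt").read(), "Which coverage did the customer select?")`.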

It was a cool demo, but – if you want coverage details available for automation, it makes a lot more sense to store them in machine-readable form, at origination time. And what kind of automation might that be? Well, it might be “agentic.”

Agentic AI means that the AI has “agency,” in the sense that it can make decisions and do things in the world. Cool, huh? We give AI agency by equipping it with tools, in the form of software APIs.

Imagine asking ChatGPT to organize your next trip. It can’t, because it’s trapped inside your web browser. But if you invoke ChatGPT as part of an agentic workflow, with interfaces to the airlines and hotels, it can actually book the trip.

Agentic workflows often divide the work among tool-using LLMs, with a mastermind LLM directing the others. For systems that don’t have APIs, the agent can use Robotic Process Automation to operate the system’s user interface – just like you would at the keyboard. It’s not surprising that UiPath, one of the leading RPA vendors, has moved into Agentic AI.
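Stripped of the frameworks, the tool-dispatch plumbing is simple. Here is a toy sketch in which the "LLM decision" is stubbed out as a fixed plan – `book_flight` and `book_hotel` are hypothetical APIs, not real services:

```python
from typing import List, Tuple

def book_flight(destination: str) -> str:
    return f"flight booked to {destination}"

def book_hotel(city: str) -> str:
    return f"hotel booked in {city}"

# The agent's "tools" are just a registry of software APIs
TOOLS = {"book_flight": book_flight, "book_hotel": book_hotel}

def run_agent(plan: List[Tuple[str, str]]) -> List[str]:
    """Execute a plan of (tool_name, argument) steps the model decided on."""
    results = []
    for tool_name, argument in plan:
        tool = TOOLS[tool_name]          # look up the software API
        results.append(tool(argument))   # the "agency": acting in the world
    return results

print(run_agent([("book_flight", "Denver"), ("book_hotel", "Denver")]))
```

In a real workflow, the plan comes from the LLM’s structured tool-call output rather than a hard-coded list, but the dispatch loop looks much the same.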

Here is a short list of the latest AI techniques:

  • Large Language Model (LLM) – Like Grok and ChatGPT, these are AI models that can read and write (and plan, and execute).
  • Generative AI – Broad class of AI models that can create things, including LLMs but also diffusion models for video and other media.
  • Deep Neural Net (DNN) – Core technology behind GenAI, and many other learning models, as in my earlier article.
  • Retrieval Augmented Generation (RAG) – As the name implies, GenAI “augmented” by the ability to find and read your documents. See Unguided RAG for Text Comparison.
  • Robotic Process Automation (RPA) – Not AI, but frequently used by Agentic AI. As I wrote in Applied AI for Auto Finance, you can derive a lot of efficiency from RPA alone.
  • Agentic AI – AI agents that can make decisions and act autonomously.

Now that you know the lingo, you can choose the right tool for the job – or your AI sales pitch. I, for one, will not be using GenAI to predict claims volume … but I may use Agentic AI to dispatch the technicians.

The Cybernetic Teammate

Here is a recent HBS study on the role of GenAI as a collaborator in a teamwork environment. What I liked most about the study is that it is field work – real-world tasks in a real company, Procter & Gamble (read more about field work in my review of Gary Klein’s book). It must have been a fun field trip for the Harvard kids. By the way, you may recognize Karim Lakhani as the author of Competing in the Age of AI.

GenAI’s ability to engage in natural language dialogue enables it to participate in the kind of open-ended, contextual interactions that characterize effective teamwork

The introduction recaps the literature on teamwork, and points to some testable hypotheses about using GenAI as a “cybernetic teammate.” They then proceed to a product development exercise using the company’s standard methods, with a large sample (n=776) of employees in randomly assigned groups.

The image shows a chart for one of the outcomes, proposal “quality.” For quality, AI-augmented teams were more likely to produce proposals ranking in the top decile. This chart is a little scary, if you think about it, because the bump from adding AI is bigger (and cheaper) than the bump from adding more people.

In a nutshell, teams do better than individuals, but individuals using AI do better than teams. I see this on my LinkedIn feed all the time, and I can vouch for it myself. Shrewd founders see AI as a force multiplier, allowing them to go farther alone before needing to bring in partners.

The study also found that using AI produced proposals better balanced between marketing and technical orientation. Apparently, this is a big skills divide at P&G. Marketers will produce groovy ideas that aren’t feasible, and vice-versa for the tech people. Note the bimodal curve in Figure 11. So, the basic team needs at least one of each skill – unless you’re using AI. AI had the effect of bringing solutions more toward the middle ground.

Finally, test subjects self-evaluated for emotional well-being, and discovered that working with AI was almost as satisfying as working with other people. So, if you can’t afford a marketing colleague for your lonely, overworked engineer, you can at least get him a cybernetic teammate.

Unguided RAG for Text Comparison

Last week, we covered the basic RAG setup and had some fun answering questions from War and Peace. Today, we move on to the two-novel cases:

  • Answer questions using passages from two novels and compare them
  • Compare passages from two novels based on an unseen question
  • Compare passages from two novels based on a similarity search

When I first challenged ChatGPT to “compare common tropes and plot devices between War and Peace and Middlemarch,” I had a specific one in mind. Both novels are set in the early nineteenth century, a time when virtually all wealth was inherited – and the non-wealthy were basically slaves. So, everyone is trying either to marry into a rich family or suck up to a wealthy relative.

This trope is common enough that you could have seen it anywhere: our hero stands to inherit a fortune, but there is some intrigue around the dead uncle’s will. Maybe there’s a different version known only to the servants, etc. I put this question to RAG by gathering matching passages from both novels. The typical prompt template would be something like:

“Please answer this question {question} based on these passages from the novel War and Peace {context1} and these passages from the novel Middlemarch {context2}”

What we are looking for, though, is something a little more autonomous, so I simply removed the query text from the prompt template:

“Please compare these two novels with reference to the context provided … passages from the novel War and Peace {context1} and passages from the novel Middlemarch {context2}”
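In code, the guided and unguided variants differ only in whether the question appears in the template. A sketch – the template wording here is paraphrased from the post, and the exact phrasing is illustrative:

```python
from typing import Optional

GUIDED = ("Please answer this question: {question} based on these passages "
          "from the novel War and Peace: {context1} and these passages from "
          "the novel Middlemarch: {context2}")

UNGUIDED = ("Please compare these two novels with reference to the context "
            "provided. Passages from the novel War and Peace: {context1} "
            "Passages from the novel Middlemarch: {context2}")

def make_prompt(context1: str, context2: str,
                question: Optional[str] = None) -> str:
    """Fill the guided template when a question is supplied, else the unguided one."""
    if question is None:
        return UNGUIDED.format(context1=context1, context2=context2)
    return GUIDED.format(question=question, context1=context1, context2=context2)
```

The retriever still uses the question to gather `context1` and `context2`; only the LLM is left to draw its own conclusions.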

So, the retriever knows what question is to be answered, but ChatGPT is only asked to “compare” and draw its own conclusions. The results are quite good. I won’t share the whole response, but here is the concluding paragraph:

Overall, while both novels touch on similar themes of inheritance and wills, they do so in distinct ways that reflect the respective societies and characters depicted in each work. Middlemarch delves into the personal and familial implications of inheritance, with a focus on individual motivations and moral dilemmas, while War and Peace explores the broader societal and political consequences of inheritance, with a tone that is more ironic and comedic.

Here we see ChatGPT contrasting the tone of the two samples, and drawing a thematic inference. You can even fish for likely tropes, like “Are there instances of women being unfaithful?”

RAG with Unguided Retrieval

Finally, to make RAG fully autonomous, we must dispense with the guiding hand of the query text, and set it loose using only cosine similarity. This script simply trundles through both databases, using Chroma’s query by vector to find similar passages. There is no need to create any new embeddings.

Once a cross-novel match is found, the script retrieves n_results similar passages on each side, and then passes the unguided “compare” prompt to OpenAI.
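The core loop is short. Here is a sketch, assuming two already-populated Chroma collections – the collection names, store path, and distance threshold are all assumptions for illustration:

```python
from typing import Iterator, Tuple

def is_cross_match(distance: float, threshold: float = 0.35) -> bool:
    """Decide whether two chunks from different novels count as a match.
    The threshold is illustrative, not taken from any tuned value."""
    return distance < threshold

def scan_for_matches(db_path: str, n_results: int = 3) -> Iterator[Tuple[str, str]]:
    """Trundle through one novel, querying the other by vector."""
    import chromadb  # assumes the chromadb package and a populated local store
    client = chromadb.PersistentClient(path=db_path)
    wp = client.get_collection("war_and_peace")   # collection names are assumptions
    mm = client.get_collection("middlemarch")
    chunks = wp.get(include=["embeddings", "documents"])
    for embedding, document in zip(chunks["embeddings"], chunks["documents"]):
        # Query by vector -- no new embeddings are created
        hit = mm.query(query_embeddings=[embedding], n_results=n_results)
        if is_cross_match(hit["distances"][0][0]):
            # these two contexts feed the unguided "compare" prompt
            yield document, " ".join(hit["documents"][0])
```

Because the stored embeddings are reused as query vectors, the whole scan costs nothing beyond the original embedding run.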

Between War and Peace and Middlemarch, it settles on some grim material about “empathy and compassion in a time of hardship.” I didn’t feel like quoting it, so instead I tried another novel.

It took me about ten minutes to download, parse, and add Vanity Fair to the mix. In common with War and Peace, it has its own Napoleonic war (different campaign) and a great opportunity for guided search: “Is one or more of the protagonists killed in action?”

The unguided search script, predictably, finds the war. Tolstoy treats the war from a historical perspective while, for Thackeray, it’s just the backdrop for his drama.

On the other hand, War and Peace takes a broader perspective, examining the larger geopolitical forces at play during the Napoleonic Wars. The novel delves into the complexities of international relations, diplomacy, and military strategy, showing how the actions of monarchs, diplomats, and military leaders shape the course of history. While War and Peace also portrays the impact of war on individuals, it does so within the context of larger historical forces and political developments.

The histogram above shows the distribution of cosine similarity results across all 4.8 million pairs of chunks between Middlemarch and War and Peace for ada-002 and 3-small. These are both 1,536-dimensional embeddings. I also experimented with 3-large.

Initially, I preferred ada-002 because it was easier for the script to find similar passages. After working for a while with both, and seeing the histogram, I think maybe a wider variance is better. It means that nearby passages really are similar, while those that aren’t are farther apart.
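Computing that all-pairs distribution is a one-liner in matrix form. A sketch with NumPy, using small random matrices as stand-ins for the real 1,536-dimensional embedding sets:

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """All-pairs cosine similarity between two sets of embeddings."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T  # shape (len(a), len(b)) -- millions of pairs for two novels

# Toy stand-ins for the 1,536-dimensional OpenAI embeddings
rng = np.random.default_rng(0)
mm = rng.normal(size=(100, 1536))   # "Middlemarch" chunks
wp = rng.normal(size=(200, 1536))   # "War and Peace" chunks

sims = cosine_similarity_matrix(mm, wp)
print(sims.shape, float(sims.std()))  # the spread is what the histogram compares
```

Flatten `sims` and feed it to a histogram to reproduce the comparison between embedding models; a wider spread means the model is doing more to separate unlike passages.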

For instance, 3-small gives a better answer on Natasha’s engagement because it’s more discriminating. Because I’ve read the novel (twice), I can infer where the search is going wrong. Also, I wrote a little utility function that displays which chunks it has found, with their distance metrics and metadata.
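That kind of debugging utility is worth having in any RAG project. Here is a sketch of one, assuming the nested-list dict shape that Chroma’s query() returns:

```python
from typing import Dict, Any

def show_chunks(result: Dict[str, Any], width: int = 60) -> None:
    """Print retrieved chunks with their distance metrics and metadata.
    Expects the dict shape returned by a Chroma query()."""
    for doc, dist, meta in zip(result["documents"][0],
                               result["distances"][0],
                               result["metadatas"][0]):
        print(f"{dist:.4f}  {meta}  {doc[:width]}...")
```

A glance at the distances and metadata usually tells you whether the retriever wandered off into the wrong part of the novel.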