Generative AI isn’t really my thing. To me, AI means machine learning for quantitative applications, like predictive analytics. Nonetheless, people seem to be having fun over here, so I thought I’d give it a try. Here are some notes from Project Avatar.
Tools in this space overlap a lot. Synthesia, for instance, aims to be a one-stop shop for training videos and the like. The video tools overlap with the image generation tools, while others, like Eleven Labs, have carved out niches where they’re best in class.
Create Your Avatar
With HeyGen, I was able to create custom avatars from photos, video, and generated images. It also features prefab avatars, based on its own actors. For a training video, you can choose from a wide variety of these. If you want to be differentiated, however, you will need to create a custom avatar.
I know of one company that simply adopted “Ada” from Synthesia, and made her the face of their application. Ada is a very popular avatar, so – no differentiation. With Synthesia, you can create custom avatars from video, but not photos.
Avatars from Still Images
Video avatars are generally more expressive, so why would you want to create one from still images? I can think of two reasons. First, making a good video avatar is a lot of work, and you need a model. I did one of myself, and it looks like hell.
Second, you might want to create a completely unique persona – to specifications – that you control. This is how I created Hadley, and here is where it gets interesting. Midjourney has a character reference flag so that, once you have the face you want, it can reliably reproduce her in different settings.
Close shot of a 52-year-old man, strong-looking, bald, square jaw with short beard, captured with a 70-200mm f/2.8E FL ED VR lens, with high-key lighting and a shallow depth of field.
Stability AI is arguably better at images but, without the cref feature, it is useless for this application. On the other hand, while Midjourney is great at creffing AI images, it won’t do photos. The image of me in Project Avatar is from HeyGen.
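To make the character reference concrete, here is the shape of a creffed prompt. The flag names reflect Midjourney v6 as I understand it, and the image URL is a placeholder for wherever you host the reference face:

```
/imagine prompt: Hadley walking through a sunlit newsroom, shallow depth of field
  --cref https://example.com/hadley-face.png --cw 100
```

The `--cw` (character weight) value runs from 0 to 100; higher values try to carry over hair and clothing as well as the face, while lower values stick closer to the face alone.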
After training on a dozen recent photos, HeyGen can reproduce my likeness accurately about 25% of the time. The generated images are called “looks.” So, “Mark seated in a library wearing a dress shirt” is a look. I also did a video avatar, which included cloning my voice.
Synthetic Voice
Eleven Labs has a wide array of professional voices, plus you can clone your own voice. You can generate audio in Eleven Labs, and then upload it to HeyGen for animation. If you’re working from a script, you can paste the script into HeyGen and then link to voices in Eleven Labs using its API. HeyGen also has its own voice library.
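As a sketch of that hand-off, assuming Eleven Labs’ v1 text-to-speech endpoint and a voice ID from your account (the model name and field names are assumptions; check the current API reference before wiring this into anything):

```python
# Sketch: render a script as audio with Eleven Labs, then save the MP3
# for upload to HeyGen. Voice ID, API key, and model name are placeholders.
import json
import urllib.request

ELEVEN_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

def build_tts_request(script: str, voice_id: str, api_key: str) -> dict:
    """Assemble the URL, headers, and JSON body for one TTS call."""
    return {
        "url": ELEVEN_TTS_URL.format(voice_id=voice_id),
        "headers": {"xi-api-key": api_key, "Content-Type": "application/json"},
        "body": {
            "text": script,
            "model_id": "eleven_multilingual_v2",  # assumption: current default
        },
    }

def synthesize(script: str, voice_id: str, api_key: str, out_path: str) -> None:
    spec = build_tts_request(script, voice_id, api_key)
    req = urllib.request.Request(
        spec["url"],
        data=json.dumps(spec["body"]).encode(),
        headers=spec["headers"],
    )
    with urllib.request.urlopen(req) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())  # MP3 bytes, ready to hand to HeyGen
```

Keeping the request-building separate from the network call makes the payload easy to inspect (and to swap for HeyGen’s own voice library if you skip Eleven Labs).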
Please write a short, motivational speech, roughly 75 words in length, on how to navigate life transitions. Include a narrative hook at the outset, and a punchy conclusion that recaps the hook.
Both systems are a little bit robotic when reading a script. There are some things you can do with the script to improve this. To get the best pacing and intonation, you might want to read the script yourself, and then use speech-to-speech conversion. This, unfortunately, will gum up your automation.
Automate Your Workflow
If you’re doing this at scale, you can’t be the one reading the scripts. This energetic bot is Jordan from LipDub. You can imagine the pace at which he is pumping out these ad spots. LipDub and Akool are both more expressive than HeyGen but, again, you need a live model.
All these tools are richly supplied with APIs so, if you are a non-coder, you can easily string together a workflow using Make (formerly Integromat). Here is my draft workflow for the Bruno channel:
- Call ChatGPT to generate topic and prompt
- Call Powerful Thinking to generate script
- Pass script to Eleven Labs and call TTS to generate audio
- Pass audio to HeyGen, selecting Bruno avatar by ID
- Optional: generate new “look” for Bruno
- Add captions
- Post to Instagram reel using Meta API
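Step four of that list might look like the sketch below. The endpoint path and field names reflect my reading of HeyGen’s v2 API and should be treated as assumptions; the avatar and audio IDs are placeholders you would pull from earlier steps:

```python
# Sketch: ask HeyGen to render a clip pairing the Bruno avatar with a
# pre-generated audio track. Verify paths and fields against current docs.
import json
import urllib.request

HEYGEN_GENERATE_URL = "https://api.heygen.com/v2/video/generate"  # assumption

def build_video_request(avatar_id: str, audio_asset_id: str) -> dict:
    """JSON body pairing an avatar ID with an uploaded audio asset."""
    return {
        "video_inputs": [{
            "character": {"type": "avatar", "avatar_id": avatar_id},
            "voice": {"type": "audio", "audio_asset_id": audio_asset_id},
        }],
        "dimension": {"width": 720, "height": 1280},  # vertical, for Reels
    }

def submit(avatar_id: str, audio_asset_id: str, api_key: str) -> bytes:
    body = json.dumps(build_video_request(avatar_id, audio_asset_id)).encode()
    req = urllib.request.Request(
        HEYGEN_GENERATE_URL,
        data=body,
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # response carries a video ID you poll for status
```

Rendering is asynchronous, so the workflow has to poll for completion before the caption and posting steps can run.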
By “richly,” I mean that HeyGen wants $100 per month for its API – on top of the monthly Creator subscription. CapCut doesn’t have an API, but JSON2Video does. Also, there are libraries in Python that will do captions. Bold captions add a nice touch.
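For the captions step, you may not even need a library: the SRT subtitle format is plain text. This sketch turns (start, end, text) cues, with times in seconds, into an .srt string; making the captions bold happens downstream, when they are burned into the video:

```python
# Sketch: build an SRT subtitle file from timed caption cues, stdlib only.
def fmt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Number each cue and join them into one SRT document."""
    blocks = []
    for i, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{i}\n{fmt_time(start)} --> {fmt_time(end)}\n{text}\n")
    return "\n".join(blocks)

srt = to_srt([
    (0.0, 2.5, "Life throws curveballs."),
    (2.5, 5.0, "Swing anyway."),
])
```

Dedicated caption libraries add word-level timing and styling, but for a scripted avatar channel, where you already know the text, this is most of the job.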
Step two of the workflow calls my custom GPT, Powerful Thinking. This, unfortunately, does not yet have API support, so I was reduced to using Selenium.
Write the Script
Scripts are the easiest thing in the world to generate, and the LLMs (apart from Powerful Thinking) are equipped with APIs. You can give them specific instructions, and they’ll even do product research for you.
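The script step reduces to a prompt template plus one API call. This sketch packages the speech prompt from earlier as a chat-completions request; the model name is a placeholder, and the endpoint shown is OpenAI’s standard chat completions API:

```python
# Sketch: generate a ~75-word motivational script for a given topic.
import json
import urllib.request

def build_script_prompt(topic: str) -> list[dict]:
    """Wrap the topic in the speech-writing prompt as a chat message list."""
    return [
        {"role": "system", "content": "You are a motivational scriptwriter."},
        {"role": "user", "content": (
            f"Write a short, motivational speech, roughly 75 words in length, "
            f"on {topic}. Include a narrative hook at the outset, and a punchy "
            "conclusion that recaps the hook."
        )},
    ]

def generate_script(topic: str, api_key: str, model: str = "gpt-4o") -> str:
    body = json.dumps({"model": model, "messages": build_script_prompt(topic)})
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=body.encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

The output of `generate_script` is exactly what gets pasted, or piped, into the TTS step.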
I am a ChatGPT guy, myself (the other camp is the Claude people), and I have also used Google’s NotebookLM. Hadley’s libertarian scripts, I write myself.
One last caveat: GenAI is a rapidly evolving space. Synthesia had it to itself for a while, and now there is a raft of new entrants. Eleven Labs faces competition from Murf and Speechify. I am on a list for the LipDub beta. If you want to work in this space, you must be ready to learn a new tool every week.