Project Avatar

Everything in this video is AI generated. My voice and image have been cloned. Even the script was generated, by a Google product called Notebook LM. This post is mostly about Notebook LM, and I’ll also survey some other Gen AI tools.

Notebook LM is basically RAG in a box. If you don’t know what that is, you can read my earlier posts on the topic – or you can watch the video. I thought it would be clever to feed RAG articles to a RAG system, and have it generate a dialogue.

That’s right, Notebook LM will ingest raw source material, and then generate a podcast-style dialogue. The system is meant as a study aid, and you can imagine how powerful that is. Other outputs include a study guide, FAQ page, and timeline. Here is a sample entry from the War and Peace timeline:

October 1805: News arrives of Mack’s defeat. The Pavlograd hussars, including Rostov and Denisov, are stationed near Braunau. Rostov experiences his first taste of battle. He witnesses the horrors of war and feels disillusioned. Prince Andrew serves as an adjutant for Kutuzov.

One challenge with RAG has always been preparing the source materials. This earlier post described the work of parsing and vectorizing several text files. In real world applications, clean source material is hard to find. Notebook LM swallows PDF files with ease.

I was curious about the health concerns around seed oils, so I rounded up some papers from sources like the Journal of Nutrition and Metabolism, and just dumped them into Notebook LM. It prepared a handy summary of each one, plus the outputs listed above. I listened to the dialogue and, of course, you can chat with it, too.

  • Source Summaries
  • FAQ Page
  • Study Guide
  • Table of Contents
  • Timeline
  • Briefing Book
  • Chat Window
  • Dialogue

This is a practical, down-to-earth application of LLM technology. One person I found on Reddit is using Notebook LM to prepare for the CISSP exam. He’s doing what I did with seed oils, hoovering up all the InfoSec papers.

From Podcast to Video

Since the Notebook LM dialogue is audio only, I thought it would be fun to make a video and cast my own avatar for the male voice. That’s not even a real photograph of me. First, I trained a photo avatar on HeyGen, and then requested “Mark wearing dress shirt in library.”

Synthesia is similar to HeyGen, but it’s optimized for training videos. It uses a slideshow format. People like it because, if this is your application, all the tools are in one place. I found HeyGen to be more flexible for things like photo avatars and voice substitution.

Other tools I looked at were Deepbrain, now AI Studio, Wondershare, D-ID, and Creatify. Creatify is optimized for making product advertisements on social media. It can write its own script, based on reading the product’s website.

For my voice, I made an “instant voice clone” on Eleven Labs. I didn’t have the patience to make a “professional” one. The instant clone is good enough and, frankly, a little creepy.

I selected a canned avatar named Georgia to be my partner. Initially, I used the script from Notebook LM and ran the HeyGen animation in text-to-speech mode. Georgia's voice is native to HeyGen, and HeyGen was able to pull my cloned voice from Eleven Labs via API. HeyGen also supports integration with LMNT, Play.ht, and Cartesia.

This is, by far, the easiest way to do it. When it was time to combine the two videos, I was able to use transcript-based editing in CapCut. Unfortunately, the result was a little bit robotic. The charm of Notebook LM’s dialogue is that it really sounds natural.

Working with the WAV file was more challenging. I used Audacity to split the male and female roles – not easy when they interject “uh-huh” over each other’s lines, but that’s the desired effect.

I left the female voice as-is, ran the male audio track through Eleven Labs to pick up my voice, and then went back to HeyGen – this time, uploading prefab audio for me and Georgia (separately) instead of scripts.

The result from HeyGen was two videos, one for each avatar. While one is speaking, the other sits patiently and makes facial expressions as if listening. The timing works because the split tracks from Audacity are in sync. The last thing to do was combine these, split-screen, in CapCut.

Gen AI and Social Media

My work with AI has always been machine learning for quantitative applications – Python, Scikit, and applied statistics – so it was fun to learn about the crazy things people are doing with Gen AI.

For instance, there is an AI-generated influencer on Instagram. An Italian modeling agency created her, so the story goes, because they were tired of working with real prima donnas.

There is now a cottage industry of avatars on social media, using tools like Creatify to monetize attribution. I thought for a moment about my custom GPT, Powerful Thinking, and its avatar, Bruno. But I couldn’t think of anything for him to sell. Next week, I’ll be back to my regular coding projects.