Artificial intelligence (AI) video generators and the avatars they create are evolving quickly. UK-based AI video startup Synthesia hopes to take the emerging technology to the next stage.
On Wednesday, the startup announced Expressive Avatars, which can depict a range of lifelike human emotions. Expressive Avatars are the latest iteration of what the startup calls its "digital actors." They feature enhanced facial expressions, more accurate lip sync, and more human-like voices -- an upgrade from the robotic tone of most text-to-audio AI.
"This technology brings a level of sophistication and realism to digital avatars that blurs the line between the virtual and the real," the startup said in its announcement.
Synthesia's text-to-video platform comes with over 160 stock AI avatars, which the startup created based on paid human actors, with their consent. Teams can collaborate on videos from end to end and create videos in more than 130 languages.
The startup aims to replace the entire video production process with its software -- but it's not coming for Hollywood, CEO Victor Riparbelli said during a demonstration of the release. Instead, the startup focuses on enterprise and B2B content, where it sees a demand for easy-to-create, engaging, and human-like video.
Synthesia's Expressive Avatars are powered by its Express-1 AI model. While the startup uses open-source LLMs for the text elements of the product, Synthesia trained Express-1 entirely on content produced in-house -- nothing synthetic or scraped from the web.
In the demo, Riparbelli explained that the startup hired thousands of actors to record videos for its Express-1 model in its London and New York studios, in part to avoid importing biases embedded in existing datasets.
"With this particular technology, it's not a viable strategy to go for synthetic content, because you essentially end up being able to replicate synthetic content, which is exactly what we're trying not to do with this," Riparbelli said. "You're trying to replicate how humans actually speak."
Riparbelli added that this relatively small dataset was enough for the Express-1 model because the model is much more "narrow and specific" than those from Runway or OpenAI's Sora.
The demo shows an avatar delivering three prompts: "I am happy," "I am upset," and "I am frustrated." The avatar speaks with a more realistic and natural rhythm than previous generations of Synthesia's tech.
"Expressive Avatars don't just mimic human speech; they understand its context," Synthesia said in its announcement. "Whether the conversation is cheerful or somber, our avatars adjust their performance accordingly, displaying a level of empathy and understanding that was once the sole domain of human actors."
While not indistinguishable from real people, the lifelike nature of these avatars can be alarming -- especially given how deepfake technology is abused.
"We are aware that Expressive Avatars are a powerful new technology, released during an important year for democracy, when billions of people around the world exercise their right to vote," the startup said in its announcement. "We've taken additional steps to prevent the misuse of our platform, including updating our policies to restrict the type of content people can make, investing in the early detection of bad faith actors, increasing the teams that work on AI safety, and experimenting with content credentials technologies such as C2PA."
Synthesia also had protections in place before Wednesday's release. Users can create custom avatars but must have the person's explicit consent and go through a "thorough KYC-like procedure," according to Synthesia's website. Plus, you can opt out of the process at any time (as can the stock actors), and Synthesia will erase your data and likeness. The startup doesn't allow users to make avatars of celebrities or politicians under any circumstances.
In addition, Riparbelli explains in a video that only vetted news organizations on enterprise plans can use Synthesia's tools to create news content. It's unclear, however, what criteria Synthesia uses to determine what qualifies as a news organization, or whether the startup fact-checks content created on its platform.
Synthesia is part of the Content Authenticity Initiative, a coalition of companies and organizations working on tools for content provenance, which means identifying the origins of a piece of media.
Synthesia believes Expressive Avatars will help enterprises go beyond their basic content needs to create videos with a more empathetic touch: those about sensitive topics like health care, or customer support materials that emulate the friendliness and patience of a real person.
"This is only the first release, the first product, you can say, that we've built on top of these models," Riparbelli said during the demo. "I think we're looking at a magnitude shift in capabilities within the next six to nine months."