Synthesia, a startup using AI to create synthetic videos of avatars for marketing, today announced it has raised $12.5 million. In a press release, the company said the funding will be put toward expanding its workforce as it invests in product R&D.
As the pandemic makes virtual meetups a regular occurrence, the concept of “personal AI” is rapidly gaining steam. Startups creating virtual beings, or artificial people powered by AI, have collectively raised more than $320 million in venture capital to date. As my colleague Dean Takahashi points out, these beings are a kind of precursor to the metaverse, a universe of virtual worlds that are all interconnected, as in novels such as Snow Crash and Ready Player One.
Synthesia’s immediate goals are less ambitious. Like rivals Soul Machines, Brud, Wave, Samsung-backed STAR Labs, and others, the company employs a combination of machine learning techniques to create visual chatbots, product videos, and sales videos for clients without actors, film crews, studios, or cameras.
Reducing costs with AI avatars
“We’ve still only scratched the surface of the video economy. In 10 years, we believe most of our digital experiences will be powered by video in some way or form,” CEO Victor Riparbelli told VentureBeat. Riparbelli cofounded Synthesia in 2017 alongside Steffen Tjerrild and computer vision professors Lourdes Agapito and Matthias Niessner, who is behind some of the better-recognized research projects in the field of synthetic media, such as Deep Video Portraits and Face2Face.
“Today, video production is costly, complex, and unscalable. It requires studios, actors, cameras, and post-production. It’s an incredibly long and multidisciplinary process, rooted in physical space and sensors,” Riparbelli continued. “To truly realize the video-first internet, we need a more scalable and accessible way to make video.”
Synthesia customers choose from a gallery of in-house, AI-generated presenters or create their own by recording voice clips and then uploading them. After typing or pasting in a video script, Synthesia generates a video “in minutes,” making it available for translation into dozens of languages.
As pandemic restrictions make conventional filming tricky and risky, the benefits of AI-generated video have been magnified. According to Dogtown Media, an education campaign under normal circumstances might require as many as 20 different scripts to address a business’ worldwide workforce, with each video costing tens of thousands of dollars. Synthesia’s technology can pare the expenses down to a lump sum of around $100,000.
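To make that comparison concrete, here is a rough back-of-the-envelope sketch. The 20 scripts and the roughly $100,000 lump sum come from the figures above. The $20,000 per-video cost is an assumption chosen only to stand in for "tens of thousands of dollars," not a number reported by Dogtown Media or Synthesia.

```python
# Back-of-the-envelope comparison of conventional vs. AI-generated video costs.
NUM_SCRIPTS = 20                 # "as many as 20 different scripts" (from the article)
ASSUMED_COST_PER_VIDEO = 20_000  # ASSUMPTION: placeholder for "tens of thousands of dollars"
LUMP_SUM = 100_000               # "a lump sum of around $100,000" (from the article)


def conventional_cost(num_scripts: int, cost_per_video: int) -> int:
    """Total cost when every script is filmed as a separate video."""
    return num_scripts * cost_per_video


conventional = conventional_cost(NUM_SCRIPTS, ASSUMED_COST_PER_VIDEO)
savings = conventional - LUMP_SUM

print(f"Conventional production: ${conventional:,}")
print(f"AI-generated lump sum:   ${LUMP_SUM:,}")
print(f"Estimated savings:       ${savings:,}")
```

Under that assumed per-video cost, the conventional route would come to $400,000, so the lump sum would cut spending by about three-quarters. A different per-video figure would shift the result proportionally.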
Synthesia says that client CraftWW used its platform to ideate an advertising campaign for JustEat in the Australian market featuring an AI-manipulated Snoop Dogg. The company also worked with director Ridley Scott’s production studio to create a film for the nonprofit Malaria Must Die, which translated David Beckham’s voice into over nine languages. And it partnered with Reuters to develop a prototype for automated video sport reports.
“We’re building an application layer that turns code into video, allowing for video content to be programmed with computers rather than recorded with cameras and microphones. Once video production is abstracted away as code, it has all the benefits of software: infinite scale, close to zero marginal costs, and it can be made accessible to everyone,” Riparbelli said. “This is now quickly becoming a reality. We launched our software-as-a-service product just six months ago … [and we] have essentially reduced the entire video production process to a single API call or a few clicks in our web app.”
In the near future, Synthesia plans to make Personalize, a product that tailors videos to specific customer segments, generally available. Synthesia says the product can automatically translate videos featuring actors or staff members into over 40 languages.
“We have been overwhelmed by the response in the last six months since our beta launch: We now have thousands of users, and our customers range from small agencies to Fortune 500 companies,” Riparbelli said. “They use Synthesia primarily for internal training and corporate communications. But now we are seeing more and more companies starting to use it for external communications, incorporating personalized video into every step of the customer journey through our personalized video API.”
Some experts have expressed concern that tools like Synthesia’s could be used to create deepfakes, or AI-generated videos that take a person in an existing video and replace them with someone else’s likeness. The fear is that these fakes might be used to do things like sway opinion during an election or implicate a person in a crime. Deepfakes have already been abused to generate pornographic material of actors and defraud a major energy producer.
For its part, Synthesia has posted ethics rules online and says it vets its customers and their scripts. It also requires formal consent from a person before it will synthesize their appearance and refuses to touch political content.
“We are trying to solve a very complex and technical problem,” Riparbelli recently told the Telegraph. “We are not releasing any software to the public … There is a wider discussion to be had about the malevolent use of this kind of stuff.”
Synthesia’s series A funding round announced today was led by FirstMark Capital, with participation from Christian Bach; Michael Buckley; and existing investors, including Mark Cuban. The London, U.K.-based company has 30 employees, and its total raised is now over $16.6 million.