
Rong Yan, CTO of HeyGen, joins SlatorPod to recount the company’s transformation from a Metaverse-focused startup into a leader in the emerging field of AI video generation.
Rong describes HeyGen’s beginnings and its pivot to the current avatar model, after which annual recurring revenue (ARR) went from zero to USD 1m within six months.
Rong attributes HeyGen’s success to its emphasis on three key elements: quality, consistency, and controllability. The company’s newest model, Avatar IV, enables full-body video generation with natural gestures, synchronized audio, and emotion to speech.
While some of the platform’s growth has been viral, Rong believes sustained success comes from building something users truly value, with a focus on pushing video quality from 70% to 95%.
The platform extends beyond avatars, offering translation, voice cloning, and real-time interactivity. Its dynamic duration feature adjusts translated speech to fit original video timing, preserving realism. Rather than build everything from scratch, HeyGen integrates external models with its own orchestration and user data, optimizing output across languages and contexts.
Rong emphasizes that HeyGen’s long-term vision is not entertainment or Hollywood, but helping everyday professionals, especially marketers and educators, who lack traditional video production skills.
Looking ahead, Rong sees video agents, tools that generate complete videos from simple prompts, as the next frontier, driving accessibility and transforming storytelling through AI.