
Sign up to save your podcasts
Or


In week two of CS 153 ("AI Coachella"), Anjney Midha interviews Mati Staniszewski, founder and CEO of ElevenLabs, tracing the company’s origins from an early Discord text-to-speech bot to a fast-growing frontier audio and speech platform. Mati explains ElevenLabs’ initial focus on solving AI dubbing inspired by Poland’s single-voice film narration, the shift to prioritizing emotional, natural-sounding text-to-speech for creators, and the evolution from cascaded pipelines (transcription, translation/LLM, and speech generation) toward real-time voice agents. They discuss tradeoffs between cascaded versus fused multimodal systems, efforts to detect and convey emotion, safety and voice authentication limits, on-device model deployment, collaboration with teams like Sesame, and business lessons on PLG plus enterprise deployment, team structure, pricing from customer value, and growth to over $430M revenue with ~450 employees.
By Anjney MidhaIn week two of CS 153 ("AI Coachella"), Anjney Midha interviews Mati Staniszewski, founder and CEO of ElevenLabs, tracing the company’s origins from an early Discord text-to-speech bot to a fast-growing frontier audio and speech platform. Mati explains ElevenLabs’ initial focus on solving AI dubbing inspired by Poland’s single-voice film narration, the shift to prioritizing emotional, natural-sounding text-to-speech for creators, and the evolution from cascaded pipelines (transcription, translation/LLM, and speech generation) toward real-time voice agents. They discuss tradeoffs between cascaded versus fused multimodal systems, efforts to detect and convey emotion, safety and voice authentication limits, on-device model deployment, collaboration with teams like Sesame, and business lessons on PLG plus enterprise deployment, team structure, pricing from customer value, and growth to over $430M revenue with ~450 employees.