
By Davit Baghdasaryan

In the Future of Voice AI series of interviews, I ask my guests three questions.
This episode’s guest is Jack Piunti, GTM Lead for Communications at ElevenLabs.
Jack Piunti is the GTM lead for Communications at ElevenLabs, where he oversees go-to-market strategy across CPaaS, CCaaS, UCaaS, and customer experience. With a strong background in consultative technology partnerships and startup growth, Jack brings deep expertise in AI-driven communications. Prior to ElevenLabs, he spent six years at Twilio, helping shape enterprise adoption of real-time voice technologies. He is passionate about the future of connected applications and the role of AI in transforming how we communicate.
ElevenLabs is a voice AI company offering ultra-realistic text-to-speech, speech-to-text, voice cloning, multilingual dubbing, and conversational AI tools. Founded in 2022, it enables creators and developers to build voice apps and generate lifelike, emotionally rich speech in 70+ languages. Its latest models support expressive cues and multi-speaker dialogue.
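For developers curious what that looks like in practice, here is a minimal sketch of generating speech through ElevenLabs' text-to-speech REST API in Python. The endpoint, header, and field names follow the public documentation at the time of writing, and the voice ID is only an example; check the current API reference before building on this.

```python
# A minimal sketch, not production code: endpoint and field names follow
# ElevenLabs' public REST docs at the time of writing; verify against the
# current API reference before relying on them.
import requests

API_KEY = "YOUR_XI_API_KEY"        # assumption: read from env/secret store in practice
VOICE_ID = "21m00Tcm4TlvDq8ikWAM"  # example voice ID; substitute one from your voice library

resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={
        "text": "Hello from the Voice AI Newsletter.",
        "model_id": "eleven_multilingual_v2",  # one of the multilingual models mentioned above
    },
    timeout=30,
)
resp.raise_for_status()

# The endpoint returns raw audio bytes (MP3 by default).
with open("hello.mp3", "wb") as f:
    f.write(resp.content)
```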
Recap Video
Takeaways
* Most AI failures in conversation don't come from the language model, but from inaccurate speech-to-text at the start.
* Bad transcription of critical details like names or codes breaks the entire user experience and can’t easily be recovered.
* Accurate speech-to-text is now a make-or-break factor for building reliable AI agents.
* Voice will soon replace typing as the main way humans interact with machines because it's more natural and efficient.
* Enterprises don’t want to stitch together multiple AI vendors; they want end-to-end platforms that simplify the stack and reduce latency.
* Demos often look impressive, but very few companies can scale real-time voice tech reliably in production environments.
* AI voice agents that sound expressive aren't enough — turn-taking and accuracy are still bigger challenges.
* Most companies ignore accessibility in AI, but modeling things like stuttering actually improves agent behavior.
* Streaming speech and voice models will unlock more lifelike, responsive AI agents, and they’re coming fast.
* Audio AI demands expertise beyond machine learning, including sound engineering and context-aware modeling of human speech.
* There’s a growing trend of AI companies going beyond voice to control the full audio experience, including music and sound effects.
* The way voice models are trained is fundamentally different from language models and requires much cleaner training data.
* Many agentic AI builders today are forced to cobble together solutions from different vendors, which adds latency and complexity (see the sketch after this list).
* True real-time voice AI must handle language switching, emotional cues, and speech disfluencies automatically to feel natural.
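
To make the stitching and latency points concrete, here is an illustrative Python sketch of the multi-vendor pipeline several of these takeaways describe. All three vendor calls are hypothetical stubs (none of these function names come from a real SDK); the point is the shape of the flow: three sequential network hops whose latencies add up, and a transcript produced at the first hop that every later stage has to trust.

```python
import time

# All three functions are hypothetical stubs standing in for separate
# vendor APIs; none of these names come from a real SDK.
def transcribe(audio: bytes) -> str:
    # Speech-to-text hop. If this mishears a name or code, nothing
    # downstream can recover it (per the takeaways above).
    return "my confirmation code is A13F"

def generate_reply(transcript: str) -> str:
    # LLM hop. It reasons only over the transcript it receives, so an
    # upstream transcription error is baked into the reply.
    return f"Thanks, I have your code as {transcript.split()[-1]}."

def synthesize(reply: str) -> bytes:
    # Text-to-speech hop.
    return reply.encode("utf-8")

def handle_turn(audio: bytes) -> bytes:
    t0 = time.perf_counter()
    transcript = transcribe(audio)      # hop 1: STT vendor
    reply = generate_reply(transcript)  # hop 2: LLM vendor
    speech = synthesize(reply)          # hop 3: TTS vendor
    print(f"turn latency: {time.perf_counter() - t0:.3f}s (three sequential hops)")
    return speech

handle_turn(b"...caller audio...")
```

An end-to-end platform collapses those hops into a single provider, which is exactly the simplification the enterprises above are asking for.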