


In this episode of AI + a16z, Sesame Cofounder and CTO Ankit Kumar joins a16z general partner Anjney Midha for a deep dive into the research and engineering behind their voice technology. They discuss the technical challenges of real-time speech generation, the trade-offs in balancing personality with efficiency, and why the team is open-sourcing key components of their model. Ankit breaks down the complexities of multimodal AI, full-duplex conversation modeling, and the computational optimizations that enable low-latency interactions.
They also explore the evolution of natural language as a user interface and its potential to redefine human-computer interaction.
Plus, we take audience questions on everything from scaling laws in speech synthesis to the role of in-context learning in making AI voices more expressive.
Key Takeaways:
How Sesame AI achieves natural voice interactions through real-time speech generation.
For anyone interested in AI and voice technology, this episode offers an in-depth look at the latest advancements pushing the boundaries of human-computer interaction.
Learn more:
The Maya + Miles demo
Crossing the uncanny valley of conversational voice
Sesame CSM 1B model
Follow everybody on X:
Ankit Kumar
Anjney Midha
Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.
Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
By a16z