
In The Future of Voice AI series of interviews, I ask each of my guests three questions.
This episode’s guest is Scott Stephenson, Co-Founder & CEO at Deepgram.
Scott is a dark matter physicist turned Deep Learning entrepreneur. He earned a PhD in particle physics from the University of Michigan, where his research involved building a lab two miles underground to detect dark matter. He left his physics post-doc research position to found Deepgram.
Deepgram is one of the largest API companies offering Speech AI technologies such as Speech-to-Text, Audio Intelligence, and the recently launched Text-to-Speech. Deepgram’s technology provides high accuracy and naturalness across multiple languages and accents. The major use cases include contact centers, conversational AI, media transcription, and speech analytics.
Recap Video
Takeaways
* Deepgram builds its own ASR models, which gives it the ability to tune and scale them
* Their infrastructure handles about 100K real-time conversations at any given moment of the day, on average
* It’s easy to get an AI model working, but far harder to scale it at a 10x lower price
* The vast majority of Deepgram’s use cases are Speech-to-Text, but Text-to-Speech is starting to take off as well
* When competing with large companies (Google, Amazon, Microsoft, etc.), it’s important to realize that you are not really competing with the entire company but with a small technical team that is generally less motivated than your startup
* Accuracy, speed, and price are the top three problems in Speech-to-Text
* Speech-to-Text prices have already decreased by 10x. Another 10x decrease is unlikely in the near future, at least not for the real-time use case.
* Faster AI inference chips will allow for larger and more accurate models at the same price
* Latency under 500 ms is critical for the voice bot use case
* Deepgram offers super low-latency STT and TTS today (a minimal request sketch follows this list)
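
For a concrete sense of what the Speech-to-Text API discussed above looks like in practice, here is a minimal sketch of a pre-recorded transcription request against Deepgram's public /v1/listen REST endpoint. The specific query parameters, model behavior, and response-parsing path below are assumptions based on Deepgram's documented API shape, not details taken from the episode.

```python
# Minimal sketch: transcribe a hosted audio file with Deepgram's REST API.
# The endpoint and "Token" auth header match Deepgram's public docs; the
# query options and response structure below are illustrative assumptions.
import requests

DEEPGRAM_API_KEY = "YOUR_API_KEY"  # placeholder, not a real key


def transcribe_url(audio_url: str) -> str:
    """Send a remote audio file to /v1/listen and return the transcript text."""
    response = requests.post(
        "https://api.deepgram.com/v1/listen",
        params={"punctuate": "true", "smart_format": "true"},  # assumed options
        headers={
            "Authorization": f"Token {DEEPGRAM_API_KEY}",
            "Content-Type": "application/json",
        },
        json={"url": audio_url},
        timeout=60,
    )
    response.raise_for_status()
    body = response.json()
    # The transcript sits under results -> channels -> alternatives in the JSON.
    return body["results"]["channels"][0]["alternatives"][0]["transcript"]


if __name__ == "__main__":
    print(transcribe_url("https://example.com/sample-call.wav"))
```

Real-time use cases like the voice bots mentioned above would instead stream audio over a WebSocket connection to get partial results back within the sub-500 ms budget, but the batch request above is the simplest way to see the API's request/response shape.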