Tom Carrington Smith's Podcast

New audio models from OpenAI, but how much can we rely on them?



Today's podcast covers:

1. [New audio models from OpenAI, but how much can we rely on them?](https://simonw.substack.com/p/new-audio-models-from-openai-but?utm_source=substack&publication_id=1173386&post_id=159804781&utm_medium=email&utm_content=share&utm_campaign=email-share&triggerShare=true&isFreemail=true&r=1d51sg&triedRedirect=true)

Welcome to Pocket to Podcast. Today we have one article covering the latest advancements in audio models by OpenAI and the implications of their use in various applications.

OpenAI has recently announced a slew of new audio-related API features, significantly advancing the capabilities of both text-to-speech and speech-to-text technologies. Among these, the gpt-4o-mini-tts model emerges as a standout, offering "better steerability" and a selection of 11 base voices. This model allows users to apply specific instructions to modify voice output, including tone and delivery style, directly through OpenAI's new playground interface at OpenAI.fm.

However, this flexibility introduces potential risks, as inserting stage directions within scripts could inadvertently lead to the model misinterpreting text as further instructions, a phenomenon observed during testing. The gpt-4o-mini-tts model is priced at $0.60 per million tokens, which OpenAI estimates to be around 1.5 cents per minute of s...

Generated by Pocket to Podcast
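To put the per-minute estimate quoted in the summary into concrete terms, here is a minimal Python sketch that turns it into a rough cost for a given running time. The function name `estimate_tts_cost_usd` and the 30-minute example are illustrative assumptions, and the rate is simply the "around 1.5 cents per minute" figure attributed to OpenAI above, not an exact billing formula.

```python
# Rate taken from the summary above: "around 1.5 cents per minute".
# Treating it as a flat per-minute rate is an assumption for illustration.
CENTS_PER_MINUTE = 1.5


def estimate_tts_cost_usd(minutes: float) -> float:
    """Return an approximate cost in US dollars for `minutes` of output."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * CENTS_PER_MINUTE / 100


if __name__ == "__main__":
    # Hypothetical 30-minute episode.
    print(f"${estimate_tts_cost_usd(30):.2f}")
```

Actual charges depend on token counts, so this is only a ballpark figure.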

By Tom Carrington Smith