March 28, 2025

New audio models from OpenAI, but how much can we rely on them?

3 minutes

Today's podcast covers:

1. [New audio models from OpenAI, but how much can we rely on them?](https://simonw.substack.com/p/new-audio-models-from-openai-but?utm_source=substack&publication_id=1173386&post_id=159804781&utm_medium=email&utm_content=share&utm_campaign=email-share&triggerShare=true&isFreemail=true&r=1d51sg&triedRedirect=true)

Welcome to Pocket to Podcast. Today we have one article covering the latest advancements in audio models by OpenAI and the implications of their use in various applications.

OpenAI has recently announced a slew of new audio-related API features, significantly advancing the capabilities of both text-to-speech and speech-to-text technologies. Among these, the gpt-4o-mini-tts model emerges as a standout, offering "better steerability" and a selection of 11 base voices. This model allows users to apply specific instructions to modify voice output, including tone and delivery style, directly through OpenAI's new playground interface at OpenAI.fm. However, this flexibility introduces potential risks, as inserting stage directions within scripts could inadvertently lead to the model misinterpreting text as further instructions, a phenomenon observed during testing.

The gpt-4o-mini-tts model is priced at $0.60 per million tokens, which OpenAI estimates to be around 1.5 cents per minute of s...

Generated by Pocket to Podcast

By Tom Carrington Smith
