Learning GenAI via SOTA Papers

EP043: Weak Supervision Made OpenAI Whisper Robust



The paper "Robust Speech Recognition via Large-Scale Weak Supervision" introduces Whisper, a highly robust and versatile speech processing system developed by researchers at OpenAI.

Instead of relying on small, highly-curated datasets or purely unsupervised pre-training, Whisper is trained on 680,000 hours of weakly supervised, multilingual, and multitask audio data collected from the internet. By using a standard encoder-decoder Transformer architecture, a single Whisper model can handle a comprehensive pipeline of speech tasks, including English and multilingual speech recognition, any-to-English speech translation, spoken language identification, and voice activity detection.

The key takeaway from the paper is that scaling up weakly supervised pre-training allows the model to achieve highly effective zero-shot transfer to standard benchmarks without any dataset-specific fine-tuning. Consequently, Whisper approaches human-level accuracy and shows exceptional robustness to real-world noise and out-of-distribution data, significantly outperforming prior models, such as those trained solely on LibriSpeech, which become brittle when tested outside their training distribution.


Learning GenAI via SOTA Papers, by Yun Wu