October 04, 2024

Whisper: Robust Speech Recognition via Large-Scale Weak Supervision

10 minutes

This research paper introduces Whisper, a speech recognition system trained on 680,000 hours of weakly supervised audio. The paper argues that scaling weakly supervised training has been underappreciated in speech recognition, and that Whisper's robust zero-shot performance shows it generalizes well across domains, languages, and tasks, approaching human accuracy and robustness. The authors explore the system's scaling properties in terms of both model size and dataset size, and analyze the impact of multitask and multilingual training. They also discuss Whisper's performance on language identification and its robustness to noise. The paper concludes with a discussion of limitations and areas for future work.

By Kenpachi
