Artificial Discourse

Whisper: Robust Speech Recognition via Large-Scale Weak Supervision



This research paper introduces Whisper, a speech recognition system trained on a weakly supervised dataset of 680,000 hours of audio. The paper argues that scaling weakly supervised training has been underappreciated in speech recognition, and that Whisper's robust zero-shot performance shows it generalizes well across domains, languages, and tasks, approaching human accuracy and robustness. The authors explore the system's scaling properties in terms of both model size and dataset size, and analyze the impact of multitask and multilingual training. They also discuss Whisper's performance on language identification and its robustness to noise. The paper concludes with a discussion of limitations and areas for future work.


By Kenpachi