January 10, 2022

Robust Self-Supervised Audio-Visual Speech Recognition

20 minutes

Audio-based automatic speech recognition (ASR) degrades significantly in noisy environments and is particularly vulnerable to interfering speech, because the model cannot determine which speaker to transcribe. Audio-visual speech recognition (AVSR) systems improve robustness by complementing the audio stream with visual information, which is invariant to noise and helps the model focus on the desired speaker.

2022: Bowen Shi, Wei-Ning Hsu, Abdelrahman Mohamed

https://arxiv.org/pdf/2201.01763v1.pdf
