
This academic paper introduces WavReward, a novel evaluation system for end-to-end spoken dialogue models, which take speech as input and produce speech as output directly rather than routing through text as cascaded systems do. Noting that existing evaluations focus almost exclusively on textual content, WavReward uses audio language models to assess both what is said and how it sounds, covering acoustic factors such as emotion and tone. The authors also present ChatReward-30K, a dataset of speech-to-speech dialogues with human-assigned scores, built specifically for training and testing spoken dialogue evaluators and filling a notable resource gap in the field. Experiments show that WavReward, which incorporates reinforcement learning and an explicit reasoning process, outperforms existing evaluation methods and aligns more closely with human judgment.
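To make the evaluator's role concrete, here is a minimal sketch of how a WavReward-style scorer might combine content and acoustic judgments into one reward. All names, the equal weighting, and the stubbed sub-scores are illustrative assumptions, not the paper's actual API; a real system would derive the sub-scores from an audio language model.

```python
# Hypothetical sketch of a WavReward-style evaluator. Class, function,
# and file names are assumptions for illustration only.
from dataclasses import dataclass


@dataclass
class DialogueTurn:
    """One speech-to-speech exchange (paths to audio files)."""
    user_audio: str    # e.g. "user_query.wav"
    model_audio: str   # e.g. "model_response.wav"


def score_turn(turn: DialogueTurn) -> dict:
    """Assign separate content and acoustic scores, then combine them.

    A real evaluator would feed both waveforms to an audio language
    model; here the sub-scores are stubbed constants so the sketch
    runs standalone.
    """
    content_score = 4.0    # placeholder: relevance/correctness of the reply
    acoustic_score = 3.5   # placeholder: emotion, tone, prosody quality
    overall = 0.5 * content_score + 0.5 * acoustic_score  # assumed equal weighting
    return {"content": content_score, "acoustic": acoustic_score, "overall": overall}


if __name__ == "__main__":
    turn = DialogueTurn("user_query.wav", "model_response.wav")
    print(score_turn(turn))
```

The key design point the paper's framing suggests is that the two axes are scored separately: a reply can be textually correct yet acoustically flat, and a purely text-based evaluator would miss the difference.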