New Paradigm: AI Research Summaries

Can the Tsinghua University AI Lab Prevent Model Collapse in Synthetic Data?


Listen Later

This episode analyzes the research paper "How to Synthesize Text Data without Model Collapse?" by Xuekai Zhu, Daixuan Cheng, Hengli Li, Kaiyan Zhang, Ermo Hua, Xingtai Lv, Ning Ding, Zhouhan Lin, Zilong Zheng, and Bowen Zhou, with affiliations including the LUMIA Lab at Shanghai Jiao Tong University, the State Key Laboratory of General Artificial Intelligence at BIGAI, Tsinghua University, Peking University, and the Shanghai Artificial Intelligence Laboratory. Published on December 19, 2024, the paper addresses the critical issue of model collapse in language models trained on synthetic data. The discussion examines the researchers' findings on how synthetic data degrades model performance and their proposed solution: token-level editing, which produces semi-synthetic data. The episode reviews the study's theoretical foundations and experimental results, highlighting the implications for building more reliable and effective AI language systems.
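To make the idea concrete, here is a minimal sketch of token-level editing as described at a high level above: tokens that a language model predicts with very high confidence are resampled, while low-confidence tokens are kept verbatim from the human-written source, yielding semi-synthetic data. The threshold value and the stand-in sampler below are illustrative assumptions, not the paper's exact method.

```python
def token_level_edit(tokens, token_probs, threshold=0.99, sampler=None):
    """Produce a semi-synthetic copy of `tokens`.

    tokens      : list of tokens from human-written text
    token_probs : the model's predicted probability for each token
                  (assumed precomputed; a stand-in for a real LM here)
    threshold   : illustrative cutoff; tokens the model finds "too easy"
                  (p > threshold) are resampled, the rest are kept as-is
    sampler     : function mapping a token to its replacement; the
                  uppercase default is a toy stand-in for LM sampling
    """
    sampler = sampler or (lambda tok: tok.upper())
    edited = []
    for tok, p in zip(tokens, token_probs):
        if p > threshold:
            edited.append(sampler(tok))   # replace a high-confidence token
        else:
            edited.append(tok)            # keep the human-written token
    return edited


tokens = ["the", "cat", "sat", "on", "the", "mat"]
probs = [0.999, 0.40, 0.50, 0.995, 0.999, 0.30]
print(token_level_edit(tokens, probs))
# → ['THE', 'cat', 'sat', 'ON', 'THE', 'mat']
```

The key design point is that most of the text remains grounded in real data, which is what distinguishes semi-synthetic data from fully model-generated text.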

This podcast is created with the assistance of AI; the producers and editors make every effort to ensure each episode is of the highest quality and accuracy.

For more information on content and research relating to this episode please see: https://arxiv.org/pdf/2412.14689

New Paradigm: AI Research Summaries, by James Bentley

4.5 (2 ratings)


More shows like New Paradigm: AI Research Summaries

  • Machine Learning Street Talk (MLST), by Machine Learning Street Talk (89 listeners)
  • Hard Fork, by The New York Times (5,356 listeners)
  • What's AI Podcast, by Louis-François Bouchard (5 listeners)