Seventy3

[Episode 178] Spurious Forgetting: Spurious Forgetting in Large Language Models



Seventy3: Turning papers into podcasts with NotebookLM, so everyone can learn alongside AI.

Today's topic: Spurious Forgetting in Continual Learning of Language Models

Summary

This paper introduces the concept of spurious forgetting in large language models during continual learning, distinguishing it from actual knowledge loss and attributing it to the disruption of task alignment. The authors demonstrate through experiments and theoretical analysis that early training on new tasks can misalign the model, particularly in the bottom layers. To address this, they propose a Freezing strategy that keeps the initial layers unchanged, significantly improving performance in various continual learning scenarios like safety alignment and instruction tuning. Their findings highlight the importance of task alignment over pure knowledge retention and offer a practical method to mitigate performance degradation.

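The Freezing strategy described above amounts to excluding the bottom transformer layers from gradient updates while training on a new task. A minimal sketch of the layer-selection logic, assuming PyTorch-style parameter names of the form `layers.<index>.<suffix>` (the names, the helper `frozen_params`, and the cutoff `n_freeze` are illustrative, not from the paper):

```python
def frozen_params(param_names, n_freeze):
    """Return the parameter names belonging to the first `n_freeze`
    transformer layers; these would be held fixed (frozen) during
    continual learning on a new task."""
    frozen = set()
    for name in param_names:
        parts = name.split(".")
        # Match names like "layers.0.attn.weight" with layer index < n_freeze.
        if len(parts) >= 2 and parts[0] == "layers" and parts[1].isdigit():
            if int(parts[1]) < n_freeze:
                frozen.add(name)
    return frozen

params = [
    "embed.weight",
    "layers.0.attn.weight",
    "layers.1.mlp.weight",
    "layers.2.attn.weight",
    "head.weight",
]
print(sorted(frozen_params(params, 2)))
# Embeddings, upper layers, and the output head remain trainable.
```

In an actual PyTorch model, each selected parameter `p` would then be frozen with `p.requires_grad = False` before building the optimizer, so only the upper layers adapt to the new task.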

原文链接:https://arxiv.org/abs/2501.13453


Seventy3, by 任雨山