Papers Read on AI

Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers

Large pretrained language models have shown a surprising In-Context Learning (ICL) ability: given a few demonstration input-label pairs, they can predict the label for an unseen input without any parameter updates. Despite its strong empirical performance, the working mechanism of ICL remains an open problem. To better understand how ICL works, this paper explains language models as meta-optimizers and interprets ICL as a kind of implicit finetuning.
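
To make the ICL setup concrete, here is a minimal sketch of few-shot prompting with a causal language model. The task, demonstrations, and labels below are invented for illustration and are not from the paper.

```python
# Minimal sketch of the in-context learning (ICL) setup described above.
# The task, demonstrations, and labels are illustrative assumptions,
# not taken from the paper.

demonstrations = [
    ("The movie was fantastic.", "positive"),
    ("I hated every minute of it.", "negative"),
    ("A beautiful, moving story.", "positive"),
]
query = "The plot was dull and predictable."

# Build a few-shot prompt: demonstration input-label pairs followed by
# the unseen input. The model predicts the label as the continuation;
# no parameters are updated.
prompt = "".join(f"Review: {x}\nSentiment: {y}\n\n" for x, y in demonstrations)
prompt += f"Review: {query}\nSentiment:"

print(prompt)
# Feeding this prompt to a pretrained causal LM (via any text-completion
# interface) should yield "negative" as the next token.
```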
2022: Damai Dai, Yutao Sun, Li Dong, Y. Hao, Zhifang Sui, Furu Wei
https://arxiv.org/pdf/2212.10559v2.pdf
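
For readers who want the gist of the "meta-optimizer" claim, the paper's core identity can be sketched in its linear-attention relaxation (softmax and scaling dropped). Notation follows the paper: X' holds the demonstration tokens, X the query tokens, q is the attention query vector, and W_K, W_V are the key and value projections.

```latex
% Linear-attention relaxation of the paper's dual form.
% X' = demonstration tokens, X = query tokens, q = attention query,
% W_K, W_V = key/value projection matrices.
\[
F_{\mathrm{ICL}}(q)
  = W_V \,[X';\, X]\,\bigl(W_K \,[X';\, X]\bigr)^{\top} q
  = \underbrace{W_V X (W_K X)^{\top}}_{W_{\mathrm{ZSL}}} q
  \;+\; \underbrace{W_V X' (W_K X')^{\top}}_{\Delta W_{\mathrm{ICL}}} q .
\]
% Gradient descent on a linear layer W produces updates of the same
% outer-product form, \Delta W = \sum_i e_i x_i^{\top}, so the
% demonstration term acts like an implicit weight update.
```

On this view, the demonstrations never change the model's weights; they contribute an additive term of the same algebraic form as a gradient-descent update, which is the sense in which the paper describes ICL as implicit finetuning.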

Papers Read on AI, by Rob

3.7 (3 ratings)