February 22, 2024

[Linkpost] “Research Post: Tasks That Language Models Don’t Learn” by Bruce W. Lee

3 minutes

This is a linkpost for https://arxiv.org/abs/2402.11349

Abstract:

We argue that there are certain properties of language that our current large language models (LLMs) don't learn. We present an empirical investigation of visual-auditory properties of language through a series of tasks, termed H-Test. This benchmark highlights a fundamental gap between human linguistic comprehension, which naturally integrates sensory experiences, and the sensory-deprived processing capabilities of LLMs. In support of our hypothesis, 1. deliberate reasoning (Chain-of-Thought), 2. few-shot examples, or 3. stronger LLM from the same model family (LLaMA 2 13B -> LLaMA 2 70B) do not trivially bring improvements in H-Test performance.
Therefore, we make a particular connection to the philosophical case of Mary, who learns about the world in a sensory-deprived environment. Our experiments show that some of the strongest proprietary LLMs stay near random chance baseline accuracy of 50%, highlighting the limitations of knowledge acquired in the absence [...]

---

Outline:

(01:16) Key Findings on H-Test

(03:14) Acknowledgments and Links

---

First published:

February 22nd, 2024

Source:

https://www.lesswrong.com/posts/ia4HszGTidh74Nyxk/research-post-tasks-that-language-models-don-t-learn

Linkpost URL:
https://arxiv.org/abs/2402.11349

---

Narrated by TYPE III AUDIO.