Audio note: this article contains 182 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.
I (Subhash) am a Master's student in the Tegmark AI Safety Lab at MIT. I am recruiting for full-time roles this Spring - please reach out if you're interested in working together!
TLDR
This blog post accompanies the paper "Language Models Use Trigonometry to Do Addition." Key findings:
- We show that LLMs represent numbers on a helix
- This helix has circular features (sines and cosines) with periods _T = [2,5,10,100]_ and a linear component
- To solve _a+b_, we claim that LLMs manipulate the helices for _a_ and _b_ to create the helix for _a+b_ using a form of the Clock algorithm introduced by Nanda et al.
- This is conceptually akin to going from _cos(a)_ and _cos(b)_ to _cos(a+b)_ via the angle-addition identity _cos(a+b) = cos(a)cos(b) - sin(a)sin(b)_ (illustrated in the sketch after this list)
- Intuitively, it's like adding angles on a clock
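
As a quick illustration of the two claims above, here is a minimal NumPy sketch. It is not the paper's code: `helix_basis`, `add_via_angle_identities`, and the `PERIODS` constant are names chosen for this example. It builds a helical basis from a linear term plus sine/cosine pairs with periods _T = [2, 5, 10, 100]_, and checks that combining the bases for _a_ and _b_ with the angle-addition identities reproduces the basis for _a+b_, which is the arithmetic at the heart of the Clock-style manipulation described above.

```python
# Minimal sketch (assumed names, not the paper's code) of the helical
# parameterization helix(a) = C @ B(a), where B(a) has a linear feature
# plus cos/sin features with periods T = [2, 5, 10, 100].
import numpy as np

PERIODS = [2, 5, 10, 100]

def helix_basis(a: float) -> np.ndarray:
    """Return B(a): one linear feature plus a cos/sin pair per period T."""
    feats = [a]
    for T in PERIODS:
        feats += [np.cos(2 * np.pi * a / T), np.sin(2 * np.pi * a / T)]
    return np.array(feats)  # shape: (1 + 2 * len(PERIODS),) = (9,)

def add_via_angle_identities(a: float, b: float) -> np.ndarray:
    """Combine B(a) and B(b) into B(a+b) using the angle-addition identities:
    cos(x+y) = cos(x)cos(y) - sin(x)sin(y)
    sin(x+y) = sin(x)cos(y) + cos(x)sin(y)
    """
    Ba, Bb = helix_basis(a), helix_basis(b)
    out = [Ba[0] + Bb[0]]  # the linear components simply add
    for i in range(len(PERIODS)):
        ca, sa = Ba[1 + 2 * i], Ba[2 + 2 * i]
        cb, sb = Bb[1 + 2 * i], Bb[2 + 2 * i]
        out += [ca * cb - sa * sb, sa * cb + ca * sb]
    return np.array(out)

# Sanity check: rotating the a-helix by the b-helix lands on the (a+b)-helix.
a, b = 27, 58
assert np.allclose(add_via_angle_identities(a, b), helix_basis(a + b))
```

In the model, of course, these features live in residual-stream directions rather than a clean 9-dimensional vector; the sketch only illustrates the arithmetic of the claimed algorithm.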
---
Outline:
(00:30) TLDR
(01:28) Motivation and Problem Setting
(02:18) LLMs Represent Numbers on a Helix
(02:23) Investigating the Structure of Numbers
(02:48) Periodicity
(03:49) Linearity
(04:39) Parameterizing Numbers as a Helix
(05:39) Fitting a Helix
(07:12) Evaluating the Helical Fit
(08:57) Relation to the Linear Representation Hypothesis
(10:28) Is the helix the full story?
(12:15) LLMs Use the Clock Algorithm to Compute Addition
(14:23) Understanding MLPs
(16:24) Zooming in on Neurons
(17:07) Modeling Neuron Preactivations
(18:53) Understanding MLP Inputs
(20:34) Interpreting Model Errors
(24:11) Limitations
(25:49) Conclusion
The original text contained 9 footnotes which were omitted from this narration.
---