When you think you've found a circuit in a language model, how do you know if it does what you think it does? Typically, you ablate or resample the model's activations in order to isolate the circuit. Then you measure whether the model can still perform the task you're investigating.
We identify six ways in which ablation experiments often vary.[1][2]
How do these variations change the results of experiments that measure circuit faithfulness?
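To make the question concrete, here is a minimal sketch of how the choice of ablation can change a faithfulness measurement. Everything in it is an assumption made for illustration: the four-component toy "model", the names `run` and `faithfulness_gap`, and the three ablation types (zero, mean, resample) are hypothetical and are not taken from the article or from any library.

```python
import random

# Hypothetical toy "model": the output is a weighted sum of four component
# activations, each a simple function of a scalar input x.
WEIGHTS = [2.0, -1.0, 0.5, 0.1]


def activations(x):
    """Return the activation of each toy component on input x."""
    return [x, x * x, 1.0, x + 3.0]


def run(x, circuit, ablation, dataset, rng):
    """Run the toy model on x, ablating every component not in `circuit`."""
    clean = activations(x)
    if ablation == "zero":
        # Replace ablated activations with zeros.
        replacement = [0.0] * len(clean)
    elif ablation == "mean":
        # Replace ablated activations with their mean over the dataset.
        all_acts = [activations(d) for d in dataset]
        replacement = [sum(col) / len(col) for col in zip(*all_acts)]
    elif ablation == "resample":
        # Replace ablated activations with those from a randomly drawn input.
        replacement = activations(rng.choice(dataset))
    else:
        raise ValueError(f"unknown ablation type: {ablation}")
    patched = [
        clean[i] if i in circuit else replacement[i] for i in range(len(clean))
    ]
    return sum(w * a for w, a in zip(WEIGHTS, patched))


def faithfulness_gap(circuit, ablation, dataset, seed=0):
    """Mean absolute difference between full-model and isolated-circuit outputs."""
    rng = random.Random(seed)
    full = set(range(len(WEIGHTS)))
    gaps = [
        abs(run(x, full, ablation, dataset, rng)
            - run(x, circuit, ablation, dataset, rng))
        for x in dataset
    ]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    dataset = [-1.0, 0.0, 1.0, 2.0]
    circuit = {0, 1}  # hypothesised circuit: the first two components
    for ablation in ("zero", "mean", "resample"):
        print(ablation, faithfulness_gap(circuit, ablation, dataset))
```

In this toy setting the same hypothesised circuit looks less faithful under zero ablation than under mean ablation, because the constant component is perfectly reproduced by its dataset mean but not by zero. Real experiments differ in many more ways than this sketch captures.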
TL;DR
---
Outline:
(00:43) TL;DR
(01:29) Case Studies
(01:41) Indirect Object Identification Circuit
(06:28) Docstring Circuit
(09:01) Sports Players Circuit
(10:15) Methodology Should Match the Circuit
(10:45) Optimal Circuits are Defined by Ablation Methodology
(13:40) AutoCircuit
The original text contained 7 footnotes which were omitted from this narration.
The original text contained 2 images which were described by AI.
---
First published:
Source:
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.