
This post outlines an efficient implementation of Edge Patching that massively outperforms common hook-based implementations. This implementation is available in my new library, AutoCircuit, and was first introduced by Li et al. (2023).
What is activation patching?
I introduce new terminology to clarify the distinction between different types of activation patching.
Node Patching
Node Patching (a.k.a. “normal” activation patching) is when some activation in a neural network is altered from the value computed by the network to some other value. For example, we could run two different prompts through a language model and replace the output of attention head Attn 1 on input 1 with that head's output on input 2.
We will use the running example of a tiny, 1-layer transformer, but this approach generalizes to any transformer and any residual network.
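As a rough illustration of Node Patching, here is a minimal PyTorch sketch using forward hooks: cache a module's output on one input, then replace that module's output with the cached value while running another input. The `TinyModel` class and all names in it are illustrative stand-ins, not AutoCircuit's API (and note the post's point is precisely that hook-based approaches like this are slow compared to its Edge Patching implementation).

```python
import torch
import torch.nn as nn

# Toy stand-in for a transformer block; "attn" plays the role of Attn 1.
# Purely illustrative — not AutoCircuit's API.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn = nn.Linear(4, 4)  # stand-in for an attention head
        self.mlp = nn.Linear(4, 4)

    def forward(self, x):
        return self.mlp(self.attn(x))

torch.manual_seed(0)
model = TinyModel()
input_1 = torch.randn(1, 4)
input_2 = torch.randn(1, 4)

# 1. Source run: cache the "attn" output on input 2.
cache = {}
def save_hook(module, args, output):
    cache["attn"] = output.detach()

handle = model.attn.register_forward_hook(save_hook)
model(input_2)
handle.remove()

# 2. Patched run: on input 1, replace the "attn" output with the cached value.
#    Returning a tensor from a forward hook overrides the module's output.
def patch_hook(module, args, output):
    return cache["attn"]

handle = model.attn.register_forward_hook(patch_hook)
patched_out = model(input_1)
handle.remove()

# Everything downstream of the patched node sees the input-2 activation,
# so here the patched run matches the clean input-2 run exactly.
clean_out_2 = model(input_2)
print(torch.allclose(patched_out, clean_out_2))  # True
```

In a real transformer only the patched node's contribution changes, and downstream components that also read the residual stream from other nodes will differ between the patched and clean runs.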
All [...]
---
Outline:
(00:20) What is activation patching?
(00:30) Node Patching
(01:22) Edge Patching
(01:59) Path Patching
(03:21) Fast Edge Patching
(05:34) Performance Comparison
(06:19) Mask Gradients
(08:24) Appendix: Path Patching vs. Edge Patching
The original text contained 7 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.