


This post outlines an efficient implementation of Edge Patching that massively outperforms common hook-based implementations. This implementation is available to use in my new library, AutoCircuit, and was first introduced by Li et al. (2023).
What is activation patching?
I introduce new terminology to clarify the distinction between different types of activation patching.
Node Patching
Node Patching (a.k.a. “normal” activation patching) is when some activation in a neural network is altered from the value computed by the network to some other value. For example, we could run two different prompts through a language model and replace the output of Attn 1 on input 1 with the output of the same head on input 2.
We will use the running example of a tiny, 1-layer transformer, but this approach generalizes to any transformer and any residual network.
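To make the definition concrete, here is a minimal sketch of Node Patching on a toy residual network. It is plain Python and is not the AutoCircuit API: `attn`, `mlp`, `run_model`, and the `patches`/`cache` arguments are hypothetical names invented for this illustration, and the two blocks are simple arithmetic stand-ins rather than real attention or MLP layers.

```python
from typing import Dict, List, Optional

Vector = List[float]


def attn(resid: Vector) -> Vector:
    # Hypothetical stand-in for an attention block: writes half the mean
    # of the residual stream to every position.
    mean = sum(resid) / len(resid)
    return [0.5 * mean for _ in resid]


def mlp(resid: Vector) -> Vector:
    # Hypothetical stand-in for an MLP block: a scaled ReLU.
    return [2.0 * max(0.0, x) for x in resid]


def run_model(
    x: Vector,
    patches: Optional[Dict[str, Vector]] = None,
    cache: Optional[Dict[str, Vector]] = None,
) -> Vector:
    """Run a 1-layer residual network, optionally overwriting node outputs.

    patches maps a node name to the activation that replaces its output.
    cache, if given, records the output of every node that was used.
    """
    patches = patches or {}
    resid = list(x)
    for name, block in (("attn", attn), ("mlp", mlp)):
        out = block(resid)
        if name in patches:
            # Node Patching: discard the computed value, use the other one.
            out = patches[name]
        if cache is not None:
            cache[name] = out
        # Each block adds its output back into the residual stream.
        resid = [r + o for r, o in zip(resid, out)]
    return resid


input_1 = [1.0, -1.0]
input_2 = [2.0, 4.0]

# Record the node outputs on input 2.
cache_2: Dict[str, Vector] = {}
run_model(input_2, cache=cache_2)

# Run input 1 normally, then with the attention output taken from input 2.
clean = run_model(input_1)
patched = run_model(input_1, patches={"attn": cache_2["attn"]})
```

Note that the patched attention output reaches everything downstream of it, here both the MLP and the final residual stream, because the node's output itself is replaced.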
All [...]
---
Outline:
(00:20) What is activation patching?
(00:30) Node Patching
(01:22) Edge Patching
(01:59) Path Patching
(03:21) Fast Edge Patching
(05:34) Performance Comparison
(06:19) Mask Gradients
(08:24) Appendix: Path Patching vs. Edge Patching
The original text contained 7 footnotes which were omitted from this narration.
---
First published:
Source:
Narrated by TYPE III AUDIO.
By LessWrong
