


Simplicio: Hey, I’ve got an alignment research idea to run by you.
Me: … guess we’re doing this again.
Simplicio: Interpretability work on trained nets is hard, right? So instead of that, what if we pick an architecture and/or training objective to produce interpretable nets right from the get-go?
Me: If we had the textbook of the future on hand, then maybe. But in practice, you’re planning to use some particular architecture and/or objective which will not work.
Simplicio: That sounds like an empirical question! We can’t know whether it works until we try it. And I haven’t thought of any reason it would fail.
Me: OK, let's get concrete here. What architecture and/or objective did you have in mind?
Simplicio: Decision trees! They’re highly interpretable, and my decision theory textbook says they’re fully general in principle. So let's just make a net tree-shaped, and train that! Or, if that's not quite general enough, we train a bunch of tree-shaped nets as “experts” and then mix them somehow.
Me: Turns out we’ve tried that one! It's called a random forest; it was all the rage back in the 2000s.
Simplicio: So we just go back to that?
Me: Alas [...]
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrong
