


Executive Summary
---
Outline:
(00:10) Executive Summary
(03:00) Introduction
(03:44) Motivating Example: Steering Against Evaluation Awareness
(06:21) Our Core Process
(08:20) Which Beliefs Are Load-Bearing?
(10:25) Is This Really Mech Interp?
(11:27) Our Comparative Advantage
(14:57) Why Pivot?
(15:20) What's Changed In AI?
(16:08) Reflections On The Field's Progress
(18:18) Task Focused: The Importance Of Proxy Tasks
(18:52) Case Study: Sparse Autoencoders
(21:35) Ensure They Are Good Proxies
(23:11) Proxy Tasks Can Be About Understanding
(24:49) Types Of Projects: What Drives Research Decisions
(25:18) Focused Projects
(28:31) Exploratory Projects
(28:35) Curiosity Is A Double-Edged Sword
(30:56) Starting In A Robustly Useful Setting
(34:45) Time-Boxing
(36:27) Worked Examples
(39:15) Blending The Two: Tentative Proxy Tasks
(41:23) What's Your Contribution?
(43:08) Jack Lindsey's Approach
(45:44) Method Minimalism
(46:12) Case Study: Shutdown Resistance
(48:28) Try The Easy Methods First
(50:02) When Should We Develop New Methods?
(51:36) Call To Action
(53:04) Acknowledgments
(54:02) Appendix: Common Objections
(54:08) Aren't You Optimizing For Quick Wins Over Breakthroughs?
(56:34) What If AGI Is Fundamentally Different?
(57:30) I Care About Scientific Beauty and Making AGI Go Well
(58:09) Is This Just Applied Interpretability?
(58:44) Are You Saying This Because You Need To Prove Yourself Useful To Google?
(59:10) Does This Really Apply To People Outside AGI Companies?
(59:40) Aren't You Just Giving Up?
(01:00:04) Is Ambitious Reverse-Engineering Actually Overcrowded?
(01:00:48) Appendix: Defining Mechanistic Interpretability
(01:01:44) Moving Toward Mechanistic OR Interpretability
The original text contained 47 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrong