
The Don’t Worry About the Vase Podcast is a listener-supported podcast. To receive new posts and support the cost of creation, consider becoming a free or paid subscriber.
* 00:00:00 - Introduction
* 00:03:39 - Mundane Alignment Is Excellent
* 00:05:26 - Would This Process Be Sufficient To Find A Dangerous Model?
* 00:06:58 - Introductory Warning About Superficial Mundane Alignment
* 00:15:45 - Model Training (1.1)
* 00:15:57 - Release Decision Process (1.2)
* 00:18:24 - RSP Evaluations (2.1 and 2.2)
* 00:23:26 - Autonomy Evaluations (2.3)
* 00:27:37 - The Alignment Risk Update Document
* 00:28:19 - The Threat Model
* 00:30:52 - Misalignment As Failure Mode
* 00:33:20 - Wouldn’t You Know?
* 00:35:29 - Don’t Encourage Your Model
* 00:37:00 - Beware Goodhart’s Law
* 00:39:08 - Beware The Most Forbidden Technique (5.2.3)
* 00:43:42 - Asking The Right Questions
* 00:45:11 - Model Organism Tests
* 00:47:13 - Model Weight Security (Risk Report 5.5.2.1)
* 00:47:41 - Reward Hacking (Back to The Model Card)
* 00:48:07 - Remote Drop-In Worker Coming Soon
* 00:51:04 - External Testing (2.3.7)
* 00:51:39 - Cyber Insecurity General Principle Interlude
* 00:52:52 - Alignment (4)
* 00:58:41 - Risk In The Room
* 01:00:04 - Mythos Meant Well
* 01:02:32 - Risk Not In The Room
* 01:05:38 - Alignment Testing Overview
* 01:09:09 - Internal Deployment Testing Process
* 01:11:47 - Reports From Pilot Use (4.2.1)
* 01:12:24 - Reports From Automated Testing (4.2)
* 01:15:35 - Other External Testing
* 01:16:17 - Just The Facts, Sir
* 01:19:20 - Refusing Safety Research
* 01:20:27 - Claude Favoritism
* 01:21:56 - Ruling Out Encoded Thinking (4.4.1)
* 01:25:51 - Sandbagging (4.4.2)
* 01:28:49 - Capability for Evasion of Safeguards (4.4.3)
* 01:31:20 - Pick A Random Number (4.4.3.4)
* 01:34:23 - White Box Analysis (4.5)
* 01:39:38 - Model Welfare (5)
* 01:40:43 - Key Model Welfare Findings (5.1.2)
* 01:52:05 - Is Mythos Okay?
* 01:54:48 - Self-Play
* 01:56:49 - A Few Fun Facts
https://open.substack.com/pub/thezvi/p/claude-mythos-the-system-card?utm_campaign=post-expanded-share&utm_medium=web
By Podcast for Zvi's blog, Don't Worry About the Vase Podcast
Rating: 4.5 (66 ratings)
