
Work produced at Aether. Thanks to Benjamin Arnav for providing us experimentation data and for helpful discussions, and to Francis Rhys Ward and Matt MacDermott for useful feedback.
Executive Summary
---
Outline:
(00:27) Executive Summary
(01:55) Motivation
(03:24) Experiment Setting
(04:57) Extract-and-Evaluate Monitoring
(08:30) Results: GPT 4.1-mini as both the Quote Extractor and the Judge
(10:33) Results: GPT 4.1-mini as the Quote Extractor, GPT 4.1 as the Judge
(15:08) Future Work
(17:49) Author Contributions Statement
(18:16) Appendix A: Details about the Experiment Setting
(19:15) Appendix B: CoT+action Monitor and Quote Extractor Prompt
(19:25) Appendix C: Judge Prompt
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.