
Many people helped us a great deal in developing the questions and ideas in this post, including people at CHAI, MATS, various other places in Berkeley, and Aether. To all of them: Thank you very much! Any mistakes are our own.
Foundation model agents - systems like AutoGPT and Devin that equip foundation models with planning, memory, tool use, and other affordances to perform autonomous tasks - seem to have immense implications for AI capabilities and safety. As such, I (Rohan) am planning to do foundation model agent safety research.
Following the spirit of an earlier post I wrote, I thought it would be fun and valuable to write as many interesting questions as I could about foundation model agent safety. I shared these questions with my collaborators, and Govind wrote a bunch more questions that he is interested in. This post includes questions from both of us. [...]
---
Outline:
(01:14) Rohan
(01:28) Basics and Current Status
(03:16) Chain-of-Thought (CoT) Interpretability
(08:02) Goals
(10:18) Forecasting (Technical and Sociological)
(16:43) Broad Conceptual Safety Questions
(21:50) Miscellaneous
(25:21) Govind
(25:24) OpenAI o1 and other RL CoT Agents
(26:30) Linguistic Drift, Neuralese, and Steganography
(27:32) Agentic Performance
(28:57) Forecasting
---
Narrated by TYPE III AUDIO.