
Dario Amodei posted a new essay titled "The Urgency of Interpretability" a couple days ago.
Some excerpts I think are worth highlighting:
The nature of AI training makes it possible that AI systems will develop, on their own, an ability to deceive humans and an inclination to seek power in a way that ordinary deterministic software never will; this emergent nature also makes it difficult to detect and mitigate such developments[1]. But by the same token, we’ve never seen any solid evidence in truly real-world scenarios of deception and power-seeking[2] because we can’t “catch the models red-handed” thinking power-hungry, deceitful thoughts.
One might be forgiven for forgetting about Bing Sydney as an obvious example of "power-seeking" AI behavior, given how long ago that was, but lying? Given the very recent releases of Sonnet 3.7 and OpenAI's o3, and their much-remarked-upon propensity for reward hacking and [...]
The original text contained 2 footnotes which were omitted from this narration.
---
First published:
Source:
Linkpost URL:
https://www.darioamodei.com/post/the-urgency-of-interpretability
Narrated by TYPE III AUDIO.