
Dario Amodei posted a new essay titled "The Urgency of Interpretability" a couple days ago.
Some excerpts I think are worth highlighting:
The nature of AI training makes it possible that AI systems will develop, on their own, an ability to deceive humans and an inclination to seek power in a way that ordinary deterministic software never will; this emergent nature also makes it difficult to detect and mitigate such developments[1]. But by the same token, we’ve never seen any solid evidence in truly real-world scenarios of deception and power-seeking[2] because we can’t “catch the models red-handed” thinking power-hungry, deceitful thoughts.
One might be forgiven for forgetting about Bing Sydney as an obvious example of "power-seeking" AI behavior, given how long ago that was, but lying? Given the very recent releases of Sonnet 3.7 and OpenAI's o3, and their much-remarked-upon propensity for reward hacking and [...]
The original text contained 2 footnotes which were omitted from this narration.
---
First published:
Source:
Linkpost URL:
https://www.darioamodei.com/post/the-urgency-of-interpretability
---
Narrated by TYPE III AUDIO.
By LessWrong
