TYPE III AUDIO (All episodes)

[Week 3] "The alignment problem from a deep learning perspective" (Sections 2, 3 and 4) by Richard Ngo, Lawrence Chan & Sören Mindermann


---
client: agi_sf
project_id: core_readings
feed_id: agi_sf__alignment
narrator: pw
qa: mds
qa_time: 1h00m
---

Within the coming decades, artificial general intelligence (AGI) may surpass human capabilities at a wide range of important tasks. We outline a case for expecting that, without substantial effort to prevent it, AGIs could learn to pursue goals which are undesirable (i.e. misaligned) from a human perspective. We argue that if AGIs are trained in ways similar to today's most capable models, they could learn to act deceptively to receive higher reward, learn internally-represented goals which generalize beyond their training distributions, and pursue those goals using power-seeking strategies. We outline how the deployment of misaligned AGIs might irreversibly undermine human control over the world, and briefly review research directions aimed at preventing this outcome.

Original article:
https://arxiv.org/abs/2209.00626

Authors:
Richard Ngo, Lawrence Chan, Sören Mindermann

---
This article is featured on the AGI Safety Fundamentals: Alignment course curriculum.

Narrated by TYPE III AUDIO on behalf of BlueDot Impact.
