AI Safety Fundamentals

The Alignment Problem From a Deep Learning Perspective



Within the coming decades, artificial general intelligence (AGI) may surpass human capabilities at a wide range of important tasks. We outline a case for expecting that, without substantial effort to prevent it, AGIs could learn to pursue goals which are undesirable (i.e. misaligned) from a human perspective. We argue that if AGIs are trained in ways similar to today's most capable models, they could learn to act deceptively to receive higher reward, learn internally-represented goals which generalize beyond their training distributions, and pursue those goals using power-seeking strategies. We outline how the deployment of misaligned AGIs might irreversibly undermine human control over the world, and briefly review research directions aimed at preventing this outcome.

Original article:
https://arxiv.org/abs/2209.00626

Authors:
Richard Ngo, Lawrence Chan, Sören Mindermann


A podcast by BlueDot Impact.

Learn more on the AI Safety Fundamentals website.

