Projects I would like to see (possibly at AI Safety Camp), by Linda Linsefors, published September 27, 2023 on The AI Alignment Forum.
I recently discussed with my AISC co-organiser Remmelt some possible project ideas I would be excited to see at the upcoming AISC, and I thought they would be valuable to share more widely.
Thanks to Remmelt for helpful suggestions and comments.
What is AI Safety Camp?
AISC in its current form is primarily a structure to help people find collaborators. If you are a research lead, we give your project visibility and help you recruit a team. If you are a regular participant, we match you up with a project you can help with.
I want to see more good projects happening. I know there is a lot of unused talent wanting to help with AI safety. If you want to run one of these projects, it doesn't matter to me if you do it as part of AISC or independently, or as part of some other program. The purpose of this post is to highlight these projects as valuable things to do, and to let you know AISC can support you, if you think what we offer is helpful.
Project ideas
These are not my after-long-consideration top picks of the most important things to do, just some things I think would be net positive if someone did them. I typically don't spend much cognitive effort on absolute rankings anyway, since I think personal fit is more important for ranking your personal options. I don't claim originality for anything here. There may already be work on one or several of these topics that I'm not aware of; please share links in the comments if you know of such work.
Is substrate-needs convergence inevitable for any autonomous system, or is it preventable with sufficient error correction techniques?
This can be done as an adversarial collaboration (see below) but doesn't have to be.
The risk from substrate-needs convergence can be summarised as follows:
If AI is complex enough to self-sufficiently maintain its components, natural selection will sneak in.
This would select for components that cause environmental conditions needed for artificial self-replication.
An AGI will necessarily be complex enough.
Therefore natural selection will push the system towards self-replication, and so it is not possible for an AGI to be stably aligned with any other goal. Note that this line of reasoning does not require that the AI will come to represent self-replication as its goal (although that is a possible outcome), only that natural selection will push it towards this behaviour.
I'm simplifying and skipping over a lot of steps! I don't think there is currently a great writeup of the full argument, but if you're interested you can read more here, watch this talk by Remmelt, or reach out to me or Remmelt. Remmelt has a deeper understanding of the arguments for substrate-needs convergence than I do, but my communication style might be better suited for some people.
I think substrate-needs convergence is pointing at a real risk. I don't know yet whether the argument (which I summarised above) proves that building an AGI that stays aligned is impossible, or whether it points to one more challenge to be overcome. Figuring out which of these is the case seems very important. I've talked to a few people about this problem and identified what I think is the main crux: how well can error correction mechanisms be executed? When Forrest Landry and Anders Sandberg discussed substrate-needs convergence, they ended up with a similar crux, but unfortunately did not have time to address it. Here's a recording of their discussion; however, Landry's mic breaks about 20 minutes in, which makes it hard to hear him from that point onward.
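To make the crux a bit more concrete, here is a minimal toy sketch of the dynamics, with made-up parameters and my own naming; it is an illustration of the shape of the question, not part of the original argument. Assume a system that maintains its own components, where repairs occasionally introduce "variant" parts, variants have a small retention advantage (standing in for natural selection sneaking in), and error correction reverts a fraction of variants each cycle.

```python
# Toy sketch of the error-correction crux (illustrative assumptions only):
# x = fraction of the system's components that are drifted "variant" parts,
#     which happen to persist/copy slightly better than the intended parts.
# Per maintenance cycle:
#   - mutation:   intended parts become variants at rate mu
#   - selection:  variants gain share in proportion to their advantage s
#   - correction: a fraction c of existing variants is detected and reverted

def variant_fraction(mu: float, s: float, c: float, steps: int = 10_000) -> float:
    """Run the toy dynamics and return the final fraction of variant components."""
    x = 0.0
    for _ in range(steps):
        x += mu * (1.0 - x)      # copying/repair errors introduce new variants
        x += s * x * (1.0 - x)   # variants out-persist intended components
        x -= c * x               # error correction reverts detected variants
        x = min(max(x, 0.0), 1.0)
    return x

if __name__ == "__main__":
    mu, s = 1e-4, 0.05  # assumed error rate and selection advantage
    for c in (0.01, 0.05, 0.06, 0.20):
        print(f"correction rate {c:.2f} -> variant fraction {variant_fraction(mu, s, c):.3f}")
```

With these made-up numbers, variants take over when the correction rate is below the selection advantage and stay rare when it is above; the open question is whether real error correction in an AGI-scale system can reliably sit on the right side of that kind of threshold.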
Any alignment-relevant adversarial collaboration
What are adversarial collaborations: [link to some Scott Post]
Possible topic:
For and against some alignment plan. Maybe y...