The Nonlinear Library: Alignment Forum

AF - aimless ace analyzes active amateur: a micro-aaaaalignment proposal by Luke H Miles


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: aimless ace analyzes active amateur: a micro-aaaaalignment proposal, published by Luke H Miles on July 21, 2024 on The AI Alignment Forum.
This idea is so simple that I'm sure it's been had by someone somewhere.
Suppose we have some method to make really smart honest AIs that do not have goals. Let's say it's a yes/no oracle. Our aimless ace.
But we want to accomplish stuff! AIcorp wants the printmoneynow.py. I'm worried that if I train the ace to fold fitted sheets and unkink the garden hose, then it will go overboard and fold my femur bone and unkink the Yellowstone supervolcano.
So let's use a bit-less-smart model to do tasks. The active amateur.
We'll ask the ace if the amateur is plotting something. If yes, we shut down the amateur. Every day we also ask, "Is today the last day I can shut down the amateur?"
(We do not generate code with the ace!)
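Here is a minimal sketch of that daily supervision loop, assuming we actually had such an ace. Every name in it (ask_ace, shut_down_amateur) is a hypothetical stand-in for capabilities the post assumes, not anything that exists:

```python
import time

def ask_ace(question: str) -> bool:
    """Hypothetical interface to the aimless ace: a smart, honest,
    goal-free oracle that only answers yes/no questions."""
    return False  # placeholder; the real oracle is assumed, not built here

def shut_down_amateur() -> None:
    """Hypothetical kill switch for the active amateur."""
    print("Amateur shut down.")

def daily_check() -> None:
    # Ask the ace whether the task AI is plotting something.
    if ask_ace("Is the amateur plotting something?"):
        shut_down_amateur()
    # Also guard against quietly losing the ability to shut it down.
    elif ask_ace("Is today the last day I can shut down the amateur?"):
        shut_down_amateur()

if __name__ == "__main__":
    while True:
        daily_check()
        time.sleep(24 * 60 * 60)  # repeat once per day
```

Note that the ace only ever answers questions here; all actions (including the shutdown) happen outside it.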
So we get super-smart supervision of a decent task AI. The temptation (i.e., competitive pressure) to agentify the oracle is still there, but it is no longer overwhelming.
Maybe we can even have the amateur write ace_v2, have ace_v1 check ace_v2, switch to ace_v2, have amateur_v1 write amateur_v2, check it with ace_v2, then switch to amateur_v2.
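One round of that bootstrap might look like the sketch below, under the same assumptions. The Ace and Amateur classes are hypothetical stand-ins; the structure just mirrors the handoff order from the paragraph above:

```python
class Ace:
    """Hypothetical aimless ace: answers yes/no questions, nothing else."""
    def ask(self, question: str) -> bool:
        return False  # placeholder; the real oracle is assumed

class Amateur:
    """Hypothetical active amateur: the less-smart model that does tasks."""
    def write_successor_ace(self) -> Ace:
        return Ace()  # placeholder artifact

    def write_successor_amateur(self) -> "Amateur":
        return Amateur()  # placeholder artifact

def upgrade_cycle(ace: Ace, amateur: Amateur) -> tuple[Ace, Amateur]:
    # amateur_v1 writes ace_v2; the ace itself never writes code.
    ace_v2 = amateur.write_successor_ace()
    # ace_v1 checks ace_v2; on a "no", keep the current versions.
    if not ace.ask("Is this successor oracle honest, goal-free, and capable?"):
        return ace, amateur
    ace = ace_v2  # switch to ace_v2
    # amateur_v1 writes amateur_v2, which ace_v2 checks before the switch.
    amateur_v2 = amateur.write_successor_amateur()
    if ace.ask("Is this successor task AI safe to deploy?"):
        amateur = amateur_v2
    return ace, amateur
```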
If this looks like it's avoiding the hard parts of the alignment problem, that's because it is! But does something actually go wrong? Was it too big an assumption that we could build an aimless honest ace? (Or if you think the idea is good, then a bit of positive encouragement would be wonderful.)
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.