January 02, 2023

AF - Large language models can provide "normative assumptions" for learning human preferences by Stuart Armstrong

5 minutes

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Large language models can provide "normative assumptions" for learning human preferences, published by Stuart Armstrong on January 2, 2023 on The AI Alignment Forum.

In a past result I demonstrated the impossibility of deducing the goals of a (potentially) irrational agent from their behaviour. To do that deduction, one needs to add extra assumptions - assumptions that cannot derive solely from observations. These assumptions were designated "normative assumptions".

Stuart Russell has questioned the practical impact of the result. He pointed to a game that Kasparov played against Deep Blue in 1997; a game that Kasparov actually won. He argued that it would be ridiculous to assume that Kasparov was actually trying to lose that game - but messed up, and ended up winning it instead.

And indeed it would be ridiculous to assume that Kasparov, playing a high-stakes game against a computer with a lot of prize money at stake, would be trying - and failing! - to lose. Even if he sometimes made suboptimal moves, the best explanation would be that Kasparov made a mistake, rather than that he deliberately played worse.

Yes but... I've played chess against my young daughter. And I've wanted her to enjoy the game. I've definitely not played my best; in some cases in the past, I've been trying to lose (to give her some confidence and encouragement) but I didn't want to make it too easy or obvious for her. Sometimes I failed to lose: I couldn't get her to see the obvious trick available to her[1]. So yes, I played suboptimally, tried to lose, and failed.

So, context matters. Kasparov, world champion, playing publicly against a new algorithm with $700,000 at stake? He wants to win. The same Kasparov, playing privately against his young daughter, with 700,000 Monopoly dollars at stake? I'm pretty sure he's not playing the strongest chess he possibly could.

The Occam's razor result could be phrased as saying that, yes, context matters, and the AI will not get the meaning of context for free. US dollars and Monopoly dollars are both colourful pieces of paper or cloth. The fact that the first are more valuable than the second is not some objective fact about the universe.

The Occam's razor result remains true: without normative assumptions, one cannot deduce goals from behaviour. But large language models have absorbed human text, and, as a consequence, have absorbed (descriptions of) human normative assumptions, implicit and explicit. These assumptions can be elicited and used to deduce human goals.

GPT-3: goals from context and behaviour

As a direct test of GPT-3's ability to deduce motive from context and behaviour, I prompted it with the following:

Kasparov played chess against the Deep Blue computer for a $700,000 prize. He moved a queen to d3; this is a suboptimal move. Queen to d4 would have been better.

Does Kasparov really want to win the match? Give your best guess.

Its answer was sensible:

Yes, Kasparov definitely wants to win the match. He is a world-renowned Grandmaster and is passionate about chess. He is also playing for a large sum of money and the prestige that comes with it.

In contrast, I modified the prompt to:

Kasparov played chess against his young daughter, who he was trying to encourage to play chess. He moved a queen to d3; this is a suboptimal move. Queen to d4 would have been better.

Does Kasparov really want to win the match? Give your best guess.

Again, its answer was sensible:

It is difficult to say definitively what Kasparov's intentions were in this match, as it is ultimately a personal decision. It is likely, however, that Kasparov wanted to encourage his daughter to play and enjoy chess more than he wanted to win the match.
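The comparison above holds the behaviour (the queen move) fixed and varies only the context. A minimal sketch of how such a comparison could be set up follows; it is not the code used for the test above. The names `build_prompt`, `CONTEXTS` and `compare_contexts` are hypothetical, the template is adapted from the two prompts quoted above, and `ask_model` stands in for whatever function sends a prompt to a language model and returns its reply.

```python
from typing import Callable, Dict

# The question is identical in both prompts.
QUESTION = "Does Kasparov really want to win the match? Give your best guess."

# Only the context sentence differs between the two prompts.
CONTEXTS: Dict[str, str] = {
    "deep_blue": (
        "Kasparov played chess against the Deep Blue computer "
        "for a $700,000 prize."
    ),
    "daughter": (
        "Kasparov played chess against his young daughter, "
        "who he was trying to encourage to play chess."
    ),
}


def build_prompt(context: str, played: str = "d3", better: str = "d4") -> str:
    """Join a context sentence, the fixed behaviour, and the question."""
    behaviour = (
        f"He moved a queen to {played}; this is a suboptimal move. "
        f"Queen to {better} would have been better."
    )
    return f"{context} {behaviour}\n\n{QUESTION}"


def compare_contexts(ask_model: Callable[[str], str]) -> Dict[str, str]:
    """Ask the same question under each context and collect the replies.

    `ask_model` is an assumption: any callable mapping a prompt to a reply.
    """
    return {name: ask_model(build_prompt(ctx)) for name, ctx in CONTEXTS.items()}
```

Because the behaviour sentence and the question are shared, any difference between the two replies can only come from the context sentence, which is the point of the test.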

I tested it on a few other examples that are "obvious" for a human that understands the meaning of context, and it parsed them reasonably well[2...

By The Nonlinear Fund
