
Twitter | ArXiv
Many of the risks posed by highly capable LLM agents, from susceptibility to hijacking to reward hacking and deceptive alignment, stem from their opacity. If we could reliably monitor the reasoning processes underlying AI decisions, many of those risks would become far more tractable. Compared to other approaches in AI, LLMs offer a unique advantage: they can "think out loud" using chain-of-thought (CoT), enabling oversight of their decision-making processes. Yet the reliability of such monitoring hinges on an empirical question: do models need to externalize their reasoning in human language, or can they achieve the same performance through opaque internal computation?
In our new paper, we investigate LLM latent reasoning capabilities using two-hop question answering as a case study. We fine-tune LLMs (including Llama 3 8B and GPT-4o) on synthetic facts and test two-hop reasoning over these facts. By using [...]
---
First published:
Source:
Linkpost URL:
https://arxiv.org/abs/2411.16353
---
Narrated by TYPE III AUDIO.
By LessWrong
