This is a linkpost for: www.apolloresearch.ai/blog/the-first-year-of-apollo-research
About Apollo Research
Apollo Research is an evaluation organization focusing on risks from deceptively aligned AI systems. We conduct technical research on AI model evaluations and interpretability and have a small AI governance team. As of 29 May 2024, we are one year old.
Executive Summary
For the UK AI Safety Summit, we developed a demonstration that Large Language Models (LLMs) can strategically deceive their primary users when put under pressure. The accompanying paper was referenced by experts and the press (e.g. the AI Insight Forum, BBC, Bloomberg) and accepted for oral presentation at the ICLR LLM Agents workshop.
The evaluations team is currently working on capability evaluations for precursors of deceptive alignment, scheming model organisms, and a responsible scaling policy (RSP) on deceptive alignment. Our goal is to help governments and [...]
---
Outline:
(00:22) About Apollo Research
(00:44) Executive Summary
(03:01) Completed work
(03:04) Evaluations
(05:17) Interpretability
(07:21) Governance
(09:09) Current and Future work
(09:13) Evaluations
(10:36) Interpretability
(11:27) Governance
(12:35) Operational Highlights
(13:23) Challenges
(15:55) Forward Look
---
Narrated by TYPE III AUDIO.