
Sign up to save your podcasts
Or
To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new “dangerous capability” evaluations and pilot them on Gemini 1.0 models. Our evaluations cover four areas: (1) persuasion and deception; (2) cyber-security; (3) self-proliferation; and (4) self-reasoning. We do not find evidence of strong dangerous capabilities in the models we evaluated, but we flag early warning signs. Our goal is to help advance a rigorous science of dangerous capability evaluation, in preparation for future models.
At last, DeepMind talks about its dangerous capability evals. Yay!
(DeepMind hasn't yet made RSP-like commitments — that is, specific commitments about risk assessment (for extreme risks), safety and security practices as a function of risk assessment results, and training and deployment decisions as a function of risk assessment results. Indeed [...]
---
First published:
Source:
Linkpost URL:
https://arxiv.org/abs/2403.13793
Narrated by TYPE III AUDIO.
To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new “dangerous capability” evaluations and pilot them on Gemini 1.0 models. Our evaluations cover four areas: (1) persuasion and deception; (2) cyber-security; (3) self-proliferation; and (4) self-reasoning. We do not find evidence of strong dangerous capabilities in the models we evaluated, but we flag early warning signs. Our goal is to help advance a rigorous science of dangerous capability evaluation, in preparation for future models.
At last, DeepMind talks about its dangerous capability evals. Yay!
(DeepMind hasn't yet made RSP-like commitments — that is, specific commitments about risk assessment (for extreme risks), safety and security practices as a function of risk assessment results, and training and deployment decisions as a function of risk assessment results. Indeed [...]
---
First published:
Source:
Linkpost URL:
https://arxiv.org/abs/2403.13793
Narrated by TYPE III AUDIO.
26,446 Listeners
2,389 Listeners
7,910 Listeners
4,136 Listeners
87 Listeners
1,462 Listeners
9,095 Listeners
87 Listeners
389 Listeners
5,432 Listeners
15,174 Listeners
474 Listeners
121 Listeners
75 Listeners
461 Listeners