Best AI papers explained

Asymptotic Safety Guarantees Based On Scalable Oversight

This episode covers a presentation by Geoffrey Irving, Chief Scientist at the UK AI Safety Institute, on approaches to achieving asymptotic safety guarantees for AI. Irving critiques existing methods such as scalable oversight (including techniques like debate), arguing that current theory and experiments suggest they will likely fail due to issues such as obfuscated arguments and exploration hacking. He proposes that while full formal verification of neural networks is likely too difficult, an intermediate goal combining theoretical frameworks with empirical testing offers a more promising path forward. The discussion highlights the need for novel complexity theory to address problems like obfuscated arguments, and argues that the field needs significantly more researchers to tackle these fundamental challenges in AI safety.

By Enoch H. Kang