
This episode details a presentation by Geoffrey Irving, Chief Scientist at the UK AI Safety Institute, on approaches to achieving asymptotic safety guarantees for AI. Irving critiques existing methods such as scalable oversight (including techniques like debate), arguing that current theory and experiments suggest they will likely fail due to issues such as obfuscated arguments and exploration hacking. He proposes that while full formal verification of neural networks is likely too difficult, an intermediate goal that combines theoretical frameworks with empirical testing offers a more promising path forward. The discussion highlights the need for novel complexity theory to address problems like obfuscated arguments and suggests that the field needs significantly more researchers to tackle these fundamental challenges in AI safety.