Best AI papers explained

Asymptotic Safety Guarantees Based On Scalable Oversight

This episode covers a presentation by Geoffrey Irving, Chief Scientist at the UK AI Safety Institute, on approaches to achieving asymptotic safety guarantees for AI. Irving critiques existing methods such as scalable oversight (including techniques like debate), arguing that current theory and experiments suggest they will likely fail due to issues such as obfuscated arguments and exploration hacking. He proposes that while full formal verification of neural networks is likely too difficult, an intermediate goal combining theoretical frameworks with empirical testing offers a more promising path forward. The discussion highlights the need for novel complexity theory to address problems like obfuscated arguments, and argues that the field needs significantly more researchers to tackle these fundamental challenges in AI safety.

By Enoch H. Kang