The Nonlinear Library: Alignment Forum

AF - Alignment Workshop talks by Richard Ngo



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Alignment Workshop talks, published by Richard Ngo on September 28, 2023 on The AI Alignment Forum.
In February 2023, researchers from a number of top industry AI labs (OpenAI, DeepMind, Anthropic) and universities (Cambridge, NYU) co-organized a two-day workshop on the problem of AI alignment, attended by 80 of the world's leading machine learning researchers. We're now making recordings and transcripts of the talks available online. The content ranged from very concrete to highly speculative, and the recordings include the many questions, interjections and debates which arose throughout.
If you're a machine learning researcher interested in attending follow-up workshops similar to the San Francisco alignment workshop, you can fill out this form.
Main Talks
Ilya Sutskever - Opening Remarks: Confronting the Possibility of AGI
Jacob Steinhardt - Aligning Massive Models: Current and Future Challenges
Ajeya Cotra - "Situational Awareness" Makes Measuring Safety Tricky
Paul Christiano - How Misalignment Could Lead to Takeover
Jan Leike - Scaling Reinforcement Learning from Human Feedback
Chris Olah - Looking Inside Neural Networks with Mechanistic Interpretability
Dan Hendrycks - Surveying Safety Research Directions
Lightning talks (Day 1)
Jason Wei - Emergent abilities of language models
Martin Wattenberg - Emergent world models and instrumenting AI systems
Been Kim - Alignment, setbacks and beyond alignment
Jascha Sohl-Dickstein - More intelligent agents behave less coherently
Ethan Perez - Model-written evals
Daniel Brown - Challenges and progress towards efficient and causal preference-based reward learning
Boaz Barak - For both alignment and utility: focus on the medium term
Ellie Pavlick - Comparing neural networks' conceptual representations to humans'
Percy Liang - Transparency and standards for language model evaluation
Lightning talks (Day 2)
Sam Bowman - Measuring progress on scalable oversight for large language models
Zico Kolter - "Safe Mode": the case for (manually) verifying the output of LLMs
Roger Grosse - Understanding LLM generalization using influence functions
Scott Niekum - Models of human preferences for learning reward functions
Aleksander Madry - Faster datamodels as a new approach to alignment
Andreas Stuhlmuller - Iterated decomposition: improving science Q&A by supervising reasoning processes
Paul Christiano - Mechanistic anomaly detection
Lionel Levine - Social dynamics of reinforcement learners
Vincent Conitzer - Foundations of cooperative AI lab
Scott Aaronson - Cryptographic backdoors in large language models
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.