
In this episode, Ryan Greenblatt, Chief Scientist at Redwood Research, discusses various facets of AI safety and alignment. He delves into recent research on alignment faking, covering experiments involving different setups such as system prompts, continued pre-training, and reinforcement learning. Ryan offers insights on methods to ensure AI compliance, including giving AIs the ability to voice objections and negotiate deals. The conversation also touches on the future of AI governance, the risks associated with AI development, and the necessity of international cooperation. Ryan shares his perspective on balancing AI progress with safety, emphasizing the need for transparency and cautious advancement.
Ryan's work (with co-authors at Anthropic) on Alignment Faking: https://www.lesswrong.com/posts/njAZwT8nkHnjipJku/alignment-faking-in-large-language-models
Ryan's work on striking deals with AIs: https://www.lesswrong.com/posts/7C4KJot4aN8ieEDoz/will-alignment-faking-claude-accept-a-deal-to-reveal-its
Ryan's critique of Anthropic's RSP work: https://www.lesswrong.com/posts/6tjHf5ykvFqaNCErH/anthropic-s-responsible-scaling-policy-and-long-term-benefit?commentId=NyqcvZifqznNGKxdT
SPONSORS:
Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance with 50% less for compute and 80% less for outbound networking compared to other cloud providers. OCI powers industry leaders like Vodafone and Thomson Reuters with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before March 31, 2024 at https://oracle.com/cognitive
NetSuite: Over 41,000 businesses trust NetSuite by Oracle, the #1 cloud ERP, to future-proof their operations. With a unified platform for accounting, financial management, inventory, and HR, NetSuite provides real-time insights and forecasting to help you make quick, informed decisions. Whether you're earning millions or hundreds of millions, NetSuite empowers you to tackle challenges and seize opportunities. Download the free CFO's guide to AI and machine learning at https://netsuite.com/cognitive
Shopify: Shopify is revolutionizing online selling with its market-leading checkout system and robust API ecosystem. Its exclusive library of cutting-edge AI apps empowers e-commerce businesses to thrive in a competitive market. Cognitive Revolution listeners can try Shopify for just $1 per month at https://shopify.com/cognitive
RECOMMENDED PODCAST:
🎙️Check out Modern Relationships, where Erik Torenberg interviews tech power couples and leading thinkers to explore how ambitious people actually make partnerships work. This season's guests include: Delian Asparouhov & Nadia Asparouhova, Kristen Berman & Phil Levin, Rob Henderson, and Liv Boeree & Igor Kurganov.
Apple: https://podcasts.apple.com/us/podcast/id1786227593
Spotify: https://open.spotify.com/show/5hJzs0gDg6lRT6r10mdpVg
YouTube: https://www.youtube.com/@ModernRelationshipsPod