--- client: t3a feed_id: ai_safety_abstracts narrator: ai --- This episode covers 3 abstracts:<ol><li><a href='https://arxiv.org/abs/2303.00894'>Active reward learning from multiple teachers</a> - Peter Barnett et al.</li><li><a href='https://arxiv.org/abs/2302.00805'>Conditioning Predictive Models: Risks and Strategies</a> - Hubinger et al.</li><li><a href='https://arxiv.org/abs/2211.00593'>Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small</a> - Kevin Wang et al.</li></ol><a href='https://docs.google.com/forms/d/e/1FAIpQLScC_z9tBDWzxoS3D0EfNT0Kcjs_T4rNN3FyVwElSdyanel0rA/viewform?usp=pp_url&entry.584848066=https://forum.effectivealtruism.org/posts/jpsugrAbjsgfm9gZM/eag-talks-are-underrated-imo'>Share feedback on this narration</a>.



Interpretability in the wild and other papers