LessWrong (30+ Karma)

“Recent Redwood Research project proposals” by ryan_greenblatt, Buck, Julian Stastny, joshc, Alex Mallen, Adam Kaufman, Tyler Tracy, Aryan Bhatt, Joey Yudelson



Previously, we've shared a few higher-effort project proposals relating to AI control in particular. In this post, we'll share a whole host of less polished project proposals. All of these projects excite at least one Redwood researcher, and high-quality research on any of these problems seems pretty valuable. They differ widely in scope, area, and difficulty.

Control

These projects are all related to the field of AI Control. Many of them are extensions of Redwood's previous work in this area.

Basic open questions in control

  • Control Protocol Transfer Across Setting [PUBLIC]
    • So far, all work comparing control protocols has measured effectiveness in only one setting. Do the results transfer between settings?
  • Backdoor Auditing with Humans [PUBLIC]
    • Most control schemes bottom out at some point in auditing: having actual humans review suspicious actions. However [...]

---

Outline:

(00:36) Control

(00:47) Basic open questions in control

(01:46) Monitoring protocols

(02:49) Untrusted monitoring and collusion

(03:31) Elicitation, sandbagging, and diffuse threats (e.g. research sabotage)

(04:05) Synthetic information and inputs

(04:38) Training-time alignment methods

(04:42) Science of (mis-)alignment

(05:23) Alignment / training schemes

(05:53) RL and Reward Hacking

(06:20) Better understanding and interpretability

(07:43) Other

---

First published:

July 14th, 2025

Source:

https://www.lesswrong.com/posts/RRxhzshdpneyTzKfq/recent-redwood-research-project-proposals

---

Narrated by TYPE III AUDIO.
