"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

Autonomous Organizations: Vending Bench & Beyond, w/ Lukas Petersson & Axel Backlund of Andon Labs



Today, Lukas Petersson and Axel Backlund of Andon Labs join The Cognitive Revolution to discuss their experiments deploying autonomous AI agents to run real-world vending machines, exploring the safety challenges and unexpected behaviors that emerge when frontier models like Claude and Grok operate without human oversight.


Check out our sponsors: Oracle Cloud Infrastructure, Shopify.

Shownotes below brought to you by Notion AI Meeting Notes - try one month for free at https://notion.com/lp/nathan

  • Autonomous Organization Philosophy: Andon Labs believes that AI models will improve to the point where human oversight becomes impractical due to efficiency constraints, leading them to pursue fully autonomous systems rather than gradual automation.

  • Vending Bench as a Testing Ground: They created "Vending Bench" as a benchmark for testing long-term coherence of autonomous agents, using vending machines as a practical business case for experimentation.

  • Domain-Specific vs General AI: There's a notable difference between optimizing AI for narrow domains (like vending machines) versus general-purpose AI, with domain-specific applications potentially being more manageable regarding reward hacking.

  • Frontier Model Race: Major companies like OpenAI and Google are advancing rapidly in general reasoning capabilities (e.g., IMO Gold achievements) independent of narrow application research.

  • Insurance and Liability: The insurance industry may play a significant role in AI adoption, with premiums potentially being much higher for general models that could be misused versus narrow-domain models with limited capabilities.

  • For-profit AI Safety: The case for for-profit companies in AI safety has been historically neglected but is becoming clearer, with accelerators like Seldon Labs supporting this approach.

    Sponsors:

    Oracle Cloud Infrastructure:

    Oracle Cloud Infrastructure (OCI) is the next-generation cloud that delivers better performance, faster speeds, and significantly lower costs, including up to 50% less for compute, 70% for storage, and 80% for networking. Run any workload, from infrastructure to AI, in a high-availability environment and try OCI for free with zero commitment at https://oracle.com/cognitive

    Shopify:

    Shopify powers millions of businesses worldwide, handling 10% of U.S. e-commerce. With hundreds of templates, AI tools for product descriptions, and seamless marketing campaign creation, it's like having a design studio and marketing team in one. Start your $1/month trial today at https://shopify.com/cognitive

    PRODUCED BY:

    https://aipodcast.ing

    CHAPTERS:

    (00:00) About the Episode

    (04:49) Company Vision Overview

    (12:24) Vending Benchmark Design (Part 1)

    (20:12) Sponsor: Oracle Cloud Infrastructure

    (21:21) Vending Benchmark Design (Part 2)

    (24:41) Model Performance Results (Part 1)

    (35:03) Sponsor: Shopify

    (37:00) Model Performance Results (Part 2)

    (43:06) Real World Deployment

    (59:41) Wild Stories Incidents

    (01:19:59) Business Safety Strategy

    (01:38:20) Future Directions Discussion

    (01:47:09) Outro


    "The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis, by Erik Torenberg and Nathan Labenz


    4.5

    86 ratings

