


Today Lukas Petersson and Axel Backlund of Andon Labs join The Cognitive Revolution to discuss their experiments deploying autonomous AI agents to run real-world vending machines, exploring the safety challenges and unexpected behaviors that emerge when frontier models like Claude and Grok operate without human oversight.
Read the transcript of the episode here.
Check out our sponsors: Oracle Cloud Infrastructure, Shopify.
Shownotes below brought to you by Notion AI Meeting Notes - try one month for free at https://notion.com/lp/nathan
Autonomous Organization Philosophy: Andon Labs believes that AI models will improve to the point where human oversight becomes impractical for efficiency reasons, so the company pursues fully autonomous systems rather than gradual automation.
Vending Bench as a Testing Ground: They created "Vending Bench" as a benchmark for testing long-term coherence of autonomous agents, using vending machines as a practical business case for experimentation.
Domain-Specific vs General AI: There's a notable difference between optimizing AI for narrow domains (like vending machines) versus general-purpose AI, with domain-specific applications potentially being more manageable regarding reward hacking.
Frontier Model Race: Major companies like OpenAI and Google are advancing rapidly in general reasoning capabilities (e.g., gold-medal-level results at the International Mathematical Olympiad), independent of narrow application research.
Insurance and Liability: The insurance industry may play a significant role in AI adoption, with premiums potentially being much higher for general models that could be misused versus narrow-domain models with limited capabilities.
For-profit AI Safety: The case for for-profit companies in AI safety has been historically neglected but is becoming clearer, with accelerators like Seldon Labs supporting this approach.
Sponsors:
Oracle Cloud Infrastructure:
Oracle Cloud Infrastructure (OCI) is the next-generation cloud that delivers better performance, faster speeds, and significantly lower costs, including up to 50% less for compute, 70% for storage, and 80% for networking. Run any workload, from infrastructure to AI, in a high-availability environment and try OCI for free with zero commitment at https://oracle.com/cognitive
Shopify:
Shopify powers millions of businesses worldwide, handling 10% of U.S. e-commerce. With hundreds of templates, AI tools for product descriptions, and seamless marketing campaign creation, it's like having a design studio and marketing team in one. Start your $1/month trial today at https://shopify.com/cognitive
PRODUCED BY:
https://aipodcast.ing
CHAPTERS:
(00:00) About the Episode
(04:49) Company Vision Overview
(12:24) Vending Benchmark Design (Part 1)
(20:12) Sponsor: Oracle Cloud Infrastructure
(21:21) Vending Benchmark Design (Part 2)
(24:41) Model Performance Results (Part 1)
(35:03) Sponsor: Shopify
(37:00) Model Performance Results (Part 2)
(43:06) Real World Deployment
(59:41) Wild Stories Incidents
(01:19:59) Business Safety Strategy
(01:38:20) Future Directions Discussion
(01:47:09) Outro
By Erik Torenberg, Nathan Labenz
