AI Pod by Wes Roth and Dylan Curious | Artificial Intelligence News and Interviews With Experts

Can Grok and Claude run a business? We just did it


Andon Labs tests AI autonomy by letting agents run businesses in messy reality, with real customers and real consequences. In VendingBench, an agent starts with $500 and an empty vending machine, researches trends and suppliers, emails wholesalers, restocks, tracks sales, and iterates toward profit. When the agent was deployed at Anthropic, humans red-teamed it with sob stories, discount demands, and bizarre requests like tungsten cubes, triggering “bank runs” of freebie seekers. Long histories caused drift and hallucinations, including dramatic escalations and invented security reports. Multi-agent supervisors often amplified each other into hype or doom. Better tools and memory compression help, but long-horizon planning stays fragile.
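
For readers who want a concrete picture of the restock, sell, and track loop described above, here is a minimal Python sketch. It is not Andon Labs' code or the real VendingBench harness. The only figure taken from the episode is the $500 starting balance; the class, the function, the "soda" item, the prices, and the restock-when-empty rule are all hypothetical, chosen only to show how cash and stock move through the loop.

```python
from dataclasses import dataclass, field


@dataclass
class VendingMachine:
    """Toy ledger for the loop described above; every name here is hypothetical."""
    cash: float = 500.0  # starting balance mentioned in the episode description
    stock: dict[str, int] = field(default_factory=dict)  # the machine starts empty

    def restock(self, item: str, units: int, unit_cost: float) -> bool:
        """Buy units from a wholesaler if the cash on hand covers the order."""
        total = units * unit_cost
        if units <= 0 or total > self.cash:
            return False
        self.cash -= total
        self.stock[item] = self.stock.get(item, 0) + units
        return True

    def sell(self, item: str, demand: int, price: float) -> int:
        """Sell up to `demand` units, limited by stock; return the units sold."""
        sold = min(max(demand, 0), self.stock.get(item, 0))
        if sold:
            self.stock[item] -= sold
            self.cash += sold * price
        return sold


def run_episode(daily_demand: list[int], unit_cost: float,
                price: float, order_size: int) -> float:
    """Restock whenever the item is sold out, serve each day's demand,
    and return profit relative to the $500 starting balance."""
    machine = VendingMachine()
    for demand in daily_demand:
        if machine.stock.get("soda", 0) == 0:  # assumed policy: reorder when empty
            machine.restock("soda", order_size, unit_cost)
        machine.sell("soda", demand, price)
    return machine.cash - 500.0
```

Calling `run_episode([10, 10, 10], unit_cost=1.0, price=2.0, order_size=20)` walks through three days of made-up demand. The hard parts discussed in the episode (supplier emails, adversarial customers, long-context drift) are deliberately left out of this sketch.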



By Wes Roth and Dylan Curious