
In this episode of No-BS AI Briefing, host Vikash unpacks critical AI developments for founders, product managers, and engineering leaders. We dive into Microsoft Research's DELEGATE-52 study, which reveals that large language models degrade to roughly 50% accuracy on complex, multi-step tasks – a crucial finding for anyone building with AI agents. We also cover the new US federal AI litigation task force challenging state regulations, Malta's groundbreaking nationwide free ChatGPT Plus program in partnership with OpenAI, and the initiation of formal AI security dialogues between the US and China, with Anthropic's "Mythos" cited. The Vatican's new AI ethics study group and upcoming encyclical highlight the growing influence of moral institutions on AI development. Vikash offers a deep dive into the implications of the agent accuracy cliff and provides a practical takeaway: audit your agent workflows for failure points and integrate human-in-the-loop checks. Don't miss out on concise, actionable insights for staying ahead in AI – hit follow on the show!
By Vikash