December 08, 2025

80: Orchestration Outsmarts Scale

16 minutes

The pace of AI advancement just flipped the playbook: clever orchestration is now competing with raw scale. Six months after top models struggled on the ARC-AGI-2 reasoning benchmark, a six-person startup called Poetiq hit 54% (beating Google’s Deep Think at 45%) by wrapping Gemini 3 Pro in a strategic, self‑auditing meta layer, and did it for $30 per task versus Deep Think’s $77. That is a 9-point higher score at roughly 60% lower cost per task, which puts state‑of‑the‑art reasoning within reach of much smaller teams and shifts value from who owns the biggest GPU cluster to who can design the smartest orchestration.

But the moment comes with new vulnerabilities. Prompts written as simple poetry produced a 62% average jailbreak rate across 25 frontier models (Gemini 2.5 Pro failed every test; GPT‑5 Nano resisted them), showing that creative language can still slip past even advanced guardrails. And as AI moves into real work, the operational challenge becomes observability. That work now runs through specialized agents from platforms like Lindy, ChatGPT→Canva workflows for quick LinkedIn carousels, and everyday tools used for negotiation, documentation, and scaled image generation. You must audit dozens of agents, trace their reasoning chains, and validate behavior before they touch revenue or reputation.

On the research horizon, Google’s Titans + MIRAS work aims to crack long‑term, test‑time memorization, while Meta’s acquisition of Limitless signals that AI wearables and persistent external memory are moving off the screen. Even reinforcement learning is being rethought so that rewards may live inside agents themselves, opening the door to richer autonomous behavior. For marketers and AI practitioners, the takeaways are clear: treat orchestration as a first‑class strategy, budget for continuous observability and governance, exploit cheaper reasoning to experiment faster, and harden prompts and pipelines against linguistic jailbreaks. And here’s the provocative question to leave you with: if a poem can bypass safety, how long before a simple linguistic trick undermines the very orchestration systems we rely on to make AI reliable?


By Pete Larkin
