Models & Agents
Real-time multimodal agents now run full-duplex perception and generation without an external voice activity detector (VAD) or frozen states.
What You Need to Know: Thinking Machines Lab released TML-Interaction-Small, a 276B-parameter mixture-of-experts (MoE) model with 12B active parameters that processes audio, video, and text as parallel streams in 200ms chunks. ReVision cuts visual token usage by ~46% for computer-use agents while raising success rates by 3 points on OSWorld and WebTailBench. ...
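For a rough sense of scale, the figures above can be turned into simple arithmetic. The sketch below is illustrative only: the 16 kHz audio sample rate and all function names are assumptions made for the example, not details from the release, and the chunking helper shows generic fixed-size windowing rather than the model's actual input pipeline.

```python
CHUNK_MS = 200  # chunk length quoted in the summary above


def chunks_per_second(chunk_ms: int = CHUNK_MS) -> float:
    """Number of chunks the model would see per second of input."""
    return 1000 / chunk_ms


def samples_per_chunk(sample_rate_hz: int, chunk_ms: int = CHUNK_MS) -> int:
    """Audio samples in one chunk at a given sample rate (rate is an assumption)."""
    return sample_rate_hz * chunk_ms // 1000


def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of MoE parameters active per token (both values in billions)."""
    return active_params_b / total_params_b


def split_into_chunks(samples, sample_rate_hz: int, chunk_ms: int = CHUNK_MS):
    """Generic fixed-size windowing; the last chunk may be shorter."""
    size = samples_per_chunk(sample_rate_hz, chunk_ms)
    return [samples[i:i + size] for i in range(0, len(samples), size)]


if __name__ == "__main__":
    rate = 16_000  # hypothetical audio sample rate, not from the source
    one_second = [0] * rate
    print(chunks_per_second())                       # chunks per second
    print(samples_per_chunk(rate))                   # samples per 200ms chunk
    print(f"{active_fraction(12, 276):.1%}")         # active share of parameters
    print(len(split_into_chunks(one_second, rate)))  # chunks in one second
```

At 200ms per chunk the model sees five chunks per second, and 12B of 276B parameters means roughly 4% of the weights are active for any given token.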
AI Disclosure: This podcast is curated by Patrick but uses AI-generated voice synthesis for audio production.