
May 26, 2026

The harness, not the model — and the trust layer racing to catch up

24 minutes

One developer catching you up on the day in AI and the craft of building with it. Today: the wrapper around a model can move a benchmark more than the model does, a watermark goes multi-lab, and a decensoring tool with thirteen million downloads shows where that watermark leaks. Plus a sharp little essay on why coding agents make us so mad, the jobs data behind the panic, and three things you can pick up today.

The harness, not the model — a Google DeepMind Kaggle talk and an arXiv position paper argue the agent harness can swing a score ~22% while frontier models tie.
Gemini Omni — editing video by talking to it, with SynthID baked in (community reaction).
SynthID becomes a shared layer — 100 billion watermarks, Search and Chrome, and OpenAI/ElevenLabs/Kakao on board.
Heretic in the Financial Times — decensoring open weights in ten minutes, and the artifact that proves the gap.
The user is visibly frustrated — why conversational agent UX trips your social wiring.
A rage-quitting modder and the jobs data — backlash, and what the numbers actually say.
The bench — NuExtract3, EAGLE 3.1, and a rejected llama.cpp patch worth grabbing.


By Lenar Kess · Damra Vol
