June 05, 2026

Teaching a Phone Agent to Reason Silently, And Keeping It Honest

24 minutes

Source: MIRAGE: Mobile Agents with Implicit Reasoning and Generative World Models

Paper was published on June 03, 2026

This episode was AI-generated on June 4, 2026. The script was written by an AI language model and the host voices were synthesized by Eleven Labs. The producer is not affiliated with Anthropic or Eleven Labs.

Good mobile AI agents write a paragraph of reasoning before every tap, which makes them smart but painfully slow. This episode unpacks MIRAGE, which moves that reasoning into silent hidden vectors, parallelizes it with a century-old numerical trick, and keeps it sharp by forcing it to predict the next screen. The result matches the quality of written reasoning at roughly a fifth of the cost.

Key Takeaways

Why stripping reasoning out of an agent doesn't just remove a bonus but actively drops the agent below the untouched base model (42.9 to 31)

How APLR borrows Jacobi iteration to parallelize sequential latent reasoning with a provable guarantee that the first K thought-slots are exact

The trick that keeps invisible reasoning honest: a throwaway 'world model' head that forces the silent slots to predict the next screen's features during training only

How the ablation table tells the whole thesis in five numbers, with the world model recovering the chain-of-thought score (52.6) to the decimal

Where the headline 'matches chain-of-thought' claim is fragile: it rests on a tie at a single benchmark number, and the slot-specialization story is shown correlationally, not proven

Why the latent scratchpad isn't free: dropping from nine slots to three craters success from 52.6 to 32.8

00:00 — The cost of agents that narrate every tap
Why step-by-step reasoning helps mobile agents but makes each action slow and verbose, and what MIRAGE claims to fix.

03:01 — Reasoning without words
How a model can think in continuous hidden vectors instead of generating text, building on the earlier Coconut approach.

06:02 — APLR and the Jacobi iteration trick
Using the one-way dependency structure of causal attention to parallelize latent reasoning with a provable correctness guarantee.

09:03 — The world model that keeps silent reasoning honest
A lightweight head that forces the under-supervised thought-slots to predict next-screen features during training, then gets discarded at inference.

12:04 — Two-stage training and why ordering matters
First teaching the shape of good reasoning out loud, then migrating it into silent latent slots.

15:05 — The ablation table, five numbers that carry the argument
Walking through the AndroidWorld results, from removing reasoning entirely up to full MIRAGE recovering the chain-of-thought score.

18:06 — Where the claims are fragile
Steelman critiques on the single-number tie, the correlational slot-specialization story, and what 'world model' really means here.

21:07 — What travels beyond phones
The reframe of where reasoning should live and why the parallelization trick should generalize to other causal computations.
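
For listeners who want to see the Jacobi idea from the APLR chapter in miniature, here is a toy sketch. It is not the paper's implementation: the `step` function is a made-up stand-in for a model's forward pass, and the names `sequential`, `jacobi_sweep`, and `jacobi` are hypothetical. The only property it borrows from the episode is the one-way dependency, where each thought-slot depends only on the slots before it. Under that assumption, refreshing every slot at once from the previous guess makes the first k slots exact after k sweeps.

```python
import math


def step(prefix, x):
    """Hypothetical causal update: a slot depends only on earlier slots and the input x."""
    return math.tanh(0.5 * sum(prefix) + x)


def sequential(x, n):
    """Reference: fill the n slots one at a time, each waiting on the previous ones."""
    slots = []
    for _ in range(n):
        slots.append(step(slots, x))
    return slots


def jacobi_sweep(slots, x):
    """Recompute every slot from the previous iterate.

    Each slot reads only the old values, so the updates are independent of
    one another and could run in parallel (here it is a plain list comprehension).
    """
    return [step(slots[:i], x) for i in range(len(slots))]


def jacobi(x, n, sweeps):
    """Start from an all-zero guess and apply a fixed number of sweeps."""
    slots = [0.0] * n
    for _ in range(sweeps):
        slots = jacobi_sweep(slots, x)
    return slots


if __name__ == "__main__":
    n = 9  # nine slots, the count mentioned in the takeaways
    reference = sequential(0.3, n)
    for k in range(1, n + 1):
        approx = jacobi(0.3, n, k)
        exact_prefix = all(
            math.isclose(a, b, abs_tol=1e-12)
            for a, b in zip(approx[:k], reference[:k])
        )
        print(f"sweeps={k}: first {k} slots exact? {exact_prefix}")
```

In this toy, slot 0 has no dependencies, so it is right after one sweep; slot 1 then sees a correct slot 0 on the next sweep, and so on. Any speedup in practice would come from stopping before n sweeps once the later slots have stopped changing, which this sketch does not try to measure.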