Why Web Agents Are Slow: A Compiler-Style Fix for Computer-Use Latency
Source: Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling
Paper was published on May 20, 2026
This episode was AI-generated on May 21, 2026. The script was written by an AI language model and the host voices were synthesized by Eleven Labs. The producer is not affiliated with Anthropic or Eleven Labs.
Most of the time a web agent spends on your task is just the language model talking to itself, one screenshot at a time. A new Stanford paper argues that the slowness and brittleness of computer-use agents are not a model problem but an architecture problem, and that decades-old compiler techniques can cut latency tenfold while improving accuracy.
Key Takeaways
Why current web agents act like interpreters running one LLM call per step, and what a 'compiler' for agents would do differently
How state-based preconditions and postconditions on cached tools eliminate roughly half of all agent failures before runtime
The worked Taco Bell example where one candidate plan uses an LLM to compare two numbers and another uses Python's min, a 50x cost difference for the same task
When 'hedge with four parallel browser sessions' actually beats serial execution, and why heavy-tailed click latencies make the math work
Why this approach only pays off for repeated workloads on the same apps, and where cache staleness could quietly erode the savings
The honest limitations: a 25–90 minute offline setup per app, an LLM-based prediction inside the scheduler, and a 37-task benchmark partly curated to exercise scheduling
00:00 — Agents as interpreters, and the compiler reframe
Reframing the user's request as source code that a compiler can plan, verify, and optimize before any browser action runs.
03:16 — State contracts and static verification
How preconditions and postconditions on cached tools let the planner reject invalid programs before runtime, cutting failures from 80% to 43%.
06:33 — The Taco Bell example and cost-based planning
Walking through three candidate plans where the cost model picks ordinary Python over an unnecessary LLM call, saving roughly 50x on the comparison step.
09:50 — Heavy tails and the case for hedging
Why running four parallel browser sessions and taking the first to finish can beat serial execution when individual UI clicks have high variance.
13:07 — Offline setup vs. online speedup
Separating the 25–90 minute per-app calibration phase from the user-facing task latency that produces the headline 10x number.
16:23 — Headline results and a fair baseline comparison
JIT-Planner's 10x speedup over Browser-Use, the 28-point accuracy gain, and the controlled comparison against a frontier model given the same cached tools.
19:40 — Where the approach is weakest
Honest objections around cache staleness, an LLM-predicted input to the scheduler, calibration of the cost model, and the small benchmark.
22:57 — The durable principle underneath
Why treating non-determinism as a deliberate choice rather than a default opens decades of compiler and systems research to agent design.
Recommended Reading
Model Context Protocol — The current standard for typing tool inputs that the episode's state-contract proposal extends from input types to full browser-state preconditions and postconditions.
ReAct: Synergizing Reasoning and Acting in Language Models — The canonical interpreter-style agent loop the episode critiques as 'screenshot, ask the LLM, click, repeat'; useful background for understanding what JIT compilation is replacing.
Voyager: An Open-Ended Embodied Agent with Large Language Models — An earlier example of an agent that synthesizes and caches reusable skills from experience, which parallels the offline tool-synthesis phase the episode flags as the load-bearing amortization assumption.
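Illustrative sketch: state contracts and cost-based planning
For listeners who want something concrete, the ideas from the 03:16 and 06:33 chapters can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the paper's implementation: the Tool class, the tool names, the state facts such as "menu_open", and the cost numbers are all hypothetical. The only figure borrowed from the episode is the roughly 50x ratio between an LLM comparison and Python's min.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """A cached tool with a state contract (hypothetical structure)."""
    name: str
    pre: frozenset    # facts that must hold before the tool runs
    post: frozenset   # facts the tool establishes when it finishes
    cost: float       # estimated cost in arbitrary units (made-up numbers)


def verify(plan, initial_state):
    """Walk the plan symbolically and check every precondition.

    No browser is touched: this is the 'static' part of the check.
    """
    state = set(initial_state)
    for tool in plan:
        missing = tool.pre - state
        if missing:
            return False, f"{tool.name}: missing {sorted(missing)}"
        state |= tool.post
    return True, "ok"


def choose_plan(candidates, initial_state):
    """Drop plans that fail verification, then pick the cheapest survivor."""
    valid = [p for p in candidates if verify(p, initial_state)[0]]
    if not valid:
        raise ValueError("no candidate plan passes verification")
    return min(valid, key=lambda p: sum(t.cost for t in p))


# Hypothetical tools for a "find the cheaper item" task.
open_menu = Tool("open_menu", frozenset(), frozenset({"menu_open"}), 5.0)
read_prices = Tool("read_prices", frozenset({"menu_open"}),
                   frozenset({"prices_known"}), 5.0)
compare_llm = Tool("compare_with_llm", frozenset({"prices_known"}),
                   frozenset({"cheapest_known"}), 50.0)  # assumed ~50x min()
compare_min = Tool("compare_with_min", frozenset({"prices_known"}),
                   frozenset({"cheapest_known"}), 1.0)

candidates = [
    [read_prices, compare_min],             # invalid: menu is never opened
    [open_menu, read_prices, compare_llm],  # valid but expensive
    [open_menu, read_prices, compare_min],  # valid and cheap
]

best = choose_plan(candidates, initial_state=set())
print([t.name for t in best])
# prints ['open_menu', 'read_prices', 'compare_with_min']
```

The first candidate is rejected before anything runs because read_prices requires a fact no earlier step established. The remaining two differ only in the comparison step, so the cost model settles on the plain Python version.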
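Illustrative sketch: why hedging helps under heavy tails
The hedging argument from the 09:50 chapter can be explored with a small simulation. The sketch below assumes click latencies follow a Pareto distribution and that the four sessions are independent; both are modelling assumptions made here for illustration, and the function names and parameters are hypothetical rather than taken from the paper.

```python
import random
import statistics


def session_latency(rng, clicks=10, alpha=1.5):
    """Total latency of one browser session: a sum of heavy-tailed click times.

    Pareto with alpha=1.5 is an assumed stand-in for 'high variance' clicks.
    """
    return sum(rng.paretovariate(alpha) for _ in range(clicks))


def simulate(trials=5000, sessions=4, seed=0):
    """Compare one session per task against the first of several to finish."""
    rng = random.Random(seed)
    single = [session_latency(rng) for _ in range(trials)]
    hedged = [
        min(session_latency(rng) for _ in range(sessions))
        for _ in range(trials)
    ]
    return single, hedged


def p95(values):
    """Empirical 95th percentile."""
    return sorted(values)[int(0.95 * len(values))]


single, hedged = simulate()
print(f"single session: mean={statistics.mean(single):.1f}  p95={p95(single):.1f}")
print(f"first of four:  mean={statistics.mean(hedged):.1f}  p95={p95(hedged):.1f}")
```

The intuition is that for independent sessions, the chance that all four are still running at time t is the single-session chance raised to the fourth power, so slow outliers get much rarer. The trade-off is that the hedged run consumes up to four times the browser resources, which is why the question of when hedging actually wins depends on how heavy the tail is.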