May 22, 2026

Why Web Agents Are Slow: A Compiler-Style Fix for Computer-Use Latency

26 minutes

Source: Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling

Paper was published on May 20, 2026

This episode was AI-generated on May 21, 2026. The script was written by an AI language model and the host voices were synthesized by ElevenLabs. The producer is not affiliated with Anthropic or ElevenLabs.

Most of the time a web agent spends on your task goes to the language model talking to itself, one screenshot at a time. A new Stanford paper argues that the slowness and brittleness of computer-use agents are an architecture problem rather than a model problem, and that decades-old compiler techniques can cut latency by about 10x while improving accuracy.

Key Takeaways

Why current web agents act like interpreters running one LLM call per step, and what a 'compiler' for agents would do differently

How state-based preconditions and postconditions on cached tools eliminate roughly half of all agent failures before runtime

The worked Taco Bell example where one candidate plan uses an LLM to compare two numbers and another uses Python's min, a roughly 50x cost difference on that comparison step

When 'hedge with four parallel browser sessions' actually beats serial execution, and why heavy-tailed click latencies make the math work

Why this approach only pays off for repeated workloads on the same apps, and where cache staleness could quietly erode the savings

The honest limitations: a 25–90 minute offline setup per app, an LLM-based prediction inside the scheduler, and a 37-task benchmark partly curated to exercise scheduling

00:00 — Agents as interpreters, and the compiler reframe
Reframing the user's request as source code that a compiler can plan, verify, and optimize before any browser action runs.

03:16 — State contracts and static verification
How preconditions and postconditions on cached tools let the planner reject invalid programs before runtime, cutting failures from 80% to 43%.
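
For readers who want to see the idea in code, here is a minimal sketch of static verification over state contracts. It is not the paper's implementation: the Tool class, the verify function, and the state labels such as "on_menu_page" are hypothetical names chosen for illustration, and postconditions here only add facts to the state, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """A cached tool with a state contract (hypothetical structure)."""
    name: str
    pre: frozenset   # state facts that must hold before the tool runs
    post: frozenset  # state facts the tool establishes when it succeeds


def verify(plan, initial_state):
    """Check a plan without touching a browser.

    Returns None if every precondition is satisfied in order, otherwise a
    message naming the first step whose precondition is missing.
    """
    state = set(initial_state)
    for step, tool in enumerate(plan):
        missing = tool.pre - state
        if missing:
            return f"step {step} ({tool.name}): missing {sorted(missing)}"
        state |= tool.post  # simplification: facts are only ever added
    return None


# Hypothetical tools for a menu-browsing task.
open_menu = Tool("open_menu", frozenset({"on_home_page"}), frozenset({"on_menu_page"}))
read_price = Tool("read_price", frozenset({"on_menu_page"}), frozenset({"price_known"}))

good = verify([open_menu, read_price], {"on_home_page"})
bad = verify([read_price, open_menu], {"on_home_page"})
print(good)
print(bad)
```

The second plan is rejected before any browser action runs, because read_price is scheduled before the step that establishes its precondition.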

06:33 — The Taco Bell example and cost-based planning
Walking through three candidate plans where the cost model picks ordinary Python over an unnecessary LLM call, saving roughly 50x on the comparison step.
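
A rough sketch of cost-based plan selection, under loudly stated assumptions: the three plans, the step names, and the cost units below are invented for illustration and are not taken from the paper. The 50-to-1 ratio between an LLM call and a Python operation is chosen only to mirror the roughly 50x gap discussed in the episode, and the prices are made-up placeholders.

```python
# Illustrative per-step costs in arbitrary units (assumed, not measured).
STEP_COST = {"cached_tool": 5.0, "llm_call": 50.0, "python_op": 1.0}

# Three hypothetical candidate plans for "find the cheaper of two items".
CANDIDATE_PLANS = {
    "llm_compare": ["cached_tool", "cached_tool", "llm_call"],
    "python_min": ["cached_tool", "cached_tool", "python_op"],
    "llm_everything": ["llm_call", "llm_call", "llm_call"],
}


def plan_cost(steps):
    """Sum the estimated cost of every step in a plan."""
    return sum(STEP_COST[step] for step in steps)


def cheapest_plan(plans):
    """Return the name of the plan with the lowest estimated cost."""
    return min(plans, key=lambda name: plan_cost(plans[name]))


best = cheapest_plan(CANDIDATE_PLANS)

# The comparison step itself, done with ordinary Python instead of an LLM.
prices = {"item_a": 2.49, "item_b": 1.99}  # placeholder values
cheaper_item = min(prices, key=prices.get)

print(best, cheaper_item)
```

Under these assumed costs the planner picks the plan whose final step is a plain min call, since the two fetch steps are identical across the first two plans and only the comparison step differs.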

09:50 — Heavy tails and the case for hedging
Why running four parallel browser sessions and taking the first to finish can beat serial execution when individual UI clicks have high variance.
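
To see why heavy tails favor hedging, here is a small simulation sketch. It assumes click latencies follow a Pareto distribution with shape 1.5, which is an illustrative choice and not a figure from the paper. The function name simulate and the trial count are likewise hypothetical.

```python
import random


def simulate(trials=10_000, sessions=4, alpha=1.5, seed=0):
    """Compare one session against the fastest of several parallel sessions.

    Each latency is drawn from a Pareto distribution (heavy-tailed). The
    single-session latency is the first draw and the hedged latency is the
    minimum across all draws in the same trial.
    """
    rng = random.Random(seed)
    single_total = 0.0
    hedged_total = 0.0
    for _ in range(trials):
        draws = [rng.paretovariate(alpha) for _ in range(sessions)]
        single_total += draws[0]    # what a lone session would have taken
        hedged_total += min(draws)  # first of the parallel sessions to finish
    return single_total / trials, hedged_total / trials


single_mean, hedged_mean = simulate()
print(f"single: {single_mean:.2f}  hedged: {hedged_mean:.2f}")
```

For this particular distribution the theoretical means are 3.0 for one session and 1.2 for the minimum of four, because the minimum of four independent Pareto draws with shape 1.5 is itself Pareto with shape 6. The sketch ignores the extra compute that four sessions consume, which is the other side of the tradeoff.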

13:07 — Offline setup vs. online speedup
Separating the 25–90 minute per-app calibration phase from the user-facing task latency that produces the headline 10x number.

16:23 — Headline results and a fair baseline comparison
JIT-Planner's 10x speedup over Browser-Use, the 28-point accuracy gain, and the controlled comparison against a frontier model given the same cached tools.

19:40 — Where the approach is weakest
Honest objections around cache staleness, an LLM-predicted input to the scheduler, calibration of the cost model, and the small benchmark.

22:57 — The durable principle underneath
Why treating non-determinism as a deliberate choice rather than a default opens decades of compiler and systems research to agent design.