Growing Code and Proof Together: Verified Systems in Ten Hours Instead of a Year
Source: Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems
Paper was published on May 22, 2026
This episode was AI-generated on May 25, 2026. The script was written by an AI language model and the host voices were synthesized by ElevenLabs. The producer is not affiliated with Anthropic or ElevenLabs.
A new paper claims to compress nine to twelve months of expert verification work into about ten hours of compute — and the verified implementations it produces sometimes run three times faster than the hand-written references. The surprising reason: when an agent has to prove its code correct at every step, it gets pushed toward data representations that are both easier to verify and faster to run.
Key Takeaways
Why current coding agents like Codex and Claude Code solve only two of seven distributed key-value specs, and why pure sampling fails even with the formal spec in hand
How the 'Admitted' keyword in Rocq enables incremental joint synthesis: every partial state of code and proof gets graded by the verifier
The Chapar case study where a forced pivot from a monolithic blob to per-key records simultaneously closes the proof and produces a 3x throughput win over the published reference
The ablation that may be the paper's most informative result: replacing rich proof-state feedback with binary accept-reject collapses success rates from 93% to 58%
Three honest limitations: the human still writes the spec, the evaluation covers only the consistency-correctness core of one domain, and generalization beyond distributed key-value stores is conjectured but not demonstrated
Why the deeper methodological shift, treating verification as a compute-driven search problem driven by an exact oracle, may matter more than any single performance number

00:00 - Why a year is the baseline
What formal verification actually buys you, why it costs so much, and why testing can't substitute for it in distributed systems.
03:33 - Why off-the-shelf agents fail
The empirical case that current coding agents can't produce verified distributed code, even with formal specs and a hundred sampling attempts.
07:06 - The Admitted trick and incremental joint synthesis
How marking unfinished lemmas as IOUs turns verification into a step-by-step search the proof checker can grade at every move.
10:40 - Three nested loops: tactical, strategic, and performance-driven
The architecture of inner deductive moves, outer strategic pivots, and an outermost loop that uses runtime benchmarks to steer the search.
14:13 - The Chapar pivot
Watching the system abandon a monolithic state representation for per-key records and discover that the same choice makes the proof close and the code run 3x faster.
27:39 - The feedback-texture ablation
Why rich proof-state feedback, not just having a verifier in the loop, appears to be doing most of the work.
21:20 - Steelman and limitations
The spec is still human-written, the domain is one slice of distributed systems, and the performance comparisons are against verification artifacts rather than production-tuned systems.
25:37 - The methodological reframe
Why turning verification from heroic expert labor into a compute-driven search could change which systems get verified at all.

Recommended Reading
Let's Verify Step by Step — OpenAI's process-supervision work that bolsters the episode's claim that dense, step-level verifier feedback outperforms sparse end-of-task signals.