AI Papers: A Deep Dive

Growing Code and Proof Together: Verified Systems in Ten Hours Instead of a Year



Source: Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems

Paper was published on May 22, 2026

This episode was AI-generated on May 25, 2026. The script was written by an AI language model and the host voices were synthesized by ElevenLabs. The producer is not affiliated with Anthropic or ElevenLabs.

A new paper claims to compress nine to twelve months of expert verification work into about ten hours of compute — and the verified implementations it produces sometimes run three times faster than the hand-written references. The surprising reason: when an agent has to prove its code correct at every step, it gets pushed toward data representations that are both easier to verify and faster to run.

Key Takeaways
  • Why current coding agents like Codex and Claude Code solve only two of seven distributed key-value specs — and why pure sampling fails even with the formal spec in hand
  • How the 'Admitted' keyword in Rocq enables incremental joint synthesis: every partial state of code and proof gets graded by the verifier
  • The Chapar case study where a forced pivot from a monolithic blob to per-key records simultaneously closes the proof and produces a 3x throughput win over the published reference
  • The ablation that may be the paper's most informative result: replacing rich proof-state feedback with binary accept-reject collapses success rates from 93% to 58%
  • Three honest limitations: the human still writes the spec, the evaluation covers only the consistency-correctness core of one domain, and generalization beyond distributed key-value stores is conjectured but not demonstrated
  • Why the deeper methodological shift, treating verification as a compute-driven search problem guided by an exact oracle, may matter more than any single performance number
    • 00:00 — Why a year is the baseline
      What formal verification actually buys you, why it costs so much, and why testing can't substitute for it in distributed systems.
    • 03:33 — Why off-the-shelf agents fail
      The empirical case that current coding agents can't produce verified distributed code, even with formal specs and a hundred sampling attempts.
    • 07:06 — The Admitted trick and incremental joint synthesis
      How marking unfinished lemmas as IOUs turns verification into a step-by-step search the proof checker can grade at every move.
    • 10:40 — Three nested loops: tactical, strategic, and performance-driven
      The architecture of inner deductive moves, outer strategic pivots, and an outermost loop that uses runtime benchmarks to steer the search.
    • 14:13 — The Chapar pivot
      Watching the system abandon a monolithic state representation for per-key records and discover that the same choice makes the proof close and the code run 3x faster.
    • 17:39 — The feedback-texture ablation
      Why rich proof-state feedback — not just having a verifier in the loop — appears to be doing most of the work.
    • 21:20 — Steelman and limitations
      The spec is still human-written, the domain is one slice of distributed systems, and the performance comparisons are against verification artifacts rather than production-tuned systems.
    • 25:37 — The methodological reframe
      Why turning verification from heroic expert labor into a compute-driven search could change which systems get verified at all.
    • Recommended Reading
      • Let's Verify Step by Step — OpenAI's process-supervision work that bolsters the episode's claim that dense, step-level verifier feedback outperforms sparse end-of-task signals.
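
The 'Admitted' mechanism mentioned in the takeaways can be illustrated with a tiny Rocq file. This is a minimal sketch, not code from the paper: the lemma and theorem names are hypothetical and the arithmetic statements are chosen only for brevity. It shows how an unfinished lemma can be left as an IOU while the checker still accepts, and so can still grade, everything built on top of it.

```coq
(* Hypothetical helper lemma, left unproven for now.
   Admitted makes Rocq accept the statement as an assumption. *)
Lemma add_zero_right : forall n : nat, n + 0 = n.
Proof.
Admitted.

(* A theorem that depends on the admitted lemma.
   The checker still validates every step of this proof. *)
Theorem sum_plus_zero : forall n : nat, n + n + 0 = n + n.
Proof.
  intro n.
  apply add_zero_right.
Qed.

(* Lists the outstanding IOUs: add_zero_right appears as an axiom. *)
Print Assumptions sum_plus_zero.
```

Discharging the IOU later means replacing the body of add_zero_right with a real proof, for example `induction n as [| n IH]; simpl; [reflexivity | rewrite IH; reflexivity].` followed by `Qed.` After that, `Print Assumptions` reports that the theorem is closed under the global context.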

        AI Papers: A Deep Dive, by paperdive.ai