AI Papers: A Deep Dive

One Loop to Optimize Them All: A Universal API for LLM-Driven Discovery


Source: optimize_anything: A Universal API for Optimizing any Text Parameter

Paper was published on May 19, 2026

This episode was AI-generated on May 22, 2026. The script was written by an AI language model and the host voices were synthesized by Eleven Labs. The producer is not affiliated with Anthropic or Eleven Labs.

Five separate LLM optimization frameworks have been racing to evolve code, prompts, and agents, and a new Berkeley paper argues they are all running the same algorithm underneath. The unification claim comes with receipts: state-of-the-art circle packing for about three dollars ($3.18), ARC-AGI scores rising from 32% to nearly 90%, and a clear account of why richer feedback beats cleverer search.

Key Takeaways
  • Why the authors argue that side information (error traces, profiler dumps, failed-test diagnostics) is the LLM-era analog of a gradient, and the ablation showing 4-6x faster convergence when it is used
  • How a 10-line seed agent evolved into a 300-line ARC-AGI pipeline that discovered rule induction, code verification, and fallback strategies on its own
  • The 'refiner leapfrog' mechanism that let optimize_anything beat AlphaEvolve on circle packing at a third of the budget
  • When multi-task optimization helps (CUDA kernels share structure) and when it actively hurts (different circle-packing sizes don't transfer)
  • Why a meaningful share of the headline numbers comes from the frontier proposer model, and where the architectural contributions still do real work
  • The shift the paper implies: optimization expertise gets traded for evaluator-design expertise, and that's now the craft worth investing in

  • 00:00 - Five frameworks, one underlying algorithm
    Why AlphaEvolve, FunSearch, GEPA, ADAS, and OpenEvolve are arguably running the same loop in different costumes.
  • 02:56 - The three-line engine
    Walking through the declarative loop (artifact, evaluator, proposer) that the paper claims is sufficient across all six domains.
  • 05:53 - Side information as a gradient analog
    The cooking-feedback analogy, the stack-trace line, and the ablation showing rich diagnostics dramatically outperform scalar scores.
  • 08:50 - The Pareto frontier and why specialists survive
    The Olympic-team intuition for why ranking by average kills diversity, and how per-dimension champions get preserved instead.
  • 11:47 - ARC-AGI: an agent that designs its own architecture
    How a ten-line seed agent evolved into a four-stage pipeline with verification and fallback, lifting Gemini Flash from 32% to nearly 90%.
  • 14:44 - Circle packing and the refiner leapfrog
    Beating AlphaEvolve's published record for $3.18, and the two-artifact mechanism that explains why this kind of compounding is possible.
  • 17:41 - When multi-task helps and when it hurts
    Shared Pareto frontiers across related problems win on CUDA kernels but degrade performance on circle packing across different n.
  • 20:38 - Three honest caveats
    The frontier-proposer dependency, the AIME result that ties rather than beats GEPA, and why side-information design is itself a craft.
  • 23:35 - What changes if the unification holds
    Why the next research frontier may be evaluator design rather than yet another specialized optimization framework.

Recommended Reading
  • GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning - The authors' prior prompt-optimization system that optimize_anything generalizes, and the one it ties rather than beats on AIME, making it essential context for the unification thesis.
  • FunSearch: Mathematical discoveries from program search with large language models - One of the five systems the episode lines up as having its own bespoke framework, and an early demonstration of LLM-in-the-loop discovery on math problems such as the cap set problem and bin packing.
  • On the Measure of Intelligence - Chollet's original framing of ARC-AGI as a reasoning benchmark, useful background for why the 32%-to-90% jump from an evolved 300-line agent is a meaningful result.
  • Illuminating search spaces by mapping elites (MAP-Elites) - The quality-diversity algorithm behind the Pareto-frontier-of-champions intuition the episode unpacks with the Olympics analogy.
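
For listeners who want to see the shape of the loop discussed in the episode, here is a minimal sketch of an artifact/evaluator/proposer cycle that passes textual side information to the proposer and keeps per-task champions rather than a single best-on-average candidate. Everything in it is an assumption made for illustration: the names Candidate, champions, optimize, toy_evaluate, and toy_propose are hypothetical, the proposer is a deterministic stand-in for an LLM call, and none of it is the paper's actual optimize_anything API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]


@dataclass
class Candidate:
    text: str        # the artifact being optimized
    scores: Scores   # one score per task or dimension
    feedback: str    # side information returned by the evaluator


def champions(pool: List[Candidate]) -> List[Candidate]:
    """Keep each per-task best instead of ranking by average score."""
    keep: List[Candidate] = []
    for task in pool[0].scores:
        # Ties on a task go to the higher total (an assumption of this sketch).
        best = max(pool, key=lambda c: (c.scores[task], sum(c.scores.values())))
        if best not in keep:
            keep.append(best)
    return keep


def optimize(
    seed: str,
    evaluate: Callable[[str], Tuple[Scores, str]],
    propose: Callable[[str, str], str],
    steps: int,
) -> List[Candidate]:
    pool = [Candidate(seed, *evaluate(seed))]
    for step in range(steps):
        parent = pool[step % len(pool)]  # rotate through the current champions
        child_text = propose(parent.text, parent.feedback)
        pool.append(Candidate(child_text, *evaluate(child_text)))
        pool = champions(pool)
    return pool


# Toy problem (hypothetical): each task checks that one word is present.
REQUIRED = {"greets": "hello", "signs_off": "goodbye"}


def toy_evaluate(text: str) -> Tuple[Scores, str]:
    scores = {task: float(word in text) for task, word in REQUIRED.items()}
    missing = [word for word in REQUIRED.values() if word not in text]
    feedback = "missing: " + ", ".join(missing) if missing else "ok"
    return scores, feedback


def toy_propose(parent: str, feedback: str) -> str:
    # Stand-in for an LLM: read the diagnostic and repair the first problem.
    if feedback.startswith("missing: "):
        first = feedback[len("missing: "):].split(", ")[0]
        return (parent + " " + first).strip()
    return parent
```

With these toy pieces, optimize("", toy_evaluate, toy_propose, steps=3) ends with a single surviving candidate whose text is "hello goodbye". The point of the sketch is only that the proposer acts on the evaluator's diagnostic text, not on the numeric score alone; a real system would replace toy_propose with a model call and toy_evaluate with a domain-specific evaluator.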

AI Papers: A Deep Dive, by paperdive.ai