May 22, 2026

One Loop to Optimize Them All: A Universal API for LLM-Driven Discovery

26 minutes

Source: optimize_anything: A Universal API for Optimizing any Text Parameter

Paper was published on May 19, 2026

This episode was AI-generated on May 22, 2026. The script was written by an AI language model, and the host voices were synthesized by ElevenLabs. The producer is not affiliated with Anthropic or ElevenLabs.

Five separate LLM optimization frameworks have been racing to evolve code, prompts, and agents — and a new Berkeley paper argues they're all secretly the same algorithm. The unification claim comes with receipts: state-of-the-art circle packing for three dollars, ARC-AGI scores leaping from 32% to nearly 90%, and a clear theory of why richer feedback beats cleverer search.

Key Takeaways

Why the authors argue side information — error traces, profiler dumps, failed-test diagnostics — is the LLM-era analog of a gradient, and the ablation showing 4-6x faster convergence when you use it

How a 10-line seed agent evolved into a 300-line ARC-AGI pipeline that discovered rule induction, code verification, and fallback strategies on its own

The 'refiner leapfrog' mechanism that let optimize_anything beat AlphaEvolve on circle packing at a third of the budget

When multi-task optimization helps (CUDA kernels share structure) and when it actively hurts (different circle-packing sizes don't transfer)

Why a meaningful share of the headline numbers comes from the frontier proposer model — and where the architectural contributions still clearly do real work

The shift the paper implies: optimization expertise gets traded for evaluator-design expertise, and that's now the craft worth investing in
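
To make the side-information takeaway concrete, here is a minimal Python sketch of an evaluator that returns diagnostics alongside a scalar score. It is an illustration only, not the paper's API: `EvalResult`, `evaluate_candidate`, and the toy `solve` task are all hypothetical names chosen for this example.

```python
import traceback
from dataclasses import dataclass, field


@dataclass
class EvalResult:
    score: float  # scalar fitness in [0, 1]
    side_info: list[str] = field(default_factory=list)  # diagnostics a proposer could read


def evaluate_candidate(source: str, cases: list[tuple[int, int]]) -> EvalResult:
    """Run candidate code that defines `solve(x)` against (input, expected) cases."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # illustration only; never exec untrusted code
        solve = namespace["solve"]
    except Exception:
        # The load failure itself is side information, not just a zero score.
        return EvalResult(0.0, ["failed to load: " + traceback.format_exc(limit=1)])

    passed, notes = 0, []
    for x, expected in cases:
        try:
            got = solve(x)
        except Exception as exc:
            notes.append(f"solve({x}) raised {type(exc).__name__}: {exc}")
            continue
        if got == expected:
            passed += 1
        else:
            notes.append(f"solve({x}) returned {got}, expected {expected}")
    return EvalResult(passed / len(cases), notes)


# A candidate that doubles its input, scored against a squaring task.
result = evaluate_candidate(
    "def solve(x):\n    return x * 2",
    [(1, 2), (2, 4), (3, 9)],
)
print(result.score)
print(result.side_info)
```

A scalar-only evaluator would report just the fraction of cases passed; the `side_info` list is what tells a proposer which case failed and how.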

00:00 — Five frameworks, one underlying algorithm
Why AlphaEvolve, FunSearch, GEPA, ADAS, and OpenEvolve are arguably running the same loop in different costumes.

02:56 — The three-line engine
Walking through the declarative loop — artifact, evaluator, proposer — that the paper claims is sufficient across all six domains.

05:53 — Side information as a gradient analog
The cooking-feedback analogy, the stack-trace line, and the ablation showing rich diagnostics dramatically outperform scalar scores.

08:50 — The Pareto frontier and why specialists survive
The Olympic-team intuition for why ranking by average kills diversity, and how per-dimension champions get preserved instead.

11:47 — ARC-AGI: an agent that designs its own architecture
How a 10-line seed agent evolved into a four-stage pipeline with verification and fallback, lifting Gemini Flash from 32% to nearly 90%.

14:44 — Circle packing and the refiner leapfrog
Beating AlphaEvolve's published record for $3.18, and the two-artifact mechanism that explains why this kind of compounding is possible.

17:41 — When multi-task helps and when it hurts
Shared Pareto frontiers across related problems win on CUDA kernels but degrade performance on circle packing when the number of circles n differs.

20:38 — Three honest caveats
The frontier-proposer dependency, the AIME result that ties rather than beats GEPA, and why side-information design is itself a craft.

23:35 — What changes if the unification holds
Why the next research frontier may be evaluator design rather than yet another specialized optimization framework.
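
For readers who want a concrete picture of the loop and Pareto chapters, the following Python sketch shows one way an artifact, evaluator, and proposer can be wired together while keeping per-dimension specialists alive. It is a sketch under stated assumptions rather than the paper's implementation: `optimize`, `pareto_front`, and `dominates` are hypothetical names, the toy proposer is a random string mutation standing in for an LLM call, and the scores in `pool` are made up for illustration.

```python
import random
from typing import Callable

Scores = tuple[float, ...]
Entry = tuple[str, Scores]


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(pool: list[Entry]) -> list[Entry]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in pool if not any(dominates(o[1], c[1]) for o in pool)]


def optimize(
    seed: str,
    evaluate: Callable[[str], Scores],
    propose: Callable[[str, Scores], str],
    steps: int,
    rng: random.Random,
) -> list[Entry]:
    """Assumed loop shape: pick a frontier member, propose a child, re-filter."""
    frontier = [(seed, evaluate(seed))]
    for _ in range(steps):
        parent, parent_scores = rng.choice(frontier)
        child = propose(parent, parent_scores)
        if all(child != artifact for artifact, _ in frontier):  # skip duplicates
            frontier = pareto_front(frontier + [(child, evaluate(child))])
    return frontier


# Made-up scores: ranking by average keeps only the generalist,
# while the Pareto front also keeps both specialists.
pool = [
    ("generalist", (0.6, 0.6)),
    ("specialist_a", (0.9, 0.2)),
    ("specialist_b", (0.1, 0.95)),
    ("dominated", (0.5, 0.5)),
]
survivors = [name for name, _ in pareto_front(pool)]
best_by_average = max(pool, key=lambda c: sum(c[1]) / len(c[1]))[0]

# Toy run: two objectives reward 'a' and 'b'; the proposer is a random mutation.
rng = random.Random(0)


def toy_evaluate(text: str) -> Scores:
    return (text.count("a") / len(text), text.count("b") / len(text))


def toy_propose(parent: str, scores: Scores) -> str:
    i = rng.randrange(len(parent))
    return parent[:i] + rng.choice("abc") + parent[i + 1:]


frontier = optimize("cccc", toy_evaluate, toy_propose, steps=200, rng=rng)
print(survivors, best_by_average, len(frontier))
```

In a real setup the proposer would receive the evaluator's diagnostics as well as the scores; the toy version ignores them to keep the sketch short.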