April 12, 2026

ASI-Evolve for Data, Architectures, and RL

This episode asks whether an agentic AI system can meaningfully improve AI itself across three hard parts of the pipeline: pretraining data curation, neural architecture search, and reinforcement learning algorithm design. The focal point is the paper ASI-Evolve. The episode argues that this goes a step beyond traditional AutoML, framing “AI-for-AI” as automating parts of the research loop itself: reading prior work, proposing changes, running experiments, interpreting noisy results, and deciding what to try next. The discussion highlights why this is difficult: real ML research involves expensive, delayed, and ambiguous feedback rather than clean benchmark-style signals, which makes any claim of a unified framework both significant and worth skepticism. Listeners may find it interesting for its clear breakdown of what separates autonomous AI research from ordinary model assistance, and for its debate over whether recent systems represent genuine progress toward automating frontier AI development or are still mostly polished demos.

Sources:

1. ASI-Evolve: AI Accelerates AI — Weixian Xu, Tiantian Mi, Yixiu Liu, Yang Nan, Zhimeng Zhou, Lyumanshan Ye, Lin Zhang, Yu Qiao, Pengfei Liu, 2026

http://arxiv.org/abs/2603.29640

2. AutoML: A Survey of the State-of-the-Art — Xin He, Kaiyong Zhao, Xiaowen Chu, 2021

https://scholar.google.com/scholar?q=AutoML%3A+A+Survey+of+the+State-of-the-Art

3. Large Language Models as Optimizers — Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou and others, 2024

https://scholar.google.com/scholar?q=Large+Language+Models+as+Optimizers

4. The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery — Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster and others, 2024

https://scholar.google.com/scholar?q=The+AI+Scientist%3A+Towards+Fully+Automated+Open-Ended+Scientific+Discovery

5. AlphaEvolve — Novikov et al., 2025

https://scholar.google.com/scholar?q=AlphaEvolve

6. Neural Architecture Search with Reinforcement Learning — Barret Zoph, Quoc V. Le, 2017

https://scholar.google.com/scholar?q=Neural+Architecture+Search+with+Reinforcement+Learning

7. Regularized Evolution for Image Classifier Architecture Search — Esteban Real, Alok Aggarwal, Yanping Huang, Quoc V. Le, 2019

https://scholar.google.com/scholar?q=Regularized+Evolution+for+Image+Classifier+Architecture+Search

8. DARTS: Differentiable Architecture Search — Hanxiao Liu, Karen Simonyan, Yiming Yang, 2019

https://scholar.google.com/scholar?q=DARTS%3A+Differentiable+Architecture+Search

9. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks — Mingxing Tan, Quoc V. Le, 2019

https://scholar.google.com/scholar?q=EfficientNet%3A+Rethinking+Model+Scaling+for+Convolutional+Neural+Networks

10. The Pile: An 800GB Dataset of Diverse Text for Language Modeling — Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster and others, 2020

https://scholar.google.com/scholar?q=The+Pile%3A+An+800GB+Dataset+of+Diverse+Text+for+Language+Modeling

11. What Language Model to Train if You Have One Million GPU Hours? — Teven Le Scao, Thomas Wang, Daniel Hesslow, Lucile Saulnier, Stas Bekman and others, 2022

https://scholar.google.com/scholar?q=What+Language+Model+to+Train+if+You+Have+One+Million+GPU+Hours%3F

12. FineWeb — Hugging Face researchers and collaborators, 2024

https://scholar.google.com/scholar?q=FineWeb

13. DCLM: DataComp for Language Models — DataComp-LM collaborators, 2024

https://scholar.google.com/scholar?q=DCLM%3A+DataComp+for+Language+Models

14. Discovering Reinforcement Learning Algorithms — Junhyuk Oh, Matteo Hessel, Wojciech M. Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver, 2020

https://scholar.google.com/scholar?q=Discovering+Reinforcement+Learning+Algorithms

15. Learned Optimizers that Scale and Generalize — Olga Wichrowska, Niru Maheswaranathan, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Nando de Freitas, Jascha Sohl-Dickstein, 2017

https://scholar.google.com/scholar?q=Learned+Optimizers+that+Scale+and+Generalize

16. Proximal Policy Optimization Algorithms — John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov, 2017

https://scholar.google.com/scholar?q=Proximal+Policy+Optimization+Algorithms

17. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models — DeepSeek-AI authors, 2024

https://scholar.google.com/scholar?q=DeepSeekMath%3A+Pushing+the+Limits+of+Mathematical+Reasoning+in+Open+Language+Models

18. AI Scientist — Lu et al., 2024

https://scholar.google.com/scholar?q=AI+Scientist

19. MLEvolve — Du et al., 2025

https://scholar.google.com/scholar?q=MLEvolve

20. GEPA — authors not identified, 2025

https://scholar.google.com/scholar?q=GEPA

21. OpenEvolve — authors not identified, 2025

https://scholar.google.com/scholar?q=OpenEvolve

22. DeltaNet — Yang et al., 2025

https://scholar.google.com/scholar?q=DeltaNet

23. Recent human-designed improvements over DeltaNet — Dao and Gu, 2024

https://scholar.google.com/scholar?q=Recent+human-designed+improvements+over+DeltaNet

24. GRPO — Guo et al., 2025

https://scholar.google.com/scholar?q=GRPO

25. MMLU — Hendrycks et al., 2021

https://scholar.google.com/scholar?q=MMLU

26. SciMaster — Chai et al., 2025

https://scholar.google.com/scholar?q=SciMaster

27. From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery — authors not identified, 2025

https://scholar.google.com/scholar?q=From+AI+for+Science+to+Agentic+Science%3A+A+Survey+on+Autonomous+Scientific+Discovery

28. DiscoveryWorld: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents — authors not identified, 2024/2025

https://scholar.google.com/scholar?q=DiscoveryWorld%3A+A+Virtual+Environment+for+Developing+and+Evaluating+Automated+Scientific+Discovery+Agents

29. AI, Agentic Models and Lab Automation for Scientific Discovery—the Beginning of scAInce — authors not identified, 2025

https://scholar.google.com/scholar?q=AI%2C+Agentic+Models+and+Lab+Automation+for+Scientific+Discovery%E2%80%94the+Beginning+of+scAInce

30. SciAgents: Automating Scientific Discovery Through Bioinspired Multi-Agent Intelligent Graph Reasoning — authors not identified, 2024/2025

https://scholar.google.com/scholar?q=SciAgents%3A+Automating+Scientific+Discovery+Through+Bioinspired+Multi-Agent+Intelligent+Graph+Reasoning

31. Optimization Problem Solving Can Transition to Evolutionary Agentic Workflows — authors not identified, 2025

https://scholar.google.com/scholar?q=Optimization+Problem+Solving+Can+Transition+to+Evolutionary+Agentic+Workflows

32. AVO: Agentic Variation Operators for Autonomous Evolutionary Search — authors not identified, 2025

https://scholar.google.com/scholar?q=AVO%3A+Agentic+Variation+Operators+for+Autonomous+Evolutionary+Search

33. Toward Weight-level Self-improving Agents with Meta-knowledge Discovery — authors not identified, 2025/2026

https://scholar.google.com/scholar?q=Toward+Weight-level+Self-improving+Agents+with+Meta-knowledge+Discovery

34. AI Post Transformers: Kimi Linear: Efficient Expressive Attention Architecture — Hal Turing & Dr. Ada Shannon, 2025

https://podcast.do-not-panic.com/episodes/kimi-linear-efficient-expressive-attention-architecture/

35. AI Post Transformers: Training-Free GRPO: Policy Optimization via Context Space — Hal Turing & Dr. Ada Shannon, 2025

https://podcast.do-not-panic.com/episodes/training-free-grpo-policy-optimization-via-context-space/

36. AI Post Transformers: Experiential Reinforcement Learning: Internalizing Reflection for Better Policy Training — Hal Turing & Dr. Ada Shannon, 2026

https://podcast.do-not-panic.com/episodes/experiential-reinforcement-learning-internalizing-reflection-for-better-policy-t/

37. AI Post Transformers: Kosmos AI Scientist for Autonomous Discovery — Hal Turing & Dr. Ada Shannon, 2026

https://podcast.do-not-panic.com/episodes/2026-04-04-kosmos-ai-scientist-for-autonomous-disco-311775.mp3

38. AI Post Transformers: HyperController: Fast, Stable Reinforcement Learning Hyperparameter Optimization — Hal Turing & Dr. Ada Shannon, 2025

https://podcast.do-not-panic.com/episodes/hypercontroller-fast-stable-reinforcement-learning-hyperparameter-optimization/

39. AI Post Transformers: NeurIPS 2025: Reinforcement Learning for Reasoning in Large Language Models with One Training Example — Hal Turing & Dr. Ada Shannon, 2025

https://podcast.do-not-panic.com/episodes/neurips-2025-reinforcement-learning-for-reasoning-in-large-language-models-with/

Interactive Visualization: ASI-Evolve for Data, Architectures, and RL

By mcgrof