AI Post Transformers

ASI-Evolve for Data, Architectures, and RL



This episode explores whether an agentic AI system can meaningfully improve AI itself across three hard parts of the pipeline: pretraining data curation, neural architecture search, and reinforcement learning algorithm design, using the paper ASI-Evolve as the focal point. It argues that this is a step beyond traditional AutoML, framing “AI-for-AI” as automating parts of the research loop itself: reading prior work, proposing changes, running experiments, interpreting noisy results, and deciding what to try next. The discussion highlights why this is difficult: real ML research involves expensive, delayed, and ambiguous feedback rather than clean benchmark-style signals, which makes any claim of a unified framework both significant and deserving of skepticism. Listeners will find a clear breakdown of what separates autonomous AI research from ordinary model assistance, along with a debate over whether recent systems represent genuine progress toward automating frontier AI development or remain mostly polished demos.
Sources:
1. ASI-Evolve: AI Accelerates AI — Weixian Xu, Tiantian Mi, Yixiu Liu, Yang Nan, Zhimeng Zhou, Lyumanshan Ye, Lin Zhang, Yu Qiao, Pengfei Liu, 2026
http://arxiv.org/abs/2603.29640
2. AutoML: A Survey of the State-of-the-Art — Xin He, Kaiyong Zhao, Xiaowen Chu, 2021
https://scholar.google.com/scholar?q=AutoML:+A+Survey+of+the+State-of-the-Art
3. Large Language Models as Optimizers — Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou and others, 2024
https://scholar.google.com/scholar?q=Large+Language+Models+as+Optimizers
4. The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery — Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster and others, 2024
https://scholar.google.com/scholar?q=The+AI+Scientist:+Towards+Fully+Automated+Open-Ended+Scientific+Discovery
5. AlphaEvolve — Novikov et al., 2025
https://scholar.google.com/scholar?q=AlphaEvolve
6. Neural Architecture Search with Reinforcement Learning — Barret Zoph, Quoc V. Le, 2017
https://scholar.google.com/scholar?q=Neural+Architecture+Search+with+Reinforcement+Learning
7. Regularized Evolution for Image Classifier Architecture Search — Esteban Real, Alok Aggarwal, Yanping Huang, Quoc V. Le and others, 2019
https://scholar.google.com/scholar?q=Regularized+Evolution+for+Image+Classifier+Architecture+Search
8. DARTS: Differentiable Architecture Search — Hanxiao Liu, Karen Simonyan, Yiming Yang, 2019
https://scholar.google.com/scholar?q=DARTS:+Differentiable+Architecture+Search
9. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks — Mingxing Tan, Quoc V. Le, 2019
https://scholar.google.com/scholar?q=EfficientNet:+Rethinking+Model+Scaling+for+Convolutional+Neural+Networks
10. The Pile: An 800GB Dataset of Diverse Text for Language Modeling — Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster and others, 2020
https://scholar.google.com/scholar?q=The+Pile:+An+800GB+Dataset+of+Diverse+Text+for+Language+Modeling
11. What Language Model to Train if You Have One Million GPU Hours? — Teven Le Scao, Thomas Wang, Daniel Hesslow, Lucile Saulnier, Stas Bekman and others, 2022
https://scholar.google.com/scholar?q=What+Language+Model+to+Train+if+You+Have+One+Million+GPU+Hours?
12. FineWeb — Hugging Face researchers and collaborators, 2024
https://scholar.google.com/scholar?q=FineWeb
13. DCLM: DataComp for Language Models — DataComp-LM collaborators, 2024
https://scholar.google.com/scholar?q=DCLM:+DataComp+for+Language+Models
14. Discovering Reinforcement Learning Algorithms — Junhyuk Oh, Matteo Hessel, Wojciech M. Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver, 2020
https://scholar.google.com/scholar?q=Discovering+Reinforcement+Learning+Algorithms
15. Learned Optimizers that Scale and Generalize — Olga Wichrowska, Niru Maheswaranathan, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Nando de Freitas, Jascha Sohl-Dickstein, 2017
https://scholar.google.com/scholar?q=Learned+Optimizers+that+Scale+and+Generalize
16. Proximal Policy Optimization Algorithms — John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov, 2017
https://scholar.google.com/scholar?q=Proximal+Policy+Optimization+Algorithms
17. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models — DeepSeek-AI authors, 2024
https://scholar.google.com/scholar?q=DeepSeekMath:+Pushing+the+Limits+of+Mathematical+Reasoning+in+Open+Language+Models
18. AI Scientist — Lu et al., 2024
https://scholar.google.com/scholar?q=AI+Scientist
19. MLEvolve — Du et al., 2025
https://scholar.google.com/scholar?q=MLEvolve
20. GEPA — authors not identified, 2025
https://scholar.google.com/scholar?q=GEPA
21. OpenEvolve — authors not identified, 2025
https://scholar.google.com/scholar?q=OpenEvolve
22. DeltaNet — Yang et al., 2025
https://scholar.google.com/scholar?q=DeltaNet
23. Recent human-designed improvements over DeltaNet — Dao and Gu, 2024
https://scholar.google.com/scholar?q=Recent+human-designed+improvements+over+DeltaNet
24. GRPO — Guo et al., 2025
https://scholar.google.com/scholar?q=GRPO
25. MMLU — Hendrycks et al., 2021
https://scholar.google.com/scholar?q=MMLU
26. SciMaster — Chai et al., 2025
https://scholar.google.com/scholar?q=SciMaster
27. From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery — authors not identified, 2025
https://scholar.google.com/scholar?q=From+AI+for+Science+to+Agentic+Science:+A+Survey+on+Autonomous+Scientific+Discovery
28. DiscoveryWorld: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents — authors not identified, 2024/2025
https://scholar.google.com/scholar?q=DiscoveryWorld:+A+Virtual+Environment+for+Developing+and+Evaluating+Automated+Scientific+Discovery+Agents
29. AI, Agentic Models and Lab Automation for Scientific Discovery—the Beginning of scAInce — authors not identified, 2025
https://scholar.google.com/scholar?q=AI,+Agentic+Models+and+Lab+Automation+for+Scientific+Discovery—the+Beginning+of+scAInce
30. SciAgents: Automating Scientific Discovery Through Bioinspired Multi-Agent Intelligent Graph Reasoning — authors not identified, 2024/2025
https://scholar.google.com/scholar?q=SciAgents:+Automating+Scientific+Discovery+Through+Bioinspired+Multi-Agent+Intelligent+Graph+Reasoning
31. Optimization Problem Solving Can Transition to Evolutionary Agentic Workflows — authors not identified, 2025
https://scholar.google.com/scholar?q=Optimization+Problem+Solving+Can+Transition+to+Evolutionary+Agentic+Workflows
32. AVO: Agentic Variation Operators for Autonomous Evolutionary Search — authors not identified, 2025
https://scholar.google.com/scholar?q=AVO:+Agentic+Variation+Operators+for+Autonomous+Evolutionary+Search
33. Toward Weight-level Self-improving Agents with Meta-knowledge Discovery — authors not identified, 2025/2026
https://scholar.google.com/scholar?q=Toward+Weight-level+Self-improving+Agents+with+Meta-knowledge+Discovery
34. AI Post Transformers: Kimi Linear: Efficient Expressive Attention Architecture — Hal Turing & Dr. Ada Shannon, 2025
https://podcast.do-not-panic.com/episodes/kimi-linear-efficient-expressive-attention-architecture/
35. AI Post Transformers: Training-Free GRPO: Policy Optimization via Context Space — Hal Turing & Dr. Ada Shannon, 2025
https://podcast.do-not-panic.com/episodes/training-free-grpo-policy-optimization-via-context-space/
36. AI Post Transformers: Experiential Reinforcement Learning: Internalizing Reflection for Better Policy Training — Hal Turing & Dr. Ada Shannon, 2026
https://podcast.do-not-panic.com/episodes/experiential-reinforcement-learning-internalizing-reflection-for-better-policy-t/
37. AI Post Transformers: Kosmos AI Scientist for Autonomous Discovery — Hal Turing & Dr. Ada Shannon, 2026
https://podcast.do-not-panic.com/episodes/2026-04-04-kosmos-ai-scientist-for-autonomous-disco-311775.mp3
38. AI Post Transformers: HyperController: Fast, Stable Reinforcement Learning Hyperparameter Optimization — Hal Turing & Dr. Ada Shannon, 2025
https://podcast.do-not-panic.com/episodes/hypercontroller-fast-stable-reinforcement-learning-hyperparameter-optimization/
39. AI Post Transformers: NeurIPS 2025: Reinforcement Learning for Reasoning in Large Language Models with One Training Example — Hal Turing & Dr. Ada Shannon, 2025
https://podcast.do-not-panic.com/episodes/neurips-2025-reinforcement-learning-for-reasoning-in-large-language-models-with/
Interactive Visualization: ASI-Evolve for Data, Architectures, and RL

By mcgrof