April 04, 2026

Kosmos AI Scientist for Autonomous Discovery

38 minutes

This episode explores a 2025 paper on “Kosmos,” an AI scientist designed to carry out long-horizon research by combining literature search, hypothesis generation, code-based data analysis, and persistent memory. The discussion argues that the real innovation is not a smarter standalone language model but a software architecture: agentic workflows plus a structured “world model” that preserves evidence, hypotheses, and task state across many steps. It also clarifies distinctions that AI discourse often blurs, separating AI for scientific discovery from standard deep learning, and distinguishing this kind of world model from the latent simulators used in reinforcement learning. The episode offers a grounded look at what it would actually take for AI to function like a junior computational scientist, and at where the genuine advances may lie beyond the hype.
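
To make the "structured world model" idea concrete, here is a minimal Python sketch of shared state that persists across agent steps. It is an illustration only, not the Kosmos implementation: every class, field, and method name (`WorldModel`, `Evidence`, `propose`, `record`, `complete`) is a hypothetical choice made for this example, and the paper's actual data model may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    source: str   # hypothetical label, e.g. "literature" or "analysis"
    summary: str  # short description of what was found


@dataclass
class Hypothesis:
    statement: str
    evidence: list[Evidence] = field(default_factory=list)


@dataclass
class WorldModel:
    """Shared state that outlives any single agent step (illustrative only)."""
    hypotheses: dict[str, Hypothesis] = field(default_factory=dict)
    pending_tasks: list[str] = field(default_factory=list)
    completed_tasks: list[str] = field(default_factory=list)

    def propose(self, key: str, statement: str) -> None:
        # Register a new hypothesis under a short key.
        self.hypotheses[key] = Hypothesis(statement)

    def record(self, key: str, evidence: Evidence) -> None:
        # Attach a piece of evidence to an existing hypothesis.
        self.hypotheses[key].evidence.append(evidence)

    def complete(self, task: str) -> None:
        # Move a task from the pending list to the completed list.
        self.pending_tasks.remove(task)
        self.completed_tasks.append(task)

    def summary(self) -> str:
        # Compact text view that a later step could read back in.
        lines = [f"{len(self.pending_tasks)} pending, {len(self.completed_tasks)} done"]
        for key, h in self.hypotheses.items():
            lines.append(f"{key}: {h.statement} ({len(h.evidence)} evidence items)")
        return "\n".join(lines)
```

The point of the sketch is the separation of concerns: each literature-search or data-analysis step would write its findings into the shared structure, and later steps would read the summary instead of relying on a single model's context window.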

Sources:

1. Kosmos: An AI Scientist for Autonomous Discovery — Ludovico Mitchener, Angela Yiu, Benjamin Chang, Mathieu Bourdenx, Tyler Nadolski, Arvis Sulovari, Eric C. Landsness, Daniel L. Barabasi, Siddharth Narayanan, Nicky Evans, Shriya Reddy, Martha Foiani, Aizad Kamal, Leah P. Shriver, Fang Cao, Asmamaw T. Wassie, Jon M. Laurent, Edwin Melville-Green, Mayk Caldas, Albert Bou, Kaleigh F. Roberts, Sladjana Zagorac, Timothy C. Orr, Miranda E. Orr, Kevin J. Zwezdaryk, Ali E. Ghareeb, Laurie McCoy, Bruna Gomes, Euan A. Ashley, Karen E. Duff, Tonio Buonassisi, Tom Rainforth, Randall J. Bateman, Michael Skarlinski, Samuel G. Rodriques, Michaela M. Hinks, Andrew D. White, 2025

http://arxiv.org/abs/2511.02824

2. The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery — Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, David Ha and collaborators at Sakana AI, 2024

https://scholar.google.com/scholar?q=The+AI+Scientist%3A+Towards+Fully+Automated+Open-Ended+Scientific+Discovery

3. Towards an AI co-scientist — Google Research collaborators including teams working on Gemini-based scientific reasoning systems, 2025

https://scholar.google.com/scholar?q=Towards+an+AI+co-scientist

4. Robin: an agentic system for automating scientific discovery in therapeutics — Andrew D. White, Samuel G. Rodriques and collaborators, 2024

https://scholar.google.com/scholar?q=Robin%3A+an+agentic+system+for+automating+scientific+discovery+in+therapeutics

5. Autonomous chemical research with large language models — Various groups; a representative line includes LLM-driven chemistry agents integrating planning, literature, and lab or simulation tools, 2023-2025

https://scholar.google.com/scholar?q=Autonomous+chemical+research+with+large+language+models

6. Robin — Authors not fully specified in the excerpt; cited as [1] and described as the authors' previous system, year unknown from excerpt

https://scholar.google.com/scholar?q=Robin

7. The AI Scientist — Sakana AI team; cited as [2], likely 2024

https://scholar.google.com/scholar?q=The+AI+Scientist

8. AI co-scientist — Google team; cited as [3], likely 2025

https://scholar.google.com/scholar?q=AI+co-scientist

9. Virtual Lab — Cited as [4]; exact authors not given in excerpt, year unknown from excerpt

https://scholar.google.com/scholar?q=Virtual+Lab

10. Edison Scientific data analysis agent — Cited as [5]; exact authors not given in excerpt, year unknown from excerpt

https://scholar.google.com/scholar?q=Edison+Scientific+data+analysis+agent

11. Edison Scientific literature search agent — Cited as [6]; exact authors not given in excerpt, year unknown from excerpt

https://scholar.google.com/scholar?q=Edison+Scientific+literature+search+agent

12. Planner Matters! An Efficient and Memory-Augmented Multi-agent Framework for Long-horizon GUI Planning — approx. recent multi-agent/planning paper, authors unclear from snippet, 2024/2025

https://scholar.google.com/scholar?q=Planner+Matters%21+An+Efficient+and+Memory-Augmented+Multi-agent+Framework+for+Long-horizon+GUI+Planning

13. Memory-Driven Agent Planning for Long-Horizon Tasks via Hierarchical Encoding and Dynamic Retrieval — approx. recent agent-memory paper, authors unclear from snippet, 2024/2025

https://scholar.google.com/scholar?q=Memory-Driven+Agent+Planning+for+Long-Horizon+Tasks+via+Hierarchical+Encoding+and+Dynamic+Retrieval

14. Optimus-1: Hybrid multimodal memory empowered agents excel in long-horizon tasks — approx. Optimus-1 authors, exact names unclear, 2024

https://scholar.google.com/scholar?q=Optimus-1%3A+Hybrid+multimodal+memory+empowered+agents+excel+in+long-horizon+tasks

15. Hallucination mitigation for retrieval-augmented large language models: a review — approx. review authors unclear, 2024/2025

https://scholar.google.com/scholar?q=Hallucination+mitigation+for+retrieval-augmented+large+language+models%3A+a+review

16. Grounding fallacies misrepresenting scientific publications in evidence — approx. authors unclear from snippet, 2024/2025

https://scholar.google.com/scholar?q=Grounding+fallacies+misrepresenting+scientific+publications+in+evidence

17. Zero-shot scientific claim verification using LLMs and citation text — approx. authors unclear from snippet, 2024/2025

https://scholar.google.com/scholar?q=Zero-shot+scientific+claim+verification+using+LLMs+and+citation+text

18. Learning fine-grained grounded citations for attributed large language models — approx. authors unclear from snippet, 2024/2025

https://scholar.google.com/scholar?q=Learning+fine-grained+grounded+citations+for+attributed+large+language+models

19. The cost of dynamic reasoning: Demystifying AI agents and test-time scaling from an AI infrastructure perspective — approx. authors unclear from snippet, 2025

https://scholar.google.com/scholar?q=The+cost+of+dynamic+reasoning%3A+Demystifying+AI+agents+and+test-time+scaling+from+an+AI+infrastructure+perspective

20. The illusion of diminishing returns: Measuring long horizon execution in LLMs — approx. authors unclear from snippet, 2024/2025

https://scholar.google.com/scholar?q=The+illusion+of+diminishing+returns%3A+Measuring+long+horizon+execution+in+LLMs

21. AI Post Transformers: Agentic AI and the Next Intelligence Explosion — Hal Turing & Dr. Ada Shannon, 2026

https://podcast.do-not-panic.com/episodes/2026-03-28-agentic-ai-and-the-next-intelligence-exp-d06561.mp3

22. AI Post Transformers: Mem0: Scalable Long-Term Memory for AI Agents — Hal Turing & Dr. Ada Shannon, 2025

https://podcast.do-not-panic.com/episodes/mem0-scalable-long-term-memory-for-ai-agents/

23. AI Post Transformers: LeWorldModel: Stable Joint-Embedding World Models from Pixels — Hal Turing & Dr. Ada Shannon, 2026

https://podcast.do-not-panic.com/episodes/2026-03-25-leworldmodel-stable-joint-embedding-worl-650f9f.mp3

24. AI Post Transformers: Hallucination to Truth: A Review of Fact-Checking and Factuality Evaluation in Large Language Models — Hal Turing & Dr. Ada Shannon, 2025

https://podcast.do-not-panic.com/episodes/hallucination-to-truth-a-review-of-fact-checking-and-factuality-evaluation-in-la/

25. AI Post Transformers: MetaGraph: knowledge graphs from financial NLP — Hal Turing & Dr. Ada Shannon, 2025

https://podcast.do-not-panic.com/episodes/metagraph-knowledge-graphs-from-financial-nlp/

26. AI Post Transformers: Survey of Emerging Topics in AI and Robotics — Hal Turing & Dr. Ada Shannon, 2025

https://podcast.do-not-panic.com/episodes/survey-of-emerging-topics-in-ai-and-robotics/

27. AI Post Transformers: The Endless Gym: Training Terminal Agents — Hal Turing & Dr. Ada Shannon, 2026

https://podcast.do-not-panic.com/episodes/the-endless-gym-training-terminal-agents/

28. AI Post Transformers: Bloom: an open source tool for automated behavioral evaluations — Hal Turing & Dr. Ada Shannon, 2026

https://podcast.do-not-panic.com/episodes/bloom-an-open-source-tool-for-automated-behavioral-evaluations/

Interactive Visualization: Kosmos AI Scientist for Autonomous Discovery

By mcgrof