AI Post Transformers

DreamerV3 World Models Across 150 Tasks

This episode explores DreamerV3, a world-model reinforcement learning system that claims to use one main configuration across more than 150 tasks spanning Atari, ProcGen, DMLab, robot control, visual control, BSuite, and Minecraft. It explains how world models work—learning compact environment dynamics so an agent can train on imagined futures—and why that approach is appealing for sample efficiency but historically difficult, because agents can overfit to inaccurate "fantasy" dynamics. The discussion highlights the paper's central argument that robust world-model design may reduce the need for domain-specific retuning, while also stressing that "fixed hyperparameters" does not eliminate all domain engineering, such as wrappers, action discretization, and evaluation choices. The episode is worth a listen for its clear look at a major RL unification attempt, including why the results matter for scaling, sparse-reward tasks, and expensive real-world settings like robotics.
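The "train on imagined futures" idea can be sketched in a few lines. This is a hedged toy illustration, not DreamerV3's actual architecture: a linear least-squares fit stands in for the learned recurrent world model, and all names (`env_step`, `model_step`, `imagined_return`) are invented for this example. The key point it shows is the loop structure: collect a little real experience, fit a dynamics model, then evaluate behavior entirely inside the model without further environment interaction.

```python
import numpy as np

rng = np.random.default_rng(0)
Z, A = 4, 2                        # latent and action dimensions
A_true = rng.normal(scale=0.3, size=(Z, Z))
B_true = rng.normal(scale=0.3, size=(Z, A))

def env_step(z, a):
    """'Real' environment dynamics, unknown to the agent."""
    return A_true @ z + B_true @ a

# 1) Collect real experience with a random policy.
zs, acts, z_next = [], [], []
z = rng.normal(size=Z)
for _ in range(500):
    a = rng.normal(size=A)
    zn = env_step(z, a)
    zs.append(z); acts.append(a); z_next.append(zn)
    z = zn
X = np.concatenate([np.array(zs), np.array(acts)], axis=1)  # (N, Z+A)
Y = np.array(z_next)                                        # (N, Z)

# 2) Fit the world model. A linear least-squares solve stands in
#    for DreamerV3's learned recurrent state-space model.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)                   # (Z+A, Z)

def model_step(z, a):
    """One imagined step using the fitted dynamics."""
    return np.concatenate([z, a]) @ W

# 3) "Train in imagination": score an action sequence by rolling it
#    out inside the model only, never calling env_step again. An
#    actor-critic would optimize against returns like this one.
def imagined_return(z0, actions, reward=lambda z: -np.sum(z**2)):
    z, ret = z0, 0.0
    for a in actions:
        z = model_step(z, a)
        ret += reward(z)
    return ret
```

Because the toy dynamics are linear and over-determined by the data, the fitted model here matches the environment almost exactly; the episode's point is that in real tasks the model is imperfect, which is exactly where agents can overfit to "fantasy" dynamics.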
Sources:
1. Mastering Diverse Domains through World Models — Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap, 2023
http://arxiv.org/abs/2301.04104
2. Mastering Atari with Discrete World Models — Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba, 2021
https://scholar.google.com/scholar?q=Mastering+Atari+with+Discrete+World+Models
3. Mastering Visual Continuous Control: Improved Data-Efficient Reinforcement Learning with Dreamer — Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson, 2020
https://scholar.google.com/scholar?q=Mastering+Visual+Continuous+Control:+Improved+Data-Efficient+Reinforcement+Learning+with+Dreamer
4. Learning Latent Dynamics for Planning from Pixels — Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi, 2019
https://scholar.google.com/scholar?q=Learning+Latent+Dynamics+for+Planning+from+Pixels
5. MuZero — Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, et al., 2020
https://scholar.google.com/scholar?q=MuZero
6. IRIS: Efficient Video Pretraining for Reinforcement Learning — various authors as cited by the paper, 2023
https://scholar.google.com/scholar?q=IRIS:+Efficient+Video+Pretraining+for+Reinforcement+Learning
7. Temporal Difference Models / TD-MPC / TD-MPC2 — various authors including Nicklas Hansen and colleagues, 2022-2024
https://scholar.google.com/scholar?q=Temporal+Difference+Models+/+TD-MPC+/+TD-MPC2
8. MineRL BASALT / VPT-related Minecraft works — various authors including OpenAI and MineRL participants, 2021-2022
https://scholar.google.com/scholar?q=MineRL+BASALT+/+VPT-related+Minecraft+works
9. DrQ-v2 — Ilya Kostrikov, Denis Yarats, Rob Fergus, 2021
https://scholar.google.com/scholar?q=DrQ-v2
10. R2D2 — Steven Kapturowski, Georg Ostrovski, John Quan, et al., 2019
https://scholar.google.com/scholar?q=R2D2
11. STORM: Efficient Stochastic Transformer-based World Models for Reinforcement Learning — approx. Guo et al., 2023/2024
https://scholar.google.com/scholar?q=STORM:+Efficient+Stochastic+Transformer-based+World+Models+for+Reinforcement+Learning
12. Improving Transformer World Models for Data-Efficient RL — approx. recent 2023/2024 RL world-model authors, 2023/2024
https://scholar.google.com/scholar?q=Improving+Transformer+World+Models+for+Data-Efficient+RL
13. GIRL: Generative Imagination Reinforcement Learning via Information-Theoretic Hallucination Control — approx. recent MBRL authors, 2024/2025
https://scholar.google.com/scholar?q=GIRL:+Generative+Imagination+Reinforcement+Learning+via+Information-Theoretic+Hallucination+Control
14. Normalization Enhances Generalization in Visual Reinforcement Learning — approx. recent visual RL authors, 2024/2025
https://scholar.google.com/scholar?q=Normalization+Enhances+Generalization+in+Visual+Reinforcement+Learning
15. Understanding the Mechanisms of Fast Hyperparameter Transfer — approx. recent hyperparameter-transfer authors, 2024/2025
https://scholar.google.com/scholar?q=Understanding+the+Mechanisms+of+Fast+Hyperparameter+Transfer
16. Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration — approx. recent hyperparameter-transfer authors, 2024/2025
https://scholar.google.com/scholar?q=Completed+Hyperparameter+Transfer+across+Modules,+Width,+Depth,+Batch+and+Duration
17. AI Post Transformers: LeWorldModel: Stable Joint-Embedding World Models from Pixels — Hal Turing & Dr. Ada Shannon, 2026
https://podcast.do-not-panic.com/episodes/2026-03-25-leworldmodel-stable-joint-embedding-worl-650f9f.mp3
18. AI Post Transformers: Zero-Shot Context Generalization in Reinforcement Learning from Few Training Contexts — Hal Turing & Dr. Ada Shannon
https://podcast.do-not-panic.com/episodes/zero-shot-context-generalization-in-reinforcement-learning-from-few-training-con/
19. AI Post Transformers: Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning — Hal Turing & Dr. Ada Shannon
https://podcast.do-not-panic.com/episodes/contrastive-behavioral-similarity-embeddings-for-generalization-in-reinforcement/
20. AI Post Transformers: HyperController: Fast, Stable Reinforcement Learning Hyperparameter Optimization — Hal Turing & Dr. Ada Shannon
https://podcast.do-not-panic.com/episodes/hypercontroller-fast-stable-reinforcement-learning-hyperparameter-optimization/
Interactive Visualization: DreamerV3 World Models Across 150 Tasks

AI Post Transformers, by mcgrof