
Sign up to save your podcasts
Or


What productionizing test-time compute shows us about the future of AI. Exploration has landed in language model training.
This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/reverse-engineering-openai-o1
00:00 Reverse engineering OpenAI's o1
01:52 From Q-star to Strawberry to o1
05:13 Training o1 with reinforcement learning
09:24 What is o1 doing when given a prompt?
11:49 Questions to consider to understand o1's structure
11:56 1. How does an RL-trained language model act?
12:38 2. Is it an online / test-time search?
14:20 3. Is it one model at inference?
15:29 Open-source o1, the future of o1, and the future of AI
Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_014.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_016.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_018.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_020.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_024.png
Fig 6: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_026.png
Fig 7: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_034.png
Fig 8: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_048.png
By Nathan Lambert4.1
99 ratings
What productionizing test-time compute shows us about the future of AI. Exploration has landed in language model training.
This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/reverse-engineering-openai-o1
00:00 Reverse engineering OpenAI's o1
01:52 From Q-star to Strawberry to o1
05:13 Training o1 with reinforcement learning
09:24 What is o1 doing when given a prompt?
11:49 Questions to consider to understand o1's structure
11:56 1. How does an RL-trained language model act?
12:38 2. Is it an online / test-time search?
14:20 3. Is it one model at inference?
15:29 Open-source o1, the future of o1, and the future of AI
Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_014.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_016.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_018.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_020.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_024.png
Fig 6: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_026.png
Fig 7: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_034.png
Fig 8: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_048.png

536 Listeners

1,105 Listeners

291 Listeners

212 Listeners

203 Listeners

313 Listeners

101 Listeners

551 Listeners

150 Listeners

101 Listeners

228 Listeners

147 Listeners

475 Listeners

34 Listeners

39 Listeners