Neural intel Pod

Olmo 3: Unpacking the Fully Open LLM Flow (Dolma 3, OlmoRL, & State-of-the-Art Reasoning)



Join us for a deep technical discussion on Olmo 3, the latest family of state-of-the-art, fully open language models developed by the Olmo Team at the Allen Institute for AI (Ai2). Aimed at ML insiders, this episode dissects the entire model flow: a commitment to releasing the full lifecycle, including every stage, checkpoint, datapoint, and dependency used to build the models. This unprecedented transparency enables extensive customization and further advancement in open-source AI research.

Olmo 3 offers models at both the 7B and 32B parameter scales. We focus on how these models were engineered to excel across a diverse set of capabilities, including long-context reasoning, function calling, coding, instruction following, general chat, and knowledge recall.

Key technical highlights covered include:

The Model Lineup: We explore the Olmo 3 family, including Olmo 3 Base (Olmo-3-1025-7B, Olmo-3-1125-32B), the specialized Olmo 3 Think (trained for step-by-step reasoning and generating thinking traces), and Olmo 3 Instruct (optimized for general chat and inference efficiency). Notably, the flagship Olmo 3 Think-32B is the strongest fully open thinking model released to date.

The Data Pipeline (Dolma & Dolci): We detail the data mixing methodologies, including Dolma 3 Mix (5.9T tokens for pretraining), refined by Dolma 3 Dolmino Mix during the 100B-token mid-training stage to boost capabilities in code and math. Post-training uses the new Dolci suite, which provides tailored data for Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Reinforcement Learning (RL). A toy sketch of mixture-weighted sampling follows below.
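As a concrete illustration of mixture-weighted sampling, here is a minimal Python sketch. The source categories and weights are invented for the example; the actual Dolma 3 Mix proportions are documented in the Olmo 3 technical report.

```python
import random

# Hypothetical data-mixing sketch: sample pretraining documents from source
# pools in proportion to fixed mixture weights. The categories and weights
# below are illustrative assumptions, not the real Dolma 3 Mix proportions.
MIX_WEIGHTS = {
    "web_crawl": 0.70,
    "code": 0.15,
    "math": 0.05,
    "scientific_pdfs": 0.10,
}

def sample_source(weights: dict, rng: random.Random) -> str:
    """Pick a data source in proportion to its mixture weight."""
    sources, probs = zip(*weights.items())
    return rng.choices(sources, weights=probs, k=1)[0]

rng = random.Random(0)
counts = {source: 0 for source in MIX_WEIGHTS}
for _ in range(10_000):
    counts[sample_source(MIX_WEIGHTS, rng)] += 1
print(counts)  # empirical counts roughly match MIX_WEIGHTS
```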

Long-Context Engineering: Learn how Olmo 3 reaches a 64K-token context window through a newly added extension stage. This stage incorporates high-quality data like olmOCR Science PDFs and uses techniques like YaRN positional-embedding extension and specialized document packing; a simplified YaRN sketch follows below.
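For intuition about the positional-embedding extension, here is a simplified NumPy sketch of YaRN-style RoPE frequency scaling, not Ai2's actual implementation. The original context length, scale factor, and ramp constants are assumptions chosen to illustrate an 8x extension (8K to 64K).

```python
import numpy as np

# Simplified YaRN sketch: high-frequency RoPE dimensions complete many
# rotations within the original context and are left untouched
# (extrapolation); low-frequency dimensions are compressed by the scale
# factor (interpolation); a linear ramp blends the two regimes.
def yarn_inv_freq(dim=128, base=10000.0, orig_ctx=8192, scale=8.0,
                  beta_fast=32.0, beta_slow=1.0):
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)  # standard RoPE frequencies
    rotations = orig_ctx * inv_freq / (2 * np.pi)     # rotations inside original context
    # ramp = 1 keeps the original frequency; ramp = 0 divides it by `scale`
    ramp = np.clip((rotations - beta_slow) / (beta_fast - beta_slow), 0.0, 1.0)
    return inv_freq * ramp + (inv_freq / scale) * (1.0 - ramp)

# YaRN also tempers attention logits as the scale factor s grows: 0.1*ln(s) + 1
attention_scale = 0.1 * np.log(8.0) + 1.0
print(yarn_inv_freq()[:4], attention_scale)
```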

Advanced Post-Training: We break down the three-stage process (SFT, DPO, and Reinforcement Learning with Verifiable Rewards, i.e. RLVR) used for the Think and Instruct models. Discover the Delta Learning approach used in DPO, which achieves capability gains by maximizing the contrast between chosen and rejected responses (see the DPO sketch below).
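Since the episode leans on the DPO contrast objective, a minimal PyTorch sketch of the standard DPO loss may help. Delta Learning, as described here, concerns how preference pairs are constructed (maximizing the quality gap between chosen and rejected responses); the loss below is the usual DPO objective, not Ai2's training code.

```python
import torch
import torch.nn.functional as F

# Standard DPO loss over (chosen, rejected) response pairs. Each input is a
# (batch,) tensor of summed token log-probs under the policy or the frozen
# reference model. Illustrative sketch only.
def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    chosen_reward = policy_chosen_logps - ref_chosen_logps
    rejected_reward = policy_rejected_logps - ref_rejected_logps
    # maximize the margin between the implicit rewards of chosen and rejected
    return -F.logsigmoid(beta * (chosen_reward - rejected_reward)).mean()

# toy usage with made-up log-probabilities
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss.item())
```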

OlmoRL and RL-Zero: We examine OlmoRL, the improved RL training approach that generalizes verifiable reasoning across multiple domains (math, code, instruction following, general chat) and features crucial infrastructure advances such as asynchronous training and in-flight updates. We also cover the fully open Olmo 3 RL-Zero setup, designed for rigorous benchmarking of RL algorithms starting from a base model (a toy verifiable-reward sketch follows below).

Olmo 3 Base models outperform other fully open alternatives like Stanford's Marin and Apertus, while the post-trained models are highly competitive with leading open-weight systems, often achieving strong results while training on roughly six times fewer tokens than competitors like Qwen 3 32B.
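To make "verifiable rewards" concrete, here is a toy reward function in the spirit of RLVR: a programmatic check replaces a learned reward model. The final-answer convention (a trailing "#### <answer>" line) is a hypothetical format for this sketch, not Olmo 3's actual one.

```python
# Toy verifiable reward for math problems: the reward is computed by checking
# the model's final answer against a known gold answer, rather than by a
# learned reward model. The "#### <answer>" convention is an assumption.
def math_reward(completion: str, gold_answer: str) -> float:
    """Return 1.0 if the completion's final '####' line matches the gold answer."""
    for line in reversed(completion.strip().splitlines()):
        if line.startswith("####"):
            return 1.0 if line.removeprefix("####").strip() == gold_answer else 0.0
    return 0.0  # no parsable final answer -> zero reward

print(math_reward("Compute 3 * 7.\n#### 21", "21"))  # 1.0
print(math_reward("Compute 3 * 7.\n#### 20", "21"))  # 0.0
```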

Keywords: LLM, Open Source AI, Olmo 3, Ai2, Model Flow, Technical Report, Machine Learning, Deep Learning, Transformer, Long Context, Reasoning, RLHF, DPO, RLVR, OlmoRL, Dolma, Dolci, 7B, 32B, Fine-Tuning, Deduplication, Compute-Efficiency, YaRN, Base Model, Thinking Model.


Neural intel Pod, by Neuralintel.org