

Join us for an insightful exploration into the world of Reasoning LLMs, drawing on the expertise of Sebastian Raschka, PhD. This episode demystifies how Large Language Models (LLMs) are being refined to excel at complex tasks that require intermediate steps, such as solving puzzles, advanced mathematics, and challenging coding problems, moving beyond simple factual question-answering.
We'll uncover the four main approaches currently used to build and improve these specialised reasoning capabilities: inference-time scaling, pure reinforcement learning, supervised fine-tuning combined with reinforcement learning, and pure supervised fine-tuning with distillation.
We'll also discuss when to use reasoning models: they are ideal for complex challenges but can be inefficient, more verbose, and more expensive for simpler tasks, sometimes even being "prone to errors due to 'overthinking'". The episode provides valuable insights from the DeepSeek R1 pipeline as a detailed case study and touches upon comparisons with models like OpenAI's o1. Plus, get tips for developing reasoning models on a limited budget, including the promise of distillation and innovative methods like 'journey learning', which incorporates incorrect solution paths so models can learn from their mistakes. Tune in to navigate the rapidly evolving landscape of reasoning LLMs!
By Ali Mehedi