Learning GenAI via SOTA Papers

EP065: Teaching Small AI To Think Like Giants



The paper introduces Orca, a 13-billion-parameter language model from Microsoft Research that shows how smaller models can acquire much stronger reasoning and comprehension abilities. Previous instruction-tuned models (such as Vicuna and Alpaca) often struggle because they learn to imitate the style of Large Foundation Models (LFMs) without grasping their underlying reasoning processes.

To solve this, the authors propose a methodology called Explanation Tuning. Key aspects of the paper include:

  • Progressive Learning from Explanations: Instead of training on simple query-response pairs, Orca learns from rich, step-by-step explanation traces and complex instructions generated by GPT-4. System instructions (e.g., "think step-by-step and justify your steps") are used to elicit these detailed "thought" processes from the teacher model.
  • Diverse and Scaled Data: The training leverages a massive and diverse dataset sampled from the FLAN-v2 collection. Orca is progressively trained using ChatGPT as an intermediate teaching assistant (5 million responses) before training on the more complex GPT-4 responses (1 million responses).
  • Performance and Results: Orca vastly outperforms conventional state-of-the-art instruction-tuned models. It surpasses Vicuna-13B by over 100% on complex zero-shot reasoning benchmarks like Big-Bench Hard (BBH) and by 42% on AGIEval.
  • Parity with ChatGPT: Orca reaches parity with ChatGPT on the BBH benchmark and shows competitive zero-shot performance on professional and academic exams, including the SAT, LSAT, GRE, and GMAT.
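The core idea behind Explanation Tuning can be illustrated as a data-construction step: instead of (query, answer) pairs, each training record carries a system instruction that elicits reasoning plus the teacher's full explanation trace as the target. The sketch below is a minimal, hypothetical illustration of that record format; the function and field names are assumptions, not taken from the paper's released code.

```python
# Hypothetical sketch of an Explanation Tuning training record.
# Field names and the helper function are illustrative assumptions.

def build_training_example(system_instruction, user_query, teacher_response):
    """Pack a system instruction, user query, and the teacher model's
    step-by-step explanation trace into one instruction-tuning record.
    The training target is the full explanation, not just the answer."""
    return {
        "system": system_instruction,
        "query": user_query,
        "response": teacher_response,
    }

example = build_training_example(
    "You are a helpful assistant. Think step-by-step and justify your steps.",
    "If a train travels 60 miles in 1.5 hours, what is its average speed?",
    "Step 1: Average speed = distance / time. "
    "Step 2: 60 miles / 1.5 hours = 40 miles per hour. "
    "So the average speed is 40 miles per hour.",
)
```

The key difference from plain instruction tuning is the `response` field: because it contains the teacher's reasoning steps, the student model is trained to reproduce the thought process rather than only the final answer.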

Ultimately, the research demonstrates that smaller models can dramatically improve their capabilities by learning from the detailed, step-by-step explanations of larger, more advanced AI models.


Learning GenAI via SOTA Papers, by Yun Wu