
The paper introduces Dr.LLM (Dynamic Routing of Layers for LLMs), a retrofittable framework designed to improve both the efficiency and accuracy of Large Language Models (LLMs) without altering their base weights.
Typically, LLMs process every token through a fixed stack of transformer layers, which wastes computation on simple queries while providing no extra depth for complex reasoning. Prior adaptive-depth methods have attempted to address this, but they often degrade accuracy, require expensive inference-time searches, or demand large-scale retraining and architectural changes.
Dr.LLM overcomes these limitations by equipping a frozen, pretrained LLM with lightweight, per-layer routers that dynamically decide whether to skip, execute, or repeat a specific transformer block.
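The per-layer routing loop can be illustrated with a minimal sketch. This is not the paper's implementation: the real routers are small trained modules reading transformer hidden states, whereas here the layers are toy numeric transforms and the router is an illustrative heuristic on the hidden state's mean magnitude. All names and thresholds below are assumptions for illustration only.

```python
# Toy sketch of per-layer skip/execute/repeat routing over a frozen stack.
# The layer functions and the routing heuristic are illustrative stand-ins,
# not the trained routers described in the paper.

SKIP, EXECUTE, REPEAT = "skip", "execute", "repeat"

def make_layer(weight):
    # Stand-in for a frozen transformer block: a fixed transformation
    # whose parameters are never updated.
    return lambda h: [x * weight + 1.0 for x in h]

def route(hidden, layer_idx):
    # Stand-in for a lightweight per-layer router. In Dr.LLM this is a
    # small supervised module; here, a toy threshold on mean magnitude.
    mean = sum(hidden) / len(hidden)
    if mean < 1.0:
        return SKIP      # "easy" state: skip the block, save compute
    if mean > 50.0:
        return REPEAT    # "hard" state: apply the block twice for depth
    return EXECUTE

def forward(hidden, layers):
    for i, layer in enumerate(layers):
        action = route(hidden, i)
        if action == SKIP:
            continue
        hidden = layer(hidden)
        if action == REPEAT:
            hidden = layer(hidden)  # reuses the same frozen weights
    return hidden

layers = [make_layer(w) for w in (0.5, 1.5, 2.0)]
easy = forward([0.2, 0.4], layers)   # every block skipped
hard = forward([2.0, 4.0], layers)   # every block executed once
```

The key point the sketch captures is that depth becomes input-dependent at inference time while the base weights stay untouched: only the routers decide how much of the frozen stack each input traverses.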
Key highlights of the paper include:
Overall, Dr.LLM successfully demonstrates that explicitly supervised routing can retrofit frozen LLMs to achieve budget-aware, accuracy-driven inference.
By Yun Wu