
Large reasoning models have an overthinking problem. They reach the correct answer early in their chain of thought — then keep generating thousands of additional tokens reconsidering, double-checking, and exploring alternatives they'll ultimately discard. A new paper from researchers at UT Austin, EPFL, ENS Paris-Saclay, and Telecom Paris introduces TERMINATOR, an inference-time early-exit strategy that detects when a model has already generated its final answer and stops reasoning immediately.
The key insight is that the first arrival of a model's final answer in its chain of thought is detectable from hidden states. Token confidence spikes distinctly at the answer position. Thinking-word usage shifts — words like "hmm" and "okay" cluster before the answer; words like "another" and "alternatively" cluster after. These signals are real, consistent across math, coding, and science domains, and learnable by a small classifier.
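A toy sketch of the thinking-word signal described above: given a tokenized chain of thought and the position where the final answer first appears, count how the cue words split across that boundary. The word lists and example transcript here are illustrative, not taken from the paper.

```python
# Illustrative cue-word lists based on the examples mentioned in the summary;
# the paper's actual feature set is learned by a classifier, not hand-coded.
PRE_ANSWER_CUES = {"hmm", "okay"}                  # tend to cluster before the answer
POST_ANSWER_CUES = {"another", "alternatively"}    # tend to cluster after the answer

def cue_counts(tokens, answer_index):
    """Count pre/post cue words on each side of the first answer arrival."""
    before = tokens[:answer_index]
    after = tokens[answer_index:]
    return {
        "pre_cues_before": sum(t.lower() in PRE_ANSWER_CUES for t in before),
        "post_cues_after": sum(t.lower() in POST_ANSWER_CUES for t in after),
    }

# Toy transcript: the answer "42" arrives early, then reconsideration follows.
tokens = "hmm okay the answer is 42 but alternatively another path".split()
print(cue_counts(tokens, answer_index=tokens.index("42")))
```

In the paper this asymmetry is one of the learnable signals; the classifier picks it up from hidden states rather than from surface word counts as above.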
TERMINATOR is a single transformer layer — initialized from the base model's final layer — with a binary prediction head trained to predict answer arrival at every token position. At inference time, a sliding window of the ten most recent predictions triggers a stop when majority vote says the answer is already there, injecting a close-thinking token into the token stream. No data-calibrated thresholds. No test-time distribution samples. Train once, deploy anywhere.
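The stopping rule described above can be sketched in a few lines: keep the last ten binary predictions in a sliding window, and once a majority says the answer has arrived, emit a close-thinking token and halt. This is a minimal sketch under stated assumptions; the classifier is stubbed out, and the token name `</think>` is an illustrative placeholder, not necessarily the one the paper injects.

```python
from collections import deque

WINDOW = 10                         # sliding window of most recent predictions
CLOSE_THINKING_TOKEN = "</think>"   # placeholder close-thinking token (assumed name)

def should_stop(recent_predictions):
    """Majority vote: stop once more than half of a full window flags the answer."""
    return len(recent_predictions) == WINDOW and sum(recent_predictions) > WINDOW // 2

def generate_with_early_exit(token_stream, predict_answer_arrival):
    """Yield tokens, injecting the close-thinking token when the vote triggers.

    `predict_answer_arrival` stands in for TERMINATOR's per-token binary head.
    """
    window = deque(maxlen=WINDOW)
    for token in token_stream:
        yield token
        window.append(predict_answer_arrival(token))
        if should_stop(window):
            yield CLOSE_THINKING_TOKEN
            return

# Toy run: a stub predictor that flags every token from position 13 onward.
out = list(generate_with_early_exit(range(30), lambda t: t >= 13))
```

The majority vote smooths over noisy per-token predictions, so a single spurious positive cannot end reasoning on its own.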
The method was tested on Qwen3-8B, Qwen3-14B, Ministral-3-8B-Reasoning, and Ministral-3-14B-Reasoning across MATH-500, AIME 2025, HumanEval, and GPQA.
DTF:FTL is produced by PDX Hackerspace Foundation. Find us on Apple Podcasts, Spotify, or wherever fine podcasts are distributed.
By Daily Tech Feed