Daily Tech Feed: From the Labs

Stop Thinking So Hard

Large reasoning models have an overthinking problem. They reach the correct answer early in their chain of thought — then keep generating thousands of additional tokens reconsidering, double-checking, and exploring alternatives they'll ultimately discard. A new paper from researchers at UT Austin, EPFL, ENS Paris-Saclay, and Telecom Paris introduces TERMINATOR, an inference-time early-exit strategy that detects when a model has already generated its final answer and stops reasoning immediately.

The key insight is that the first arrival of a model's final answer in its chain of thought is detectable from hidden states. Token confidence spikes distinctly at the answer position. Thinking-word usage shifts — words like "hmm" and "okay" cluster before the answer; words like "another" and "alternatively" cluster after. These signals are real, consistent across math, coding, and science domains, and learnable by a small classifier.
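To make the thinking-word signal concrete, here is a toy illustration, not the paper's method: a crude per-token score where pre-answer markers ("hmm", "okay") vote that reasoning is still in progress and post-answer markers ("another", "alternatively") vote that the answer has likely already appeared. The word lists and function name are invented for illustration; the paper learns these patterns from hidden states rather than matching surface words.

```python
# Hypothetical illustration of the thinking-word signal described above.
# All names, word lists, and scores here are ours, not the paper's.

PRE_ANSWER_MARKERS = {"hmm", "okay", "wait", "so"}          # cluster before the answer
POST_ANSWER_MARKERS = {"another", "alternatively", "however"}  # cluster after the answer

def marker_signal(tokens):
    """Return a per-token score: +1 suggests the model is still reasoning
    toward its answer, -1 suggests the answer has likely already been
    produced, 0 is neutral."""
    scores = []
    for tok in tokens:
        word = tok.lower().strip(".,!?")
        if word in PRE_ANSWER_MARKERS:
            scores.append(+1)
        elif word in POST_ANSWER_MARKERS:
            scores.append(-1)
        else:
            scores.append(0)
    return scores
```

A real detector would operate on hidden states and token confidences, but even this surface heuristic shows why the signal is learnable: the vocabulary of reconsideration is distinct from the vocabulary of working toward an answer.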

TERMINATOR is a single transformer layer — initialized from the base model's final layer — with a binary prediction head trained to predict answer arrival at every token position. At inference time, a sliding window over the ten most recent predictions triggers a stop when a majority vote says the answer is already there, injecting a close-thinking token into the token stream. No data-calibrated thresholds, no samples from the test-time distribution: train once, deploy anywhere.
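The sliding-window stopping rule can be sketched in a few lines. This is a minimal reconstruction from the description above, assuming the learned head emits one binary "answer has arrived" prediction per generated token; the function names and the full-window requirement are our assumptions, not details from the paper.

```python
from collections import deque

WINDOW = 10  # number of most recent per-token predictions considered

def make_stopper(window=WINDOW):
    """Build a stateful stopping rule: feed it one binary prediction per
    generated token; it returns True once a majority of the last `window`
    predictions say the answer has already been generated."""
    recent = deque(maxlen=window)

    def should_stop(prediction: bool) -> bool:
        recent.append(prediction)
        # Majority vote over the window; only fires once the window is
        # full, so a few noisy early predictions cannot trigger an exit.
        return len(recent) == window and sum(recent) > window // 2

    return should_stop
```

In a real decoding loop, a True return would be the point where the close-thinking token is injected, cutting off further reasoning and moving the model straight to its final answer.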

Results

Tested on Qwen3-8B, Qwen3-14B, Ministral-3-8B-Reasoning, and Ministral-3-14B-Reasoning across MATH-500, AIME 2025, HumanEval, and GPQA:

  • Best or second-best on 28 out of 32 metrics (accuracy + compression rate)
  • MATH-500: ~45% token reduction, accuracy drop under 0.5 percentage points
  • AIME 2025: ~30% reduction; TERMINATOR exits too early on hard problems — documented failure mode
  • Consistently occupies the best accuracy-efficiency Pareto frontier position versus DEER, Dynasor, Thought Calibration, and NoThinking
Links

  • Paper: arXiv:2603.12529 — TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning
  • Authors: Alliot Nagle (UT Austin), Jakhongir Saydaliev (EPFL), Dhia Garbaya (EPFL / ENS Paris-Saclay), Michael Gastpar (EPFL), Ashok Vardhan Makkuva (Telecom Paris / IP Paris), Hyeji Kim (UT Austin)

Related Work Mentioned

  • DEER — chunk-based early exit via token probability thresholds
  • Dynasor — periodic intermediate answer consistency checks
  • Thought Calibration — linear probes on reasoning-step hidden states
  • Self-Certainty / Kang et al. — KL-divergence confidence metric for reasoning
  • DeepSeek-R1 — large reasoning model showing the overthinking phenomenon
  • Qwen3 — base models used in experiments
  • vLLM — inference framework used for dataset curation

Datasets

  • MATH — Lightman et al., mathematical problem solving
  • AIME 2025 — American Invitational Mathematics Examination
  • HumanEval — Chen et al., Python code generation
  • GPQA — Rein et al., graduate-level science questions
  • OpenScience — NVIDIA, scientific research dataset
  • OpenCoder-SFT — Huang et al., code instruction fine-tuning

DTF:FTL is produced by PDX Hackerspace Foundation. Find us on Apple Podcasts, Spotify, or wherever fine podcasts are distributed.
