


This research paper addresses the challenge of anytime reasoning, where large language models (LLMs) must provide high-quality solutions under strict computational or token budgets. The authors introduce a novel evaluation metric, the Anytime Index, which measures how effectively a model's solution quality improves as more reasoning tokens are generated. To enhance this efficiency, they propose Preference Data Prompting (PDP), an inference-time method in which models learn from self-generated contrastive examples of successful and unsuccessful reasoning. Testing across diverse benchmarks such as NaturalPlan, AIME, and GPQA shows that the technique consistently boosts both intermediate and final performance across multiple model families. The framework also helps distinguish "fast-thinking" models that reach high accuracy quickly from those that require exhaustive computation. This work demonstrates that LLMs can become more resource-efficient by following guided, high-quality reasoning patterns without requiring human supervision or fine-tuning.
By Enoch H. Kang
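To make the idea concrete, here is a minimal sketch of how an anytime-quality metric of this kind could be computed. This is a hypothetical illustration only: the episode does not give the paper's exact formula, so the sketch assumes the index is the normalized area under the accuracy-versus-token-budget curve (higher means quality improves earlier in the budget).

```python
# Hypothetical sketch: an "anytime index" as the normalized area under the
# accuracy-vs-token-budget curve. The actual paper may define it differently.

def anytime_index(budgets, accuracies):
    """Trapezoidal area under accuracy(budget), normalized by the rectangle
    (budget span) x (max accuracy). Returns a value in [0, 1]."""
    assert len(budgets) == len(accuracies) >= 2
    area = 0.0
    for i in range(1, len(budgets)):
        area += (accuracies[i] + accuracies[i - 1]) / 2 * (budgets[i] - budgets[i - 1])
    span = (budgets[-1] - budgets[0]) * max(accuracies)
    return area / span if span else 0.0

# A "fast-thinking" model reaches near-final accuracy early in the budget...
fast = anytime_index([0, 1000, 2000, 4000], [0.0, 0.70, 0.75, 0.78])
# ...while a slower model needs the full budget to get there.
slow = anytime_index([0, 1000, 2000, 4000], [0.0, 0.10, 0.30, 0.78])
print(fast > slow)  # the fast-thinking model scores higher
```

Under this (assumed) definition, two models with the same final accuracy can receive very different scores, which is exactly the distinction the episode draws between fast-thinking and exhaustive-computation models.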