


This research paper addresses the challenge of anytime reasoning, where large language models (LLMs) must provide high-quality solutions under strict computational or token budgets. The authors introduce a novel evaluation metric, the Anytime Index, which measures how effectively a model's solution quality improves as more reasoning tokens are generated. To enhance this efficiency, they propose Preference Data Prompting (PDP), an inference-time method in which models learn from self-generated contrastive examples of successful and unsuccessful reasoning. Testing across diverse benchmarks such as NaturalPlan, AIME, and GPQA shows that the technique consistently boosts both intermediate and final performance across multiple model families. The framework also helps distinguish "fast-thinking" models that reach high accuracy quickly from those that require exhaustive computation. This work demonstrates that LLMs can become more resource-efficient by following guided, high-quality reasoning patterns without requiring human supervision or fine-tuning.
By Enoch H. Kang
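To make the idea concrete, here is a minimal sketch of how an anytime-quality metric of this kind could be computed. This is a hypothetical illustration only: the episode does not give the paper's exact formula, so the sketch assumes the index is the normalized area under the accuracy-versus-token-budget curve (higher means quality improves earlier in the budget).

```python
# Hypothetical sketch: an "anytime index" as the normalized area under the
# accuracy-vs-token-budget curve. The actual paper may define it differently.

def anytime_index(budgets, accuracies):
    """Trapezoidal area under accuracy(budget), normalized by the rectangle
    (budget span) x (max accuracy). Returns a value in [0, 1]."""
    assert len(budgets) == len(accuracies) >= 2
    area = 0.0
    for i in range(1, len(budgets)):
        area += (accuracies[i] + accuracies[i - 1]) / 2 * (budgets[i] - budgets[i - 1])
    span = (budgets[-1] - budgets[0]) * max(accuracies)
    return area / span if span else 0.0

# A "fast-thinking" model reaches near-final accuracy early in the budget...
fast = anytime_index([0, 1000, 2000, 4000], [0.0, 0.70, 0.75, 0.78])
# ...while a slower model needs the full budget to get there.
slow = anytime_index([0, 1000, 2000, 4000], [0.0, 0.10, 0.30, 0.78])
print(fast > slow)  # the fast-thinking model scores higher
```

Under this (assumed) definition, two models with the same final accuracy can receive very different scores, which is exactly the distinction the episode draws between fast-thinking and exhaustive-computation models.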