Best AI papers explained

Sleep-time Compute: Beyond Inference Scaling at Test-time

This academic paper explores "sleep-time compute" for large language models (LLMs): having a model process information from a given context while idle, anticipating potential future queries. The authors introduce Stateful GSM-Symbolic and Stateful AIME, datasets created by splitting existing reasoning problems into a context and a question, to test this approach. Their experiments show that sleep-time compute substantially reduces the test-time compute needed to reach comparable accuracy, making inference more efficient. Furthermore, when multiple related questions are asked about the same context, the one-time sleep-time work is amortized, lowering the average cost per query. The paper concludes that sleep-time compute is most effective when queries are predictable from the provided context.


Best AI papers explained, by Enoch H. Kang