Best AI papers explained

Let’s (not) just put things in Context: Test-Time Training for Long-Context LLMs



Large language models often struggle with long-context tasks because the attention mechanism suffers from **score dilution**, where relevant information is overwhelmed by surrounding "distractor" tokens. Researchers found that common **inference-time scaling strategies**, such as generating additional "thinking tokens," fail to solve this problem as context length increases. To address this, the authors propose **query-only test-time training (qTTT)**, a computationally efficient method that updates only the model's **query projection matrices** for a specific input. By performing a single prefill to cache **keys and values** and then applying targeted gradient updates, the model learns to better distinguish the "needle" of relevant information from the "haystack" of noise. Experiments on the **LongBench-v2** and **ZeroScrolls** benchmarks show that qTTT consistently outperforms both standard inference and thinking-token baselines. This approach suggests that **adapting model parameters** during inference is a more effective use of compute than simply increasing the length of the generated output.
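To make the mechanics concrete, here is a minimal PyTorch sketch of what query-only test-time adaptation could look like for a Hugging Face-style causal language model. The parameter name `q_proj`, the next-token objective, and the step count and learning rate are illustrative assumptions rather than the paper's exact recipe; in particular, this sketch reruns full forward passes instead of reusing a single cached prefill of keys and values as the authors describe.

```python
import torch

def qttt_adapt(model, input_ids, num_steps=4, lr=1e-4):
    """Hypothetical sketch of query-only test-time training (qTTT):
    freeze everything except the attention query projections, then take a
    few gradient steps on the given long input before generating an answer."""
    # Freeze all weights except the query projection matrices.
    # The "q_proj" name assumes a LLaMA-style Hugging Face model;
    # other architectures use different parameter names.
    trainable = []
    for name, param in model.named_parameters():
        if "q_proj" in name:
            param.requires_grad_(True)
            trainable.append(param)
        else:
            param.requires_grad_(False)

    optimizer = torch.optim.AdamW(trainable, lr=lr)

    # Illustrative objective: next-token prediction on the long context itself.
    # The paper's actual test-time loss, and its reuse of keys/values cached
    # during a single prefill, may differ from this simplified loop.
    labels = input_ids.clone()
    for _ in range(num_steps):
        optimizer.zero_grad()
        loss = model(input_ids=input_ids, labels=labels).loss
        loss.backward()
        optimizer.step()

    return model  # parameters are now adapted to this specific input
```

The key design point the summary highlights is that only the query projections change, so the cost of adaptation stays small relative to the size of the model, while the keys and values for the long context can in principle be computed once and reused.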


By Enoch H. Kang