Best AI papers explained

Let’s (not) just put things in Context: Test-Time Training for Long-Context LLMs



Large language models often struggle with long-context tasks because the attention mechanism suffers from **score dilution**, where relevant information is overwhelmed by surrounding "distractor" tokens. Researchers found that common **inference-time scaling strategies**, such as generating additional "thinking tokens," fail to solve this problem as context length increases. To address this, the authors propose **query-only test-time training (qTTT)**, a computationally efficient method that updates only the model's **query projection matrices** for a specific input. By performing a single prefill to cache **keys and values** and then applying targeted gradient updates, the model learns to better distinguish the "needle" of relevant information from the "haystack" of noise. Experiments on the **LongBench-v2** and **ZeroScrolls** benchmarks show that qTTT consistently outperforms both standard inference and thinking-token baselines. This approach suggests that **adapting model parameters** during inference is a more effective use of compute than simply increasing the length of the generated output.
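To make the mechanics concrete, here is a minimal PyTorch sketch of what query-only test-time adaptation could look like for a Hugging Face-style causal language model. The parameter name `q_proj`, the next-token objective, and the step count and learning rate are illustrative assumptions rather than the paper's exact recipe; in particular, this sketch reruns full forward passes instead of reusing a single cached prefill of keys and values as the authors describe.

```python
import torch

def qttt_adapt(model, input_ids, num_steps=4, lr=1e-4):
    """Hypothetical sketch of query-only test-time training (qTTT):
    freeze everything except the attention query projections, then take a
    few gradient steps on the given long input before generating an answer."""
    # Freeze all weights except the query projection matrices.
    # The "q_proj" name assumes a LLaMA-style Hugging Face model;
    # other architectures use different parameter names.
    trainable = []
    for name, param in model.named_parameters():
        if "q_proj" in name:
            param.requires_grad_(True)
            trainable.append(param)
        else:
            param.requires_grad_(False)

    optimizer = torch.optim.AdamW(trainable, lr=lr)

    # Illustrative objective: next-token prediction on the long context itself.
    # The paper's actual test-time loss, and its reuse of keys/values cached
    # during a single prefill, may differ from this simplified loop.
    labels = input_ids.clone()
    for _ in range(num_steps):
        optimizer.zero_grad()
        loss = model(input_ids=input_ids, labels=labels).loss
        loss.backward()
        optimizer.step()

    return model  # parameters are now adapted to this specific input
```

The key design point the summary highlights is that only the query projections change, so the cost of adaptation stays small relative to the size of the model, while the keys and values for the long context can in principle be computed once and reused.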


By Enoch H. Kang