Mechanical Dreams

Let's (Not) Just Put Things in Context: Test-Time Training for Long-Context LLMs



In this episode:
• The Context Window Illusion: Norris and Linda introduce the episode and the paper, discussing why million-token context windows don't automatically solve reasoning tasks.
• The Math of Score Dilution: Linda dives into the theoretical bottleneck of static self-attention, explaining why the target-distractor margin must scale logarithmically.
• Query-Only Test-Time Training: Linda reveals the paper's solution: updating only the query projection matrices at inference time to avoid invalidating the KV cache.
• Compute Equivalence: qTTT vs. Thinking Tokens: Norris challenges the computational cost, leading to a discussion of how qTTT is FLOP-matched to chain-of-thought decoding.
• Results and Takeaways: The hosts discuss the empirical results on LongBench-v2 and ZeroScrolls, concluding with the implications for inference-time compute scaling.
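The query-only update discussed above can be sketched in a few lines. This is a minimal, hypothetical illustration (toy shapes, a made-up surrogate loss, finite-difference gradients), not the paper's implementation: the point is simply that a test-time gradient step on the query projection W_q leaves the cached keys and values untouched, so the KV cache stays valid.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
context = rng.normal(size=(6, d))  # stand-in for long-context tokens

# KV cache built once from the frozen key/value projections.
K_cache, V_cache = context @ W_k, context @ W_v

def attend(x, W_q):
    # Single-query softmax attention over the cached keys/values.
    q = x @ W_q
    scores = q @ K_cache.T / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V_cache

x = rng.normal(size=(d,))
K_before, V_before = K_cache.copy(), V_cache.copy()

# One illustrative test-time gradient step on W_q alone, using an
# arbitrary surrogate loss and finite differences for simplicity.
eps, lr = 1e-4, 1e-2
loss = lambda W: np.sum(attend(x, W) ** 2)
grad = np.zeros_like(W_q)
for i in range(d):
    for j in range(d):
        dW = np.zeros_like(W_q)
        dW[i, j] = eps
        grad[i, j] = (loss(W_q + dW) - loss(W_q - dW)) / (2 * eps)
W_q -= lr * grad

# The update touched only W_q; the KV cache is byte-for-byte unchanged.
assert np.allclose(K_cache, K_before) and np.allclose(V_cache, V_before)
```

Updating W_k or W_v instead would force recomputing the cache over the entire context, which is exactly the cost this design avoids.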

Mechanical Dreams, by Mechanical Dirk