In this episode:
• The Context Window Illusion: Norris and Linda introduce the paper and discuss why million-token context windows don't automatically translate into reliable long-context reasoning.
• The Math of Score Dilution: Linda dives into the theoretical bottleneck of static self-attention, explaining why the target-distractor score margin must grow logarithmically with context length to keep the target's attention mass from washing out.
• Query-Only Test-Time Training: Linda reveals the paper's solution: updating only the query projection matrices at inference time to avoid invalidating the KV cache.
• Compute Equivalency: qTTT vs. Thinking Tokens: Norris challenges the computational cost, leading to a discussion of how qTTT's gradient steps are matched FLOP-for-FLOP against chain-of-thought decoding.
• Results and Takeaways: The hosts discuss the empirical results on LongBench-v2 and ZeroScrolls, concluding with the implications for inference-time compute scaling.
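The score-dilution argument from the second segment can be seen in a few lines of numpy. Everything below is illustrative rather than taken from the paper: with a fixed target-distractor score margin, softmax spreads mass over more and more distractors as the context grows, so the target's attention weight decays toward zero; holding that weight constant requires the margin to grow like log n.

```python
import numpy as np

def target_attention_weight(margin, n_distractors):
    # Softmax over one target score (margin) and n distractor scores (0).
    scores = np.concatenate(([margin], np.zeros(n_distractors)))
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights[0]

# Fixed margin: the target's attention mass dilutes as context grows.
for n in [10, 1000, 100_000]:
    print(n, target_attention_weight(5.0, n))

# A margin that grows with log(n) keeps the target's mass constant:
# e^(m + log n) / (e^(m + log n) + n) = e^m / (e^m + 1) for every n.
for n in [10, 1000, 100_000]:
    print(n, target_attention_weight(5.0 + np.log(n), n))
```

The second loop is the whole point of the hosts' discussion: no fixed scoring function can keep a needle salient in an arbitrarily long haystack unless its score advantage keeps growing with the haystack.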
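A minimal sketch of the query-only update from the third segment, in numpy. The toy objective (gradient ascent on the log attention weight of one target key), the learning rate, and the shapes are all assumptions of mine, not the paper's method; the sketch only demonstrates the property Linda highlights: gradient steps on the query projection leave the cached keys (and, in a full model, values) untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 256

# Frozen projections; keys for the long prompt are computed once.
# (Values are cached the same way and are omitted here for brevity.)
W_q = rng.normal(size=(d, d)) / np.sqrt(d)
W_k = rng.normal(size=(d, d)) / np.sqrt(d)
prompt = rng.normal(size=(n, d))
K = prompt @ W_k  # the cache: depends on W_k only, never on W_q

def target_weight(x, W_q, t):
    """Softmax attention weight that query token x places on position t."""
    scores = K @ (x @ W_q) / np.sqrt(d)
    w = np.exp(scores - scores.max())
    return (w / w.sum())[t]

x, t = rng.normal(size=d), 42  # a query token and a target position
K_before = K.copy()
w_before = target_weight(x, W_q, t)

# Toy inner loop: ascend log w[t] by updating W_q alone.  log-softmax is
# concave in the scores and the scores are linear in W_q, so small steps
# steadily raise the target's weight.
lr = 0.002
for _ in range(200):
    scores = K @ (x @ W_q) / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    onehot = np.eye(n)[t]
    grad = np.outer(x, (onehot - w) @ K) / np.sqrt(d)  # d log w[t] / d W_q
    W_q += lr * grad

w_after = target_weight(x, W_q, t)
# The cache was never recomputed; only the query path moved.
```

Because the cache depends only on the frozen key/value projections, these updates are compatible with a prompt that is encoded exactly once, which is the property that makes query-only updates cheap at inference time.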