This episode explores Doc-to-LoRA, a method for turning an entire document into a lightweight LoRA adapter so a language model can answer later questions without repeatedly rereading the source text. It explains how the paper combines context distillation, LoRA fine-tuning, and a Perceiver-style hypernetwork that ingests variable-length documents and emits fixed-size parameter updates, using chunking to handle longer inputs. The discussion highlights reported results such as near-perfect zero-shot performance on synthetic long-context retrieval beyond 32K tokens and improved efficiency on long-document question answering through lower update latency, lower peak memory use, and reduced KV-cache costs at inference time. It also digs into the systems argument behind the work, framing reusable internalized memory as a different primitive from prompting, while questioning how well the approach holds up outside limited-query evaluations and whether its benefits persist against alternatives like prompt compression or keeping context externally.
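For listeners who want a concrete mental model of the hypernetwork piece, here is a minimal, hypothetical PyTorch sketch (not the authors' code): a Perceiver-style module cross-attends a fixed set of learned latent queries over chunked document embeddings and emits low-rank A and B factors for a single target weight matrix. All class names, dimensions, and the pooling choice below are illustrative assumptions.

import torch
import torch.nn as nn

class DocToLoRAHypernet(nn.Module):
    """Sketch: map variable-length document embeddings to fixed-size LoRA factors."""
    def __init__(self, embed_dim=512, num_latents=16, rank=8, target_dim=4096):
        super().__init__()
        # Fixed-size latent array: the output size is independent of document length.
        self.latents = nn.Parameter(torch.randn(num_latents, embed_dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)
        # Heads that map the pooled latents to flattened LoRA factors (assumed design).
        self.to_A = nn.Linear(embed_dim, rank * target_dim)
        self.to_B = nn.Linear(embed_dim, target_dim * rank)
        self.rank, self.target_dim = rank, target_dim

    def forward(self, chunk_embeds):
        # chunk_embeds: (batch, num_chunks * chunk_len, embed_dim) -- token embeddings
        # of the document, concatenated across chunks.
        b = chunk_embeds.size(0)
        q = self.latents.unsqueeze(0).expand(b, -1, -1)
        latents, _ = self.cross_attn(q, chunk_embeds, chunk_embeds)
        pooled = self.norm(latents).mean(dim=1)                # (batch, embed_dim)
        A = self.to_A(pooled).view(b, self.rank, self.target_dim)
        B = self.to_B(pooled).view(b, self.target_dim, self.rank)
        return A, B                                            # low-rank update: delta_W = B @ A

# Usage sketch: generate the adapter once per document, reuse it for every later query.
hypernet = DocToLoRAHypernet()
doc_embeds = torch.randn(1, 2048, 512)                         # placeholder chunk embeddings
A, B = hypernet(doc_embeds)
delta_W = torch.bmm(B, A)                                      # (1, 4096, 4096) weight update

The point of the sketch is the one made in the episode: the adapter's size is fixed by the latent array and the LoRA rank, so a document of any length is distilled once into a reusable parameter update, and later queries pay no per-token context cost.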
Sources:
1. Doc-to-LoRA: Internalizing Context as LoRA
https://arxiv.org/pdf/2602.15902
2. 2603.13875
https://arxiv.org/abs/2603.13875
3. 2510.03215
https://arxiv.org/abs/2510.03215
4. LoRA: Low-Rank Adaptation of Large Language Models — Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, 2022
https://scholar.google.com/scholar?q=LoRA:+Low-Rank+Adaptation+of+Large+Language+Models
5. QLoRA: Efficient Finetuning of Quantized LLMs — Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer, 2023
https://scholar.google.com/scholar?q=QLoRA:+Efficient+Finetuning+of+Quantized+LLMs
6. AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning — Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, Tuo Zhao, 2023
https://scholar.google.com/scholar?q=AdaLoRA:+Adaptive+Budget+Allocation+for+Parameter-Efficient+Fine-Tuning
7. DoRA: Weight-Decomposed Low-Rank Adaptation — Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen, 2024
https://scholar.google.com/scholar?q=DoRA:+Weight-Decomposed+Low-Rank+Adaptation
8. HyperNetworks — David Ha, Andrew Dai, Quoc V. Le, 2016
https://scholar.google.com/scholar?q=HyperNetworks
9. Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks — Rabeeh Karimi Mahabadi, Sebastian Ruder, Mostafa Dehghani, James Henderson, 2021
https://scholar.google.com/scholar?q=Parameter-efficient+Multi-task+Fine-tuning+for+Transformers+via+Shared+Hypernetworks
10. HyperPrompt: Prompt-based Task-Conditioning of Transformers — Yun He, Huaixiu Steven Zheng, Yi Tay, Jai Gupta, Yu Du, Vamsi Aribandi, Zhe Zhao, Yaguang Li, Zhao Chen, Donald Metzler, Heng-Tze Cheng, Ed H. Chi, 2022
https://scholar.google.com/scholar?q=HyperPrompt:+Prompt-based+Task-Conditioning+of+Transformers
11. Doc-to-LoRA: Learning to Instantly Internalize Contexts — Rujikorn Charakorn, Edoardo Cetin, Shinnosuke Uesaka, Robert Tjarko Lange, 2026
https://scholar.google.com/scholar?q=Doc-to-LoRA:+Learning+to+Instantly+Internalize+Contexts
12. Text-to-LoRA: Instant Transformer Adaption — Rujikorn Charakorn, Edoardo Cetin, Yujin Tang, Robert Tjarko Lange, 2025
https://scholar.google.com/scholar?q=Text-to-LoRA:+Instant+Transformer+Adaption
13. Generative Adapter: Contextualizing Language Models in Parameters with a Single Forward Pass — Tianyu Chen, Huanran Fang, Patrick Xia, Xiaodong Liu, Benjamin Van Durme, Luke Zettlemoyer, Jianfeng Gao, Hao Cheng, 2025
https://scholar.google.com/scholar?q=Generative+Adapter:+Contextualizing+Language+Models+in+Parameters+with+a+Single+Forward+Pass
14. Cartridges: Lightweight and General-Purpose Long Context Representations via Self-Study — Sabri Eyuboglu, Ryan Saul Ehrlich, Simran Arora, Neel Guha, Dylan Zinsley, Emily Ruoyu Liu, William Tennien, Atri Rudra, James Zou, Azalia Mirhoseini, Christopher Re, 2025
https://scholar.google.com/scholar?q=Cartridges:+Lightweight+and+General-Purpose+Long+Context+Representations+via+Self-Study
15. Propagating Knowledge Updates to LMs through Distillation — Shankar Padmanabhan, Yasumasa Onoe, Michael Zhang, Greg Durrett, Eunsol Choi, 2023
https://scholar.google.com/scholar?q=Propagating+Knowledge+Updates+to+LMs+through+Distillation
16. LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression — Zefan Pan, Qipeng Wu, Hao Jiang, Mengzhou Xia, Xuefei Luo, Jiaqi Zhang, Qingyu Lin, Viktor Ruhle, Yi Yang, Chin-Yew Lin, H. Vicky Zhao, Lidong Qiu, Dongmei Zhang, 2024
https://scholar.google.com/scholar?q=LLMLingua-2:+Data+Distillation+for+Efficient+and+Faithful+Task-Agnostic+Prompt+Compression
17. RazorAttention: Efficient KV Cache Compression Through Retrieval Heads — Hanlin Tang et al., 2024
https://scholar.google.com/scholar?q=RazorAttention:+Efficient+KV+Cache+Compression+Through+Retrieval+Heads
18. Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning — Yu Fu et al., 2024/2025
https://scholar.google.com/scholar?q=Not+All+Heads+Matter:+A+Head-Level+KV+Cache+Compression+Method+with+Integrated+Retrieval+and+Reasoning
19. How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? — Sergey Pletenev et al., 2025
https://scholar.google.com/scholar?q=How+Much+Knowledge+Can+You+Pack+into+a+LoRA+Adapter+without+Harming+LLM?
20. Can Fine-Tuning Erase Your Edits? On the Fragile Coexistence of Knowledge Editing and Adaptation — Yinjie Cheng et al., 2025
https://scholar.google.com/scholar?q=Can+Fine-Tuning+Erase+Your+Edits?+On+the+Fragile+Coexistence+of+Knowledge+Editing+and+Adaptation
21. Memorization in In-Context Learning — Shahriar Golchin et al., 2024
https://scholar.google.com/scholar?q=Memorization+in+In-Context+Learning
22. In-Context Learning can Perform Continual Learning Like Humans — Liuwang Kang et al., 2025
https://scholar.google.com/scholar?q=In-Context+Learning+can+Perform+Continual+Learning+Like+Humans
23. AI Post Transformers: LoRA: Low-Rank Adaptation of Large Language Models — Hal Turing & Dr. Ada Shannon
https://podcast.do-not-panic.com/episodes/lora-low-rank-adaptation-of-large-language-models/
24. AI Post Transformers: ShadowKV: High-Throughput Long-Context LLM Inference — Hal Turing & Dr. Ada Shannon
https://podcast.do-not-panic.com/episodes/shadowkv-high-throughput-long-context-llm-inference/
25. AI Post Transformers: Lookahead Q-Cache for Consistent KV Eviction — Hal Turing & Dr. Ada Shannon, 2026
https://podcast.do-not-panic.com/episodes/2026-03-25-lookahead-q-cache-for-consistent-kv-evic-d97b09.mp3
26. AI Post Transformers: Mem0: Scalable Long-Term Memory for AI Agents — Hal Turing & Dr. Ada Shannon
https://podcast.do-not-panic.com/episodes/mem0-scalable-long-term-memory-for-ai-agents/
27. AI Post Transformers: Kimi Linear: Efficient Expressive Attention Architecture — Hal Turing & Dr. Ada Shannon
https://podcast.do-not-panic.com/episodes/kimi-linear-efficient-expressive-attention-architecture/
28. AI Post Transformers: ComoRAG: Cognitively Inspired Narrative Reasoning — Hal Turing & Dr. Ada Shannon
https://podcast.do-not-panic.com/episodes/comorag-cognitively-inspired-narrative-reasoning/
Interactive Visualization: Doc-to-LoRA: Internalizing Context as LoRA