


We compare and contrast the math behind two recent research papers, both of which we have previously covered individually on this podcast:
July 2025:
Learning without training:
The implicit dynamics of in-context learning
https://arxiv.org/pdf/2507.16003
September 2025:
Federated Learning with Ad-hoc Adapter Insertions: The Case of
Soft-Embeddings for Training Classifier-as-Retriever
https://arxiv.org/pdf/2509.16508
The first paper explores **In-Context Learning (ICL)** in neural networks, proposing that the effect of context on a token's output is equivalent to an **implicit weight update** of the network, specifically in the MLP layer, and generalizing the transformer block through the notion of a **contextual block**. It provides an explicit low-rank formula for this implicit weight update and shows mathematically that consuming context tokens corresponds to an implicit **gradient descent learning dynamics** on the network weights.

The second paper introduces a novel **retrieval-augmented generation (RAG)** architecture called **Classifier-as-Retriever (CaR)** for memory-constrained edge devices. It pairs a frozen Small Language Model (SLM) with a small trainable **adapter network** that produces "soft embeddings" and a trainable **classifier head** that replaces conventional similarity functions. Crucially, the architecture is designed for distributed training with **Federated Learning (FL)**, incorporates **Differential Privacy (DP)** to protect client-side data, and demonstrates significant speedups over centralized training.
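As a rough illustration of the first paper's result, the implicit update can be written as a rank-one correction to the MLP weights. The notation below is approximate and the precise statement and assumptions are in the paper itself, so treat this only as a sketch.

```latex
% Sketch of the implicit rank-one weight update (notation approximate;
% see arXiv:2507.16003 for the exact statement and conditions).
% A transformer block is modeled as a contextual layer A (e.g. self-attention)
% followed by an MLP M_W with weight matrix W.
\[
  T_W(C, x) \;=\; M_{W + \Delta W(C)}\!\big(A(x)\big),
  \qquad
  \Delta W(C) \;=\; \big(W\,\Delta A(C, x)\big)\,\frac{A(x)^{\top}}{\lVert A(x)\rVert^{2}},
\]
\[
  \text{where } \Delta A(C, x) \;=\; A(C, x) - A(x)
  \text{ is the shift in the attention output caused by the context } C.
\]
% \Delta W(C) is an outer product and hence rank one; consuming context tokens
% one at a time gives the implicit gradient-descent-like dynamics discussed here.
```

And as a minimal sketch of the second paper's Classifier-as-Retriever idea: a frozen SLM supplies hidden states, a small trainable adapter maps them to soft embeddings, and a classifier head predicts a document index directly instead of running a similarity search. Module names, dimensions, and the pooling choice below are illustrative assumptions, not the authors' implementation.

```python
# Minimal CaR sketch: only the adapter and classifier are trainable;
# the SLM backbone that produces `frozen_hidden` stays frozen on-device.
import torch
import torch.nn as nn

class CaRHead(nn.Module):
    def __init__(self, hidden_dim: int, soft_dim: int, num_docs: int):
        super().__init__()
        # Small trainable adapter turning frozen SLM states into "soft embeddings".
        self.adapter = nn.Sequential(
            nn.Linear(hidden_dim, soft_dim),
            nn.ReLU(),
            nn.Linear(soft_dim, soft_dim),
        )
        # Trainable classifier head: predicts a document index directly,
        # replacing a similarity search over an embedding store.
        self.classifier = nn.Linear(soft_dim, num_docs)

    def forward(self, frozen_hidden: torch.Tensor) -> torch.Tensor:
        # frozen_hidden: (batch, seq_len, hidden_dim) from the frozen SLM.
        pooled = frozen_hidden.mean(dim=1)   # simple mean pooling over tokens
        soft_emb = self.adapter(pooled)      # trainable soft embedding
        return self.classifier(soft_emb)     # logits over candidate documents

# Usage sketch: in a federated round, only these parameters would be trained
# locally (with DP noise added client-side) and then aggregated by the server.
head = CaRHead(hidden_dim=768, soft_dim=128, num_docs=1000)
logits = head(torch.randn(2, 32, 768))       # shape (2, 1000)
```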
By mcgrof