


We compare and contrast the math behind two recent research papers, both of which we have previously covered individually on this podcast:
July 2025:
Learning without training:
The implicit dynamics of in-context learning
https://arxiv.org/pdf/2507.16003
September 2025:
Federated Learning with Ad-hoc Adapter Insertions: The Case of
Soft-Embeddings for Training Classifier-as-Retriever
https://arxiv.org/pdf/2509.16508
The first paper explores **In-Context Learning (ICL)** in neural networks, proposing that the effect of context on a token's output is equivalent to an **implicit weight update** of the network, specifically in the MLP layer, and generalizing the transformer block through the notion of a **contextual block**. It provides an explicit low-rank formula for this implicit weight update and shows mathematically that consuming context tokens corresponds to an implicit **gradient descent learning dynamics** on the network weights.

The second paper introduces a novel **retrieval-augmented generation (RAG)** architecture called **Classifier-as-Retriever (CaR)** for memory-constrained edge devices. It pairs a frozen Small Language Model (SLM) with a small trainable **adapter network** that produces "soft embeddings" and a trainable **classifier head** that replaces conventional similarity functions. Crucially, the architecture is designed for distributed training with **Federated Learning (FL)**, incorporates **Differential Privacy (DP)** to protect client-side data, and demonstrates significant speedups over centralized training.
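As a rough illustration of the first paper's result, the implicit update can be written as a rank-one correction to the MLP weights. The notation below is approximate and the precise statement and assumptions are in the paper itself, so treat this only as a sketch.

```latex
% Sketch of the implicit rank-one weight update (notation approximate;
% see arXiv:2507.16003 for the exact statement and conditions).
% A transformer block is modeled as a contextual layer A (e.g. self-attention)
% followed by an MLP M_W with weight matrix W.
\[
  T_W(C, x) \;=\; M_{W + \Delta W(C)}\!\big(A(x)\big),
  \qquad
  \Delta W(C) \;=\; \big(W\,\Delta A(C, x)\big)\,\frac{A(x)^{\top}}{\lVert A(x)\rVert^{2}},
\]
\[
  \text{where } \Delta A(C, x) \;=\; A(C, x) - A(x)
  \text{ is the shift in the attention output caused by the context } C.
\]
% \Delta W(C) is an outer product and hence rank one; consuming context tokens
% one at a time gives the implicit gradient-descent-like dynamics discussed here.
```

And as a minimal sketch of the second paper's Classifier-as-Retriever idea: a frozen SLM supplies hidden states, a small trainable adapter maps them to soft embeddings, and a classifier head predicts a document index directly instead of running a similarity search. Module names, dimensions, and the pooling choice below are illustrative assumptions, not the authors' implementation.

```python
# Minimal CaR sketch: only the adapter and classifier are trainable;
# the SLM backbone that produces `frozen_hidden` stays frozen on-device.
import torch
import torch.nn as nn

class CaRHead(nn.Module):
    def __init__(self, hidden_dim: int, soft_dim: int, num_docs: int):
        super().__init__()
        # Small trainable adapter turning frozen SLM states into "soft embeddings".
        self.adapter = nn.Sequential(
            nn.Linear(hidden_dim, soft_dim),
            nn.ReLU(),
            nn.Linear(soft_dim, soft_dim),
        )
        # Trainable classifier head: predicts a document index directly,
        # replacing a similarity search over an embedding store.
        self.classifier = nn.Linear(soft_dim, num_docs)

    def forward(self, frozen_hidden: torch.Tensor) -> torch.Tensor:
        # frozen_hidden: (batch, seq_len, hidden_dim) from the frozen SLM.
        pooled = frozen_hidden.mean(dim=1)   # simple mean pooling over tokens
        soft_emb = self.adapter(pooled)      # trainable soft embedding
        return self.classifier(soft_emb)     # logits over candidate documents

# Usage sketch: in a federated round, only these parameters would be trained
# locally (with DP noise added client-side) and then aggregated by the server.
head = CaRHead(hidden_dim=768, soft_dim=128, num_docs=1000)
logits = head(torch.randn(2, 32, 768))       # shape (2, 1000)
```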
By mcgrof