AI Post Transformers

Contextual Blocks: Implicit Weight Updates and Federated Learning



We compare and contrast the math behind two recent research papers, each of which we have covered individually on this podcast:

July 2025: Learning without training: The implicit dynamics of in-context learning (https://arxiv.org/pdf/2507.16003)
September 2025: Federated Learning with Ad-hoc Adapter Insertions: The Case of Soft-Embeddings for Training Classifier-as-Retriever (https://arxiv.org/pdf/2509.16508)

The first paper studies in-context learning (ICL) in neural networks and proposes that the effect of context on a token's output is equivalent to an implicit weight update, concentrated in the MLP layer; the transformer block is generalized through the notion of a contextual block. The paper gives an explicit low-rank formula for this implicit weight modification and shows mathematically that consuming context tokens follows an implicit gradient-descent learning dynamics on the network weights. A self-contained version of the kind of identity involved is sketched below.

The second paper introduces Classifier-as-Retriever (CaR), a retrieval-augmented generation (RAG) architecture for memory-constrained edge devices. It uses a frozen Small Language Model (SLM) augmented with a small trainable adapter network that produces "soft embeddings", together with a trainable classifier head in place of conventional similarity functions. Crucially, the architecture is designed for distributed training with Federated Learning (FL), incorporates Differential Privacy (DP) techniques to protect client-side data, and demonstrates a significant speedup over centralized training. A minimal code sketch of these pieces follows the math example.
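To make the first result concrete, here is a hedged sketch of the kind of rank-1 identity involved. The notation is illustrative and not necessarily the paper's exact formulation: a = A(x) is the attention output for the query token without context, Δa = A(C, x) - A(x) is the change in attention output caused by prepending the context C, and W is the first MLP weight matrix.

```latex
% Illustrative rank-1 identity: the context-induced change in the attention
% output is equivalent to a low-rank update of the MLP weight matrix W.
% a = A(x), \Delta a = A(C, x) - A(x); all symbols are illustrative notation.
\[
  W\,(a + \Delta a)
  \;=\;
  \Bigl(W + \underbrace{\frac{(W\,\Delta a)\,a^{\top}}{\lVert a\rVert^{2}}}_{\Delta W(C)\ \text{(rank 1)}}\Bigr)\, a .
\]
```

Read right to left: running the query with context through attention and then the MLP gives the same output as running the context-free query through an MLP whose weights received the low-rank update ΔW(C); iterating this as context tokens are consumed is what yields the implicit gradient-descent-like dynamics described above.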
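For the second paper, a minimal PyTorch sketch of the moving parts may help. Everything below is an assumption-laden illustration: the names (CarRetriever, frozen_encode, fedavg_round), the dimensions, and the simple clip-and-noise DP step are mine, not the paper's API; the sketch only shows how a frozen encoder, a trainable soft-embedding adapter, a classifier head used as the retriever, and a federated averaging round fit together.

```python
# Hedged sketch of the Classifier-as-Retriever (CaR) idea; names and sizes are assumptions.
import copy
import torch
import torch.nn as nn

EMB_DIM, SOFT_DIM, NUM_DOCS = 384, 128, 1000  # assumed sizes

def frozen_encode(texts):
    """Stand-in for a frozen SLM: returns fixed embeddings, no gradients flow."""
    with torch.no_grad():
        return torch.randn(len(texts), EMB_DIM)  # placeholder for SLM hidden states

class CarRetriever(nn.Module):
    """Trainable adapter ("soft embeddings") plus a classifier head over document IDs."""
    def __init__(self):
        super().__init__()
        self.adapter = nn.Sequential(nn.Linear(EMB_DIM, SOFT_DIM), nn.Tanh())
        self.classifier = nn.Linear(SOFT_DIM, NUM_DOCS)  # replaces similarity search

    def forward(self, frozen_emb):
        return self.classifier(self.adapter(frozen_emb))  # logits over documents

def local_train(model, queries, doc_ids, epochs=1, lr=1e-3, clip=1.0, noise_std=0.01):
    """One client's update: SGD with DP-style gradient clipping and Gaussian noise."""
    local = copy.deepcopy(model)
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        logits = local(frozen_encode(queries))
        loss = loss_fn(logits, doc_ids)
        opt.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(local.parameters(), clip)
        for p in local.parameters():  # add noise for (illustrative) differential privacy
            p.grad += noise_std * torch.randn_like(p.grad)
        opt.step()
    return local.state_dict()

def fedavg_round(global_model, client_batches):
    """FedAvg: average the clients' locally trained weights into the global model."""
    states = [local_train(global_model, q, d) for q, d in client_batches]
    avg = {k: torch.stack([s[k] for s in states]).mean(0) for k in states[0]}
    global_model.load_state_dict(avg)
    return global_model
```

The point of the framing is that retrieval becomes a classification problem: the classifier head's argmax over document IDs replaces nearest-neighbor search over a vector index, which is what makes the approach attractive on memory-constrained edge devices and straightforward to train federatedly, since only the small adapter and classifier weights are exchanged.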

AI Post Transformers, by mcgrof