
Sign up to save your podcasts
Or


Mixture-of-Recursions (MoR) is a unified framework built on a Recursive Transformer architecture, designed to enhance the efficiency of large language models. It achieves this by combining three core paradigms: parameter sharing (reusing shared layers across recursion steps), adaptive computation (dynamically assigning different processing depths to individual tokens via lightweight routers), and efficient Key-Value (KV) caching (selectively storing or sharing KV pairs). This integrated approach enables MoR to deliver large-model quality with significantly reduced computational and memory overhead, improving efficiency for both training and inference.
By AI-Talk4
44 ratings
Mixture-of-Recursions (MoR) is a unified framework built on a Recursive Transformer architecture, designed to enhance the efficiency of large language models. It achieves this by combining three core paradigms: parameter sharing (reusing shared layers across recursion steps), adaptive computation (dynamically assigning different processing depths to individual tokens via lightweight routers), and efficient Key-Value (KV) caching (selectively storing or sharing KV pairs). This integrated approach enables MoR to deliver large-model quality with significantly reduced computational and memory overhead, improving efficiency for both training and inference.

303 Listeners

341 Listeners

112,539 Listeners

266 Listeners

111 Listeners

3 Listeners