November 19, 2025

MLP Mixer Models

13 minutes

These sources collectively explore the **MLP-Mixer architecture** and its many extensions across computer vision and audio tasks. The core idea of the Mixer is to separate and then blend two kinds of information, originally via **token-mixing** (across spatial locations) and **channel-mixing** (across features), using only **Multi-Layer Perceptrons (MLPs)**, an approach often seen as a simpler alternative to CNNs and Vision Transformers. One source introduces **KAN-Mixers**, which replace the standard MLPs with **Kolmogorov-Arnold Networks (KANs)** with the aim of improving accuracy and interpretability for image classification, and reports strong results on CIFAR-10. Other works propose structural modifications, such as the **Circulant Channel-Specific (CCS) token-mixing MLP**, designed to improve spatial invariance and efficiency, and **ConvMixer**, which uses large-kernel convolutions for mixing. The Mixer principle is also applied to audio classification by **ASM-RH**, which blends **Roll-Time** and **Hermit-Frequency** information, suggesting that the **Mixer is a versatile paradigm** adaptable to domain-specific feature perspectives. Finally, research also suggests that the **success of the MLP-Mixer** stems from its structure as a **wide and sparse MLP**, which embeds sparsity as an inductive bias.
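The token-mixing / channel-mixing split can be sketched in a few lines of NumPy. This is a minimal sketch of a single Mixer layer, assuming biases, learned LayerNorm scale/shift and dropout are all omitted; `mixer_layer` and `init_params` are hypothetical names, not from any official codebase. The token-mixing MLP acts along the patch axis and is shared by all channels, while the channel-mixing MLP acts along the channel axis and is shared by all patches.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each patch (row) over its channels; no learned scale/shift in this sketch
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, w1, w2):
    # Two-layer MLP applied along the last axis of x (biases omitted)
    return gelu(x @ w1) @ w2

def mixer_layer(x, params):
    """One Mixer layer on x of shape (S patches, C channels)."""
    # Token mixing: transpose so each channel's S values form a row, then mix across patches
    y = layer_norm(x).T                                    # (C, S)
    x = x + mlp(y, params["tok_w1"], params["tok_w2"]).T   # residual, back to (S, C)
    # Channel mixing: mix across the C channels independently for every patch
    x = x + mlp(layer_norm(x), params["ch_w1"], params["ch_w2"])
    return x

def init_params(num_patches, channels, tok_hidden, ch_hidden, seed=0):
    # Hypothetical small-Gaussian initialization, for illustration only
    rng = np.random.default_rng(seed)
    return {
        "tok_w1": rng.normal(0, 0.02, (num_patches, tok_hidden)),
        "tok_w2": rng.normal(0, 0.02, (tok_hidden, num_patches)),
        "ch_w1": rng.normal(0, 0.02, (channels, ch_hidden)),
        "ch_w2": rng.normal(0, 0.02, (ch_hidden, channels)),
    }

S, C = 16, 8  # e.g. a 4x4 grid of patches, 8 channels each
params = init_params(S, C, tok_hidden=32, ch_hidden=32)
x = np.random.default_rng(1).normal(size=(S, C))
out = mixer_layer(x, params)
print(out.shape)  # (16, 8)
```

Because both sub-blocks are residual, setting all weights to zero turns the layer into the identity, which is a quick sanity check on the wiring.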
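KANs replace the fixed activation and scalar weight on each MLP edge with a learnable one-dimensional function per input-output pair, and each output node sums those functions. The NumPy sketch below shows one such layer under a loudly stated simplification: each edge function is a linear combination of fixed Gaussian radial basis functions, swapped in for the B-splines (plus base activation) of the original KAN formulation, and it is not necessarily how the KAN-Mixers paper parameterizes its layers. `kan_layer` and its arguments are hypothetical names. In a KAN-Mixer, layers of this kind would take the place of the token-mixing and channel-mixing MLPs.

```python
import numpy as np

def kan_layer(x, coef, centers, width):
    """One KAN-style layer (hypothetical helper).

    x: (N, d_in) inputs; coef: (d_in, d_out, B) learnable coefficients.
    Edge function: phi_ij(t) = sum_b coef[i, j, b] * exp(-((t - centers[b]) / width) ** 2)
    (Gaussian RBF basis: an assumption standing in for KAN's B-splines).
    Output: y[n, j] = sum_i phi_ij(x[n, i]).
    """
    # Evaluate every basis function at every input value: shape (N, d_in, B)
    basis = np.exp(-((x[..., None] - centers) / width) ** 2)
    # Apply each edge's function and sum over inputs i for each output j: shape (N, d_out)
    return np.einsum("nib,iob->no", basis, coef)

rng = np.random.default_rng(0)
d_in, d_out, B = 4, 3, 8
centers = np.linspace(-2.0, 2.0, B)            # fixed grid of basis centres
width = centers[1] - centers[0]                # basis width tied to the grid spacing
coef = rng.normal(0.0, 0.1, (d_in, d_out, B))  # the learnable parameters
x = rng.normal(size=(5, d_in))
y = kan_layer(x, coef, centers, width)
print(y.shape)  # (5, 3)
```

The only learnable tensor is `coef`, so each edge's shape can be plotted directly, which is the source of the interpretability argument for KANs.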
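A circulant token-mixing matrix is fully determined by one length-S vector, and multiplying by it is a circular convolution over the token axis. That makes the mixing shift-equivariant (circularly shifting the input tokens shifts the output the same way) and lets it be computed with the FFT in O(S log S) instead of O(S²). The sketch below assumes tokens flattened to a 1-D sequence and one generating vector per channel, and checks the FFT path against an explicit dense circulant matrix. It illustrates the circulant idea only; the CCS paper's grouping of channels and its handling of 2-D token grids may differ, and `ccs_token_mixing` and `circulant` are hypothetical names.

```python
import numpy as np

def circulant(c):
    # Dense circulant matrix whose first column is c: M[i, j] = c[(i - j) % n]
    n = len(c)
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

def ccs_token_mixing(x, gen):
    """x: (S tokens, C channels); gen: (S, C), one circulant generator per channel."""
    # Circulant matrix-vector product == circular convolution, evaluated per channel via FFT
    return np.real(np.fft.ifft(np.fft.fft(gen, axis=0) * np.fft.fft(x, axis=0), axis=0))

S, C = 8, 3
rng = np.random.default_rng(0)
x = rng.normal(size=(S, C))
gen = rng.normal(size=(S, C))   # S parameters per channel instead of S*S
fast = ccs_token_mixing(x, gen)
dense = np.stack([circulant(gen[:, k]) @ x[:, k] for k in range(C)], axis=1)
print(np.allclose(fast, dense))  # True
```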
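ConvMixer keeps the Mixer's split but implements it with convolutions: a patch embedding (a convolution whose kernel size equals its stride), then repeated blocks of a large-kernel depthwise convolution for spatial mixing, wrapped in a residual connection, followed by a pointwise 1x1 convolution for channel mixing. The NumPy sketch below follows that structure for a single block but omits BatchNorm and biases for brevity; kernel size 9, patch size 4 and width 16 are illustrative values, and all function names are hypothetical.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def patch_embed(img, w, p):
    """Conv with kernel size = stride = p, written as a linear map on non-overlapping p x p patches."""
    H, W, cin = img.shape
    patches = img.reshape(H // p, p, W // p, p, cin).transpose(0, 2, 1, 3, 4)
    return gelu(patches.reshape(H // p, W // p, p * p * cin) @ w)

def depthwise_conv2d(x, k):
    """x: (H, W, C); k: (K, K, C) with odd K; stride 1, zero 'same' padding."""
    K = k.shape[0]
    pad = K // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros_like(x)
    for di in range(K):
        for dj in range(K):
            # Each channel is filtered only by its own kernel slice: spatial mixing, no channel mixing
            out += xp[di:di + H, dj:dj + W, :] * k[di, dj, :]
    return out

def convmixer_block(x, k_dw, w_pw):
    # Spatial ("token") mixing with a large depthwise kernel, inside a residual connection
    x = x + gelu(depthwise_conv2d(x, k_dw))
    # Channel mixing with a 1x1 (pointwise) convolution, i.e. a matmul over the channel axis
    return gelu(x @ w_pw)

rng = np.random.default_rng(0)
p, dim, K = 4, 16, 9
img = rng.normal(size=(32, 32, 3))
x = patch_embed(img, rng.normal(0, 0.1, (p * p * 3, dim)), p)  # (8, 8, 16)
y = convmixer_block(x, rng.normal(0, 0.1, (K, K, dim)), rng.normal(0, 0.1, (dim, dim)))
print(x.shape, y.shape)  # (8, 8, 16) (8, 8, 16)
```

The depthwise step plays the role of the token-mixing MLP, but with weights shared across positions and limited to a K x K neighbourhood.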
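A standard linear-algebra identity helps make the "wide and sparse MLP" reading concrete. Writing the layer input X (S patches by C channels) as one long column-major vector vec(X), token mixing W_tok X equals (I_C ⊗ W_tok) vec(X), and channel mixing X W_ch^T equals (W_ch ⊗ I_S) vec(X). Each mixing step is therefore a multiplication by an SC x SC matrix that is mostly zeros, with densities 1/C and 1/S respectively. The NumPy sketch below checks both identities and densities; it illustrates the idea and is not the derivation from the wide-and-sparse paper itself, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
S, C = 6, 4
X = rng.normal(size=(S, C))
W_tok = rng.normal(size=(S, S))  # token-mixing weights (mix across patches)
W_ch = rng.normal(size=(C, C))   # channel-mixing weights (mix across channels)

def vec(M):
    # Column-major vectorization, the convention under which vec(A X B) = (B^T kron A) vec(X)
    return M.reshape(-1, order="F")

# The same two mixing steps written as wide (S*C x S*C) but sparse weight matrices
tok_big = np.kron(np.eye(C), W_tok)  # block-diagonal: one copy of W_tok per channel
ch_big = np.kron(W_ch, np.eye(S))    # each W_ch entry scales an S x S identity block

print(np.allclose(tok_big @ vec(X), vec(W_tok @ X)))  # True
print(np.allclose(ch_big @ vec(X), vec(X @ W_ch.T)))  # True
print(np.count_nonzero(tok_big) / tok_big.size)       # 0.25, i.e. 1 / C
print(np.count_nonzero(ch_big) / ch_big.size)         # about 0.1667, i.e. 1 / S
```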

Sources:

1. KAN-Mixers: a new deep learning architecture for image classification (Excerpts) | https://arxiv.org/html/2503.08939v1

2. MLP-Mixer: An all-MLP Architecture for Vision | https://arxiv.org/pdf/2105.01601

3. ResMLP: Feedforward networks for image classification with data-efficient training | https://arxiv.org/pdf/2105.03404

4. Pay Attention to MLPs (gMLP) | https://arxiv.org/pdf/2105.08050

5. Rethinking Token-Mixing MLP for MLP-based Vision Backbone (CCS Token-Mixing MLP) | https://arxiv.org/pdf/2106.14882

6. Patches Are All You Need? (ConvMixer) | https://arxiv.org/pdf/2201.09792

7. Understanding MLP-Mixer as a Wide and Sparse MLP | https://arxiv.org/pdf/2306.01470

8. Strip-MLP: Efficient Token Interaction for Vision MLP | https://arxiv.org/pdf/2307.11458

9. Mixer is more than just a model (ASM-RH) | https://arxiv.org/pdf/2402.18007

10. DynaMixer: A Vision MLP Architecture with Dynamic Mixing | https://proceedings.mlr.press/v162/wang22i/wang22i.pdf

By mcgrof