December 19, 2022

What do Vision Transformers Learn? A Visual Exploration

28 minutes

Vision transformers (ViTs) are quickly becoming the de facto architecture for computer vision, yet we understand very little about why they work and what they learn. While existing studies visually analyze the mechanisms of convolutional neural networks, an analogous exploration of ViTs remains challenging. In this paper, we first address the obstacles to performing visualizations on ViTs.

2022: Amin Ghiasi, Hamid Kazemi, Eitan Borgnia, Steven Reich, Manli Shu, Micah Goldblum, A. Wilson, T. Goldstein

https://arxiv.org/pdf/2212.06727v1.pdf
