Papers Read on AI

A ConvNet for the 2020s


Listen Later

The “Roaring 20s” of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla ViT, on the other hand, faces difficulties when applied to general computer vision tasks such as object detection and semantic segmentation. It is the hierarchical Transformers (e.g., Swin Transformers) that reintroduced several ConvNet priors, making Transformers practically viable as a generic vision backbone and demonstrating remarkable performance on a wide variety of vision tasks.
2022: Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie
Ranked #1 on Domain Generalization on ImageNet-Sketch (using extra training data)
https://arxiv.org/pdf/2201.03545v1.pdf
...more
View all episodesView all episodes
Download on the App Store

Papers Read on AIBy Rob

  • 3.7
  • 3.7
  • 3.7
  • 3.7
  • 3.7

3.7

3 ratings