Papers Read on AI

Revisiting Classifier: Transferring Vision-Language Models for Video Recognition


Transferring knowledge from task-agnostic pre-trained deep models to downstream tasks is an important topic in computer vision research. With the growth of computational capacity, open-source vision-language pre-trained models are now available at large scale in both model architecture and amount of data. In this study, we focus on transferring knowledge for video classification tasks. Conventional methods randomly initialize the linear classifier head for vision classification, but they leave the use of the text encoder for downstream visual recognition tasks unexplored. In this paper, we revise the role of the linear classifier and replace it with different knowledge from the pre-trained model.
2022: Wenhao Wu, Zhun Sun, Wanli Ouyang
Ranked #1 on Action Recognition on ActivityNet
https://arxiv.org/pdf/2207.01297v3.pdf
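
The contrast drawn in the abstract, a randomly initialized classifier head versus one built from pre-trained knowledge, can be illustrated with a small sketch. This is one possible reading rather than the paper's implementation: it assumes the "knowledge" is a set of class-name embeddings from the text encoder, used as fixed classifier weights. The function toy_text_encoder is a hypothetical stand-in for a real pre-trained text encoder, and the video embedding is made up for illustration.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def toy_text_encoder(class_name, dim=8):
    """Hypothetical stand-in for a pre-trained text encoder (assumption).

    Returns a deterministic pseudo-embedding seeded by the class name.
    A real system would call the text tower of a vision-language model.
    """
    rng = random.Random(class_name)
    return l2_normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])


def random_classifier(num_classes, dim, seed=0):
    """Conventional head: weights drawn at random, carrying no prior knowledge."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_classes)]


def text_classifier(class_names, dim):
    """Alternative head: one weight row per class, taken from the text encoder."""
    return [toy_text_encoder(name, dim) for name in class_names]


def logits(video_embedding, classifier_weights):
    """Dot product between the normalized video embedding and each weight row."""
    v = l2_normalize(video_embedding)
    return [sum(a * b for a, b in zip(v, w)) for w in classifier_weights]


class_names = ["archery", "bowling", "surfing"]
dim = 8

# Head built from class-name embeddings instead of random numbers.
weights = text_classifier(class_names, dim)

# Made-up video embedding that points in the "bowling" direction (assumption).
video = [3.0 * x for x in toy_text_encoder("bowling", dim)]

scores = logits(video, weights)
predicted = class_names[scores.index(max(scores))]
print(predicted)
```

With text-derived weights the head already separates classes before any training, provided the video and text embeddings share a space; the output of random_classifier offers no such starting point and has to be learned from scratch.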

Papers Read on AI, by Rob

3.7 (3 ratings)