Papers Read on AI

Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup



This paper introduces a gradient caching technique that decouples backpropagation between the contrastive loss and the encoder, removing the encoder backward pass's data dependency along the batch dimension. As a result, gradients can be computed for one subset of the batch at a time, leading to almost constant memory usage.
2021: Luyu Gao, Yunyi Zhang, Jiawei Han, Jamie Callan
https://arxiv.org/pdf/2101.06983v2.pdf
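
The technique described above can be sketched in a few lines of PyTorch. This is a minimal sketch, not the authors' reference implementation: the names (grad_cache_step, queries, docs, chunk_size, tau), the shared encoder for both sides, and the in-batch-negatives cross-entropy loss are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def grad_cache_step(encoder, optimizer, queries, docs, chunk_size, tau=0.05):
    """One gradient-cached training step (illustrative sketch, not the
    paper's reference code). Assumes `queries`/`docs` are tensors whose
    rows are aligned positive pairs and `encoder` maps them to embeddings."""
    q_chunks = queries.split(chunk_size)
    d_chunks = docs.split(chunk_size)

    # 1) Gradient-free forward pass, one chunk at a time: only the
    #    representations are kept, so activation memory stays at chunk scale.
    with torch.no_grad():
        q_reps = torch.cat([F.normalize(encoder(c), dim=-1) for c in q_chunks])
        d_reps = torch.cat([F.normalize(encoder(c), dim=-1) for c in d_chunks])

    # 2) Contrastive loss over the full batch of detached representations.
    #    Backprop stops at the representations, yielding the gradient cache:
    #    d(loss)/d(rep) for every example.
    q_reps = q_reps.detach().requires_grad_()
    d_reps = d_reps.detach().requires_grad_()
    logits = q_reps @ d_reps.T / tau              # in-batch negatives
    labels = torch.arange(len(q_reps), device=logits.device)
    loss = F.cross_entropy(logits, labels)
    loss.backward()
    q_cache = q_reps.grad.split(chunk_size)
    d_cache = d_reps.grad.split(chunk_size)

    # 3) Re-encode each chunk WITH gradients and backprop through the encoder
    #    using the cached representation gradients (a vector-Jacobian product).
    #    Parameter gradients accumulate across chunks, so the effective batch
    #    is large while peak memory is roughly that of a single chunk.
    optimizer.zero_grad()
    for qc, dc, qg, dg in zip(q_chunks, d_chunks, q_cache, d_cache):
        F.normalize(encoder(qc), dim=-1).backward(qg)
        F.normalize(encoder(dc), dim=-1).backward(dg)
    optimizer.step()
    return loss.item()
```

The decoupling in step 2 is what removes the batch-dimension dependency: the loss still sees the full batch of representations, but the expensive encoder backward in step 3 touches only one chunk at a time, so memory no longer grows with the contrastive batch size.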

Papers Read on AI, by Rob

3.7 (3 ratings)