Papers Read on AI

Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup



This paper introduces a gradient caching technique that decouples backpropagation between the contrastive loss and the encoder, removing the encoder backward pass's data dependency along the batch dimension. As a result, gradients can be computed for one subset of the batch at a time, leading to almost constant memory usage.
2021: Luyu Gao, Yunyi Zhang, Jiawei Han, Jamie Callan
https://arxiv.org/pdf/2101.06983v2.pdf
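
The technique described above can be sketched in a few lines of PyTorch. This is a minimal sketch, not the authors' reference implementation: the names (grad_cache_step, queries, docs, chunk_size, tau), the shared encoder for both sides, and the in-batch-negatives cross-entropy loss are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def grad_cache_step(encoder, optimizer, queries, docs, chunk_size, tau=0.05):
    """One gradient-cached training step (illustrative sketch, not the
    paper's reference code). Assumes `queries`/`docs` are tensors whose
    rows are aligned positive pairs and `encoder` maps them to embeddings."""
    q_chunks = queries.split(chunk_size)
    d_chunks = docs.split(chunk_size)

    # 1) Gradient-free forward pass, one chunk at a time: only the
    #    representations are kept, so activation memory stays at chunk scale.
    with torch.no_grad():
        q_reps = torch.cat([F.normalize(encoder(c), dim=-1) for c in q_chunks])
        d_reps = torch.cat([F.normalize(encoder(c), dim=-1) for c in d_chunks])

    # 2) Contrastive loss over the full batch of detached representations.
    #    Backprop stops at the representations, yielding the gradient cache:
    #    d(loss)/d(rep) for every example.
    q_reps = q_reps.detach().requires_grad_()
    d_reps = d_reps.detach().requires_grad_()
    logits = q_reps @ d_reps.T / tau              # in-batch negatives
    labels = torch.arange(len(q_reps), device=logits.device)
    loss = F.cross_entropy(logits, labels)
    loss.backward()
    q_cache = q_reps.grad.split(chunk_size)
    d_cache = d_reps.grad.split(chunk_size)

    # 3) Re-encode each chunk WITH gradients and backprop through the encoder
    #    using the cached representation gradients (a vector-Jacobian product).
    #    Parameter gradients accumulate across chunks, so the effective batch
    #    is large while peak memory is roughly that of a single chunk.
    optimizer.zero_grad()
    for qc, dc, qg, dg in zip(q_chunks, d_chunks, q_cache, d_cache):
        F.normalize(encoder(qc), dim=-1).backward(qg)
        F.normalize(encoder(dc), dim=-1).backward(dg)
    optimizer.step()
    return loss.item()
```

The decoupling in step 2 is what removes the batch-dimension dependency: the loss still sees the full batch of representations, but the expensive encoder backward in step 3 touches only one chunk at a time, so memory no longer grows with the contrastive batch size.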

Papers Read on AI, by Rob

3.7 (3 ratings)