PyTorch Developer Podcast

DataLoader with multiple workers leaks memory


Listen Later

Today I'm going to talk about a famous issue in PyTorch, DataLoader with num_workers > 0 causes memory leak (https://github.com/pytorch/pytorch/issues/13246). This bug is a good opportunity to talk about DataSet/DataLoader design in PyTorch, fork and copy-on-write memory in Linux and Python reference counting; you have to know about all of these things to understand why this bug occurs, but once you do, it also explains why the workarounds help.

Further reading.

  • A nice summary of the full issue https://github.com/pytorch/pytorch/issues/13246#issuecomment-905703662
  • DataLoader architecture RFC https://github.com/pytorch/pytorch/issues/49440
  • Cinder Python https://github.com/facebookincubator/cinder
...more
View all episodesView all episodes
Download on the App Store

PyTorch Developer PodcastBy Edward Yang, Team PyTorch

  • 4.8
  • 4.8
  • 4.8
  • 4.8
  • 4.8

4.8

47 ratings


More shows like PyTorch Developer Podcast

View all
Talk Python To Me by Michael Kennedy

Talk Python To Me

592 Listeners

Python Bytes by Michael Kennedy and Brian Okken

Python Bytes

213 Listeners

The Product Podcast by Product School

The Product Podcast

167 Listeners

NerdWallet's Smart Money Podcast by NerdWallet Personal Finance

NerdWallet's Smart Money Podcast

750 Listeners

Darknet Diaries by Jack Rhysider

Darknet Diaries

7,864 Listeners

Naval by Naval

Naval

2,096 Listeners