O'Reilly Data Show Podcast

Why AI and machine learning researchers are beginning to embrace PyTorch


In this episode of the Data Show, I spoke with Soumith Chintala, AI research engineer at Facebook. Among his many research projects, Chintala was part of the team behind DCGAN (Deep Convolutional Generative Adversarial Networks), a widely cited paper that introduced a set of neural network architectures for unsupervised learning. Our conversation centered on PyTorch, the successor to the popular Torch scientific computing framework. PyTorch is a relatively new deep learning framework that is fast becoming popular among researchers. Like Chainer, PyTorch supports dynamic computation graphs, a feature that makes it attractive to researchers and engineers who work with text and time-series data.
Here are some highlights from our conversation:
The origins of PyTorch
TensorFlow addressed one part of the problem, which is quality control and packaging. It offered a Theano-style programming model, so it was a very low-level deep learning framework. … There are a multitude of front ends trying to cope with the fact that TensorFlow is a very low-level framework: there's TF-Slim, there's Keras. I think there are 10 or 15, and just from Google there are probably four or five of those.
On the Torch side, the philosophy has always been slightly different from Theano's. I see TensorFlow as a much better Theano-style framework, and on the Torch side we had a philosophy that we want to be imperative, which means that you run your computation immediately. Debugging should be butter smooth. The user should never have trouble debugging their programs, whether they use a Python debugger, something like GDB, or something else.
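Chintala's imperative-execution point is easy to see in practice. Here is a minimal sketch (the TinyNet module and its shapes are invented for illustration): because each operation runs the moment its line executes, you can drop an ordinary Python breakpoint into the middle of a forward pass and inspect live tensor values.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """A throwaway two-layer network, just to demonstrate eager execution."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 16)
        self.fc2 = nn.Linear(16, 2)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        # Execution is imperative, so a standard debugger works mid-forward:
        # import pdb; pdb.set_trace()  # inspect h.shape, h.mean(), etc.
        return self.fc2(h)

net = TinyNet()
out = net(torch.randn(4, 8))  # each op executes as this line runs
print(out.shape)              # torch.Size([4, 2])
```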
… Chainer was a huge inspiration. PyTorch is inspired primarily by three frameworks. Within the Torch community, certain researchers from Twitter built an auxiliary package called Autograd, and this was actually based on a package called Autograd in the Python community. Chainer, Autograd, and Torch Autograd all used a technique called tape-based automatic differentiation: that is, you have a tape recorder that records what operations you have performed, and then it replays them backward to compute your gradients. This is a technique that is not used by any of the other major frameworks except PyTorch and Chainer. All of the other frameworks use what we call a static graph: the user builds a graph, then gives that graph to an execution engine provided by the framework, and the framework executes it. It can analyze it ahead of time.
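The tape idea itself fits in a few lines of Python. Below is a toy sketch of tape-based reverse-mode differentiation, not how PyTorch or Chainer actually implements it: each forward operation appends a closure to a tape, and replaying the tape in reverse accumulates gradients.

```python
class Var:
    """A scalar value plus its accumulated gradient."""
    def __init__(self, value):
        self.value = value
        self.grad = 0.0

tape = []  # closures recorded in forward (execution) order

def mul(a, b):
    out = Var(a.value * b.value)
    def backward():
        # d(out)/d(a) = b.value, d(out)/d(b) = a.value; accumulate
        a.grad += out.grad * b.value
        b.grad += out.grad * a.value
    tape.append(backward)
    return out

def add(a, b):
    out = Var(a.value + b.value)
    def backward():
        a.grad += out.grad
        b.grad += out.grad
    tape.append(backward)
    return out

# Forward pass: y = x*x + x; the tape records each op as it runs.
x = Var(3.0)
y = add(mul(x, x), x)

# Backward pass: replay the tape in reverse to compute gradients.
y.grad = 1.0
for backward in reversed(tape):
    backward()

print(y.value, x.grad)  # 12.0 7.0  (dy/dx = 2*x + 1 = 7 at x = 3)
```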
These are two very different techniques. Tape-based differentiation gives you easier debuggability, and it gives you certain things that are more powerful (e.g., dynamic neural networks). The static graph-based approach gives you easier deployment to mobile, easier deployment to more exotic architectures, the ability to apply compiler techniques ahead of time, and so on.
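A concrete way to see the trade-off: with a tape, data-dependent control flow is just Python. The sketch below (module name and sizes are made up for illustration) builds a network whose depth varies per call; the tape records whichever path actually ran, whereas a static graph would need dedicated control-flow operators to express this.

```python
import torch
import torch.nn as nn

class DataDependentDepth(nn.Module):
    """Applies the same layer a varying number of times per call --
    ordinary Python control flow that the autograd tape simply records."""
    def __init__(self, dim=8):
        super().__init__()
        self.layer = nn.Linear(dim, dim)

    def forward(self, x, steps):
        for _ in range(steps):        # depth decided at runtime, per call
            x = torch.tanh(self.layer(x))
        return x

net = DataDependentDepth()
x = torch.randn(1, 8, requires_grad=True)
loss = net(x, steps=3).sum()   # this call's tape holds 3 layer applications
loss.backward()                # gradients follow the path actually taken
print(x.grad.shape)            # torch.Size([1, 8])
```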
Deep learning frameworks within Facebook
Internally at Facebook, we have a unified strategy. We say PyTorch is used for all of research and Caffe2 is used for all of production. This makes it easier for us to separate out which team does what and which tools do what. What we are seeing is that users first create a PyTorch model. When they are ready to deploy the model into production, they just convert it into a Caffe2 model, then ship it to mobile or another platform.
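The conversation doesn't name the conversion tooling, so the sketch below assumes the path that later became common: tracing a PyTorch model with an example input and exporting it to ONNX, a format Caffe2 could load for serving. The model here is a hypothetical stand-in for a trained research model.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a trained research model.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
model.eval()

# Trace the model with an example input and export the graph to ONNX;
# Caffe2 provided an ONNX backend that could load the exported file.
dummy_input = torch.randn(1, 8)
torch.onnx.export(model, dummy_input, "model.onnx")
```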
PyTorch user profiles
PyTorch has gotten its biggest adoption from researchers, and a moderate response from data scientists. As we expected, we did not get any adoption from product builders, because PyTorch models are not easy to ship into mobile, for example. We also have people we did not expect to come on board, like folks from OpenAI and several universities.
Related resources:
Building intelligent applications with deep learning and TensorFlow
BigDL: Deep learning for Apache Spark
MXNet: Deep learning that's easy to implement and easy to …