Machine Learning Street Talk (MLST)

Facebook Research - Unsupervised Translation of Programming Languages



In this episode of Machine Learning Street Talk, Dr. Tim Scarfe, Yannic Kilcher and Connor Shorten spoke with Marie-Anne Lachaux, Baptiste Roziere and Dr. Guillaume Lample from Facebook AI Research (FAIR) in Paris. They recently released the paper "Unsupervised Translation of Programming Languages", an exciting new approach to learned translation between programming languages (a learned transcompiler, which the authors call TransCoder). The model is trained without supervision on separate monolingual corpora, i.e. no parallel data is needed. The trick they used was that there is significant token overlap between programming languages when using word-piece embeddings. It was incredible to talk with this talented group of researchers and I hope you enjoy the conversation too.
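To make the token-overlap point concrete, here is a minimal sketch that measures how many distinct tokens a small C++, Java and Python function have in common. It uses a crude regex lexer and a Jaccard score, both assumptions chosen for illustration; the approach discussed in the episode relies on subword (word-piece) tokens, which this sketch does not reproduce. The function `max_of` and the helper names `tokens` and `overlap` are hypothetical.

```python
import re

# Crude lexer (an illustrative assumption, NOT the word-piece tokenizer
# discussed in the episode): identifiers/keywords, integers, and single
# punctuation characters.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def tokens(source: str) -> set:
    """Return the set of distinct tokens in a code snippet."""
    return set(TOKEN_RE.findall(source))


def overlap(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two snippets."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


# The same hypothetical function written in three languages.
cpp = """
int max_of(int a, int b) {
    if (a > b) return a;
    return b;
}
"""

java = """
static int max_of(int a, int b) {
    if (a > b) return a;
    return b;
}
"""

python = """
def max_of(a, b):
    if a > b:
        return a
    return b
"""

print(f"C++ vs Java:   {overlap(cpp, java):.2f}")    # prints 0.93
print(f"C++ vs Python: {overlap(cpp, python):.2f}")  # prints 0.60
```

Even with this naive lexer, keywords, identifiers and punctuation such as `if`, `return`, `max_of`, `(` and `>` recur across all three languages, which is the kind of shared vocabulary that lets a single embedding space serve several languages at once.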

Yannic's video on this paper has been watched over 120K times! Check it out too: https://www.youtube.com/watch?v=xTzFJIknh7E

Paper: https://arxiv.org/abs/2006.03511

Marie-Anne Lachaux, Baptiste Roziere, Lowik Chanussot, Guillaume Lample

Abstract:

"A transcompiler, also known as source-to-source translator, is a system that converts source code from a high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules, applied to the source code abstract syntax tree. Unfortunately, the resulting translations often lack readability, fail to respect the target language conventions, and require manual modifications in order to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive. Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their applications to transcompilation have been limited due to the scarcity of parallel data in this domain. In this paper, we propose to leverage recent approaches in unsupervised machine translation to train a fully unsupervised neural transcompiler. We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy. Our method relies exclusively on monolingual source code, requires no expertise in the source or target languages, and can easily be generalized to other programming languages. We also build and release a test set composed of 852 parallel functions, along with unit tests to check the correctness of translations. We show that our model outperforms rule-based commercial baselines by a significant margin."
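The abstract mentions releasing unit tests to check the correctness of translations. The idea is that a translation counts as correct if it behaves like the original on test inputs, even when its text differs. Below is a minimal sketch of that check, assuming tests are simple (arguments, expected result) pairs; the test cases, the two candidate translations and the helper names are all hypothetical, not the paper's actual harness.

```python
# Hypothetical test cases: (arguments, expected result) pairs, as might be
# obtained by running the source-language max_of on sample inputs.
TEST_CASES = [((1, 2), 2), ((5, 3), 5), ((-4, -4), -4), ((0, -7), 0)]


def translation_a(a, b):
    # A correct translation of max_of.
    if a > b:
        return a
    return b


def translation_b(a, b):
    # A plausible-looking but wrong translation (comparison flipped).
    if a < b:
        return a
    return b


def passes_unit_tests(func, cases) -> bool:
    """True if func returns the expected value for every case without raising."""
    for args, expected in cases:
        try:
            if func(*args) != expected:
                return False
        except Exception:
            # A crash counts as a failed translation.
            return False
    return True


def fraction_passing(funcs, cases) -> float:
    """Share of candidate translations that pass every test case."""
    return sum(passes_unit_tests(f, cases) for f in funcs) / len(funcs)


for f in (translation_a, translation_b):
    print(f.__name__, passes_unit_tests(f, TEST_CASES))
print(fraction_passing([translation_a, translation_b], TEST_CASES))  # prints 0.5
```

Checking behavior rather than text matters here because two translations can look almost identical, as `translation_a` and `translation_b` do, while only one of them is correct.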
