May 01, 2025

Day 8: "Sequence to Sequence Learning with Neural Networks." (When Two LSTMs Started Speaking in Tongues.)

11 minutes

Paper: Sequence to Sequence Learning with Neural Networks — Ilya Sutskever, Oriol Vinyals & Quoc Le (2014)

The one-sentence summary: From ‘I ❤ Cats’ to ‘J’ ♥ les chats’, this is how two LSTMs started talking to each other and showed the world that a neural network could learn machine translation end to end.

What It’s About

Picture a relay race: Runner #1 takes a message in English and hands the baton to Runner #2, who sprints across the language barrier without tripping and delivers it in flawless French. That, in spirit, is what Sutskever and colleagues pulled off in 2014: an encoder–decoder LSTM pipeline that turns one sequence into … well, another sequence. The encoder LSTM reads the source sentence and compresses it into a single fixed-size vector (the baton); the decoder LSTM unrolls that vector into the translation, one word at a time. It showed that a pair of neural networks trained end to end could listen, remember, and speak, with no hand-crafted phrase tables required.
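To make the relay concrete, here is a minimal, untrained sketch of the encoder–decoder idea in plain Python. Everything in it is an illustrative assumption rather than the paper's setup: the class names `LSTMCell` and `Seq2Seq`, the tiny sizes, the random weights with biases omitted, and greedy decoding. The paper itself used two separate 4-layer LSTMs with 1,000 cells each, trained on millions of sentence pairs, and decoded with beam search.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list).
    return [sum(w * x for w, x in zip(row, v)) for row in W]

class LSTMCell:
    """Hypothetical single LSTM cell with small random (untrained) weights; biases omitted."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (c).
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(hidden_size)] for g in "ifoc"}

    def step(self, x, h, c):
        z = x + h  # list concatenation of current input and previous hidden state
        i = [sigmoid(v) for v in matvec(self.W["i"], z)]
        f = [sigmoid(v) for v in matvec(self.W["f"], z)]
        o = [sigmoid(v) for v in matvec(self.W["o"], z)]
        g = [math.tanh(v) for v in matvec(self.W["c"], z)]
        c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
        return h_new, c_new

class Seq2Seq:
    """Hypothetical toy model: one LSTM encodes, a different LSTM decodes."""

    def __init__(self, src_vocab, tgt_vocab, emb=4, hidden=8, seed=0):
        rng = random.Random(seed)
        self.src_emb = {w: [rng.uniform(-1, 1) for _ in range(emb)] for w in src_vocab}
        self.tgt_vocab = list(tgt_vocab)
        self.tgt_emb = {w: [rng.uniform(-1, 1) for _ in range(emb)] for w in self.tgt_vocab}
        self.encoder = LSTMCell(emb, hidden, rng)
        self.decoder = LSTMCell(emb, hidden, rng)
        # Output layer: one score row per target word.
        self.out = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in self.tgt_vocab]

    def encode(self, tokens):
        h = [0.0] * self.encoder.hidden_size
        c = [0.0] * self.encoder.hidden_size
        for tok in tokens:
            h, c = self.encoder.step(self.src_emb[tok], h, c)
        return h, c  # the fixed-size "baton" handed to the decoder

    def decode(self, state, max_len=10, eos="<eos>"):
        h, c = state
        prev, output = eos, []  # assumption: decoding is seeded with the <eos> symbol
        for _ in range(max_len):
            h, c = self.decoder.step(self.tgt_emb[prev], h, c)
            scores = matvec(self.out, h)
            # Greedy choice of the highest-scoring word (the paper used beam search).
            prev = self.tgt_vocab[max(range(len(scores)), key=scores.__getitem__)]
            if prev == eos:
                break
            output.append(prev)
        return output

    def translate(self, tokens, max_len=10):
        return self.decode(self.encode(tokens), max_len)

SRC = ["I", "love", "cats", "<eos>"]
TGT = ["j'aime", "les", "chats", "<eos>"]
model = Seq2Seq(SRC, TGT, seed=0)
result = model.translate(["I", "love", "cats", "<eos>"])
print(result)  # untrained weights, so the chosen words are arbitrary
```

Because the weights are random, the output is gibberish; the point is the data flow. No matter how long the English input is, the decoder only ever sees the final `(h, c)` pair returned by `encode`.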

Key Takeaways for Busy Humans

* End-to-End Everything — Say goodbye to hand-engineered pipelines; data in, translation out.

* Universal Interface — Any input/output that can be serialized (audio, code, protein sequences) is fair game.

* Foundation for Attention — The pain of squeezing long sentences into a single fixed-size vector is exactly the bottleneck that Bahdanau-style attention (developed concurrently and published at ICLR 2015) removed, and attention ultimately led to the Transformer.

* Encoder–Decoder as a Mindset — Prompts + completions, image captions, even humanoid-robot task planning all echo this two-brain pattern.
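The bottleneck in the "Foundation for Attention" point is easy to see in a few lines. This sketch contrasts plain seq2seq, which keeps only the encoder's last state, with attention, which builds a fresh weighted mix of all encoder states at every decoding step. One plainly named swap: Bahdanau et al. scored states with a small feed-forward network (additive attention); for brevity this uses simpler dot-product scoring. The function name `attend` and the 2-dimensional states are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(decoder_state, encoder_states):
    """Dot-product attention: weight every encoder state by its similarity
    to the current decoder state; return (context_vector, weights)."""
    weights = softmax([dot(decoder_state, h) for h in encoder_states])
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states)) for d in range(dim)]
    return context, weights

# Hypothetical encoder states for a 3-token source sentence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Plain seq2seq: the decoder only ever sees the final state.
fixed_vector = encoder_states[-1]

# Attention: the decoder can look back at every source position.
context, weights = attend([0.0, 5.0], encoder_states)
print([round(w, 3) for w in weights])  # → [0.003, 0.498, 0.498]
```

The decoder state here "cares" about the second dimension, so nearly all the weight lands on the two source positions that carry it, something a single fixed vector cannot express for long sentences.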

“Wolf Bites” — Skimmable Nuggets

* The model beat phrase-based SMT on the WMT’14 English→French benchmark with a BLEU score of 34.8 (versus 33.3 for the phrase-based baseline), a legendary result at the time.

* Training (about 7.5 passes over 12M sentence pairs) took roughly ten days on a single machine with eight GPUs. Modern hardware would finish the same run in a fraction of that time.

* Google Translate switched to neural machine translation (GNMT, a seq2seq architecture with attention) in late 2016, leaving users worldwide wondering whether the product had been possessed overnight by fluent spirits.
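Curious what that 34.8 actually measures? BLEU compares n-gram overlap (up to 4-grams) between a machine translation and a human reference, then applies a brevity penalty to candidates that are too short. Below is a simplified sketch under stated assumptions: a single reference, sentence-level scoring, and no smoothing. The paper's 34.8 is a corpus-level score on a 0 to 100 scale, so it is not directly comparable to averaging this function's 0 to 1 outputs. The function name `sentence_bleu` is hypothetical here.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference (0 to 1)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clip each candidate n-gram count by how often it appears in the reference.
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        total = sum(cand.values())
        if total == 0 or overlap == 0:
            return 0.0  # a single zero precision zeroes the geometric mean
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: candidates shorter than the reference are punished.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)

reference = "the cat is on the mat".split()
print(sentence_bleu(reference, reference))                                # → 1.0
print(sentence_bleu("the cat sat on the mat".split(), reference))         # → 0.0
print(round(sentence_bleu("the cat is on the mat today".split(), reference), 3))  # → 0.809
```

The second example shows why BLEU is normally computed over a whole test set: one wrong word wipes out every matching 4-gram in a short sentence, and an unsmoothed sentence score collapses to zero.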

Notes: The podcasts for this series are produced with Google NotebookLM, and the two podcasters you hear are AI-generated. The sources used to generate today’s “notebook” were 1) the original paper and 2) this article.

Read the original paper here.

Sources

* Sutskever, I.; Vinyals, O.; Le, Q. V. “Sequence to Sequence Learning with Neural Networks.” Advances in Neural Information Processing Systems 27 (2014).

* Google AI Blog. “A Neural Machine Translation System Per-Sentence BLEU Improvement” (2016).

* Kilcher, Y. “Seq2Seq Explained.” YouTube, 2020.

#Seq2Seq #MachineTranslation #DeepLearning #AIHistory #TheWolfReadsAI #deeplearningwiththewolf #dianawolftorres #deeplearning #sutskever #sequencetosequencelearning #ilyasutskever

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit dianawolftorres.substack.com


By Diana Wolf Torres
