The Daily ML

Ep12. Were RNNs All We Needed?



This research paper argues that traditional recurrent neural networks (RNNs) such as LSTMs and GRUs, despite having been outperformed by Transformers for several years, can still be effective for long-sequence modeling. The authors show that by simplifying these older RNN architectures, specifically by removing the dependence of their gates and candidate states on previous hidden states, the resulting models can be trained in parallel using the parallel prefix scan algorithm. This yields significant efficiency gains in training time and memory usage while achieving performance comparable to modern, more complex models such as Mamba and Transformers. The paper introduces these simplified variants, called minLSTM and minGRU, and demonstrates their effectiveness across a range of tasks, including language modeling, reinforcement learning, and the Selective Copying task.
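To make the core idea concrete, here is a minimal PyTorch sketch of a minGRU-style layer as described above: the update gate and candidate state are computed from the current input only, so the recurrence reduces to a first-order linear recurrence that a parallel prefix scan could evaluate. The class name, layer shapes, and the sequential reference loop are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MinGRU(nn.Module):
    """Sketch of a minGRU-style layer (illustrative, not the paper's code)."""

    def __init__(self, dim_in, dim_hidden):
        super().__init__()
        # Gate and candidate depend only on the current input x_t,
        # not on the previous hidden state h_{t-1}.
        self.to_z = nn.Linear(dim_in, dim_hidden)
        self.to_h = nn.Linear(dim_in, dim_hidden)

    def forward(self, x):
        # x: (batch, seq_len, dim_in)
        z = torch.sigmoid(self.to_z(x))   # update gate z_t
        h_tilde = self.to_h(x)            # candidate hidden state
        # Recurrence: h_t = (1 - z_t) * h_{t-1} + z_t * h_tilde_t.
        # Since a_t = (1 - z_t) and b_t = z_t * h_tilde_t depend only on x,
        # this is a linear recurrence that a parallel prefix scan can solve.
        a, b = 1 - z, z * h_tilde
        # Sequential reference evaluation of the scan (O(T) steps); the paper's
        # point is that the same recurrence admits parallel training.
        h = torch.zeros(x.size(0), self.to_h.out_features, device=x.device)
        outs = []
        for t in range(x.size(1)):
            h = a[:, t] * h + b[:, t]
            outs.append(h)
        return torch.stack(outs, dim=1)

if __name__ == "__main__":
    model = MinGRU(dim_in=8, dim_hidden=16)
    y = model(torch.randn(2, 32, 8))
    print(y.shape)  # torch.Size([2, 32, 16])
```

Because the gate no longer reads the previous hidden state, each step's coefficients are known up front, which is exactly what lets the sequential loop above be replaced by a logarithmic-depth parallel scan during training.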
