The Daily ML

Ep12. Were RNNs All We Needed?



This research paper argues that traditional recurrent neural networks (RNNs) such as LSTMs and GRUs, despite having been outperformed by Transformers for several years, can still be effective for long-sequence modeling. The authors show that by simplifying these older RNN architectures, specifically by removing the dependence of their gates and candidate states on previous hidden states, the resulting models can be trained in parallel using the parallel prefix scan algorithm. This yields significant efficiency gains in training time and memory usage while achieving performance comparable to modern, more complex models such as Mamba and Transformers. The paper introduces these simplified variants, called minLSTM and minGRU, and demonstrates their effectiveness across a range of tasks, including language modeling, reinforcement learning, and the Selective Copying task.
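To make the core idea concrete, here is a minimal PyTorch sketch of a minGRU-style layer as described above: the update gate and candidate state are computed from the current input only, so the recurrence reduces to a first-order linear recurrence that a parallel prefix scan could evaluate. The class name, layer shapes, and the sequential reference loop are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MinGRU(nn.Module):
    """Sketch of a minGRU-style layer (illustrative, not the paper's code)."""

    def __init__(self, dim_in, dim_hidden):
        super().__init__()
        # Gate and candidate depend only on the current input x_t,
        # not on the previous hidden state h_{t-1}.
        self.to_z = nn.Linear(dim_in, dim_hidden)
        self.to_h = nn.Linear(dim_in, dim_hidden)

    def forward(self, x):
        # x: (batch, seq_len, dim_in)
        z = torch.sigmoid(self.to_z(x))   # update gate z_t
        h_tilde = self.to_h(x)            # candidate hidden state
        # Recurrence: h_t = (1 - z_t) * h_{t-1} + z_t * h_tilde_t.
        # Since a_t = (1 - z_t) and b_t = z_t * h_tilde_t depend only on x,
        # this is a linear recurrence that a parallel prefix scan can solve.
        a, b = 1 - z, z * h_tilde
        # Sequential reference evaluation of the scan (O(T) steps); the paper's
        # point is that the same recurrence admits parallel training.
        h = torch.zeros(x.size(0), self.to_h.out_features, device=x.device)
        outs = []
        for t in range(x.size(1)):
            h = a[:, t] * h + b[:, t]
            outs.append(h)
        return torch.stack(outs, dim=1)

if __name__ == "__main__":
    model = MinGRU(dim_in=8, dim_hidden=16)
    y = model(torch.randn(2, 32, 8))
    print(y.shape)  # torch.Size([2, 32, 16])
```

Because the gate no longer reads the previous hidden state, each step's coefficients are known up front, which is exactly what lets the sequential loop above be replaced by a logarithmic-depth parallel scan during training.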
