
This research paper revisits traditional Recurrent Neural Networks (RNNs) – specifically, LSTMs and GRUs – and shows how to adapt them for modern parallel training. The authors demonstrate that by removing certain dependencies within the RNN structure, these models can be trained with the parallel scan algorithm, making training significantly faster than for their traditional counterparts. The paper then compares the performance of these simplified LSTMs and GRUs (minLSTMs and minGRUs) to recent state-of-the-art sequence models on several tasks, including Selective Copying, Reinforcement Learning, and Language Modeling. The results show that minLSTMs and minGRUs achieve comparable or better performance than other models while being far more efficient, suggesting that RNNs might be a viable option even in the era of Transformers.
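To make the "remove dependencies, then scan" idea concrete, here is a minimal NumPy sketch of the kind of recurrence the summary refers to: a GRU-style update whose gate and candidate state depend only on the current input, so the hidden state follows a linear recurrence h_t = a_t * h_{t-1} + b_t that can be evaluated with a scan instead of a step-by-step loop. The function and weight names (min_gru_sketch, Wz, Wh) and the cumulative-product formulation are illustrative assumptions, not the paper's implementation; practical implementations typically use a numerically stabler (e.g. log-space) parallel scan.

```python
import numpy as np

def scan_linear_recurrence(a, b, h0):
    """Evaluate h_t = a_t * h_{t-1} + b_t for t = 1..T without a Python loop
    over time, using the same associative structure a parallel scan exploits.
    a, b: arrays of shape (T, d); h0: array of shape (d,).
    Note: the division by the cumulative product is fine for a short sketch
    but can be unstable for long sequences.
    """
    A = np.cumprod(a, axis=0)                # A_t = a_1 * a_2 * ... * a_t
    return A * h0 + A * np.cumsum(b / A, axis=0)

def min_gru_sketch(x, Wz, Wh, h0):
    """GRU-like update with input-only gate and candidate (a sketch):
      z_t     = sigmoid(x_t @ Wz)    # gate depends only on x_t
      h_tilde = x_t @ Wh             # candidate depends only on x_t
      h_t     = (1 - z_t) * h_{t-1} + z_t * h_tilde
    Because z_t and h_tilde do not depend on h_{t-1}, the recurrence is
    linear in h and can be computed by a scan over the sequence.
    """
    z = 1.0 / (1.0 + np.exp(-(x @ Wz)))
    h_tilde = x @ Wh
    return scan_linear_recurrence(1.0 - z, z * h_tilde, h0)

# Tiny usage example with random weights (hypothetical shapes).
T, d_in, d_h = 6, 4, 3
rng = np.random.default_rng(0)
x = rng.normal(size=(T, d_in))
Wz = rng.normal(size=(d_in, d_h))
Wh = rng.normal(size=(d_in, d_h))
h = min_gru_sketch(x, Wz, Wh, h0=np.zeros(d_h))
print(h.shape)  # (6, 3): one hidden state per time step
```

The key point the sketch illustrates is that once the gate no longer reads the previous hidden state, every time step's contribution can be combined associatively, which is what allows parallel (rather than strictly sequential) training.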