AI Post Transformers

RoBERTa: A Robustly Optimized BERT Pretraining Approach


The July 2019 paper introduces RoBERTa, a robustly optimized approach to pretraining BERT. The authors conduct a replication study of BERT pretraining to measure the impact of key hyperparameters and training-data size, finding that BERT was significantly undertrained and could be improved by simple modifications: training longer with bigger batches over more data, removing the Next Sentence Prediction objective, and replacing static masking with dynamic masking. Built on these changes and trained on a larger dataset that includes the newly collected CC-NEWS corpus, RoBERTa achieves state-of-the-art results on major natural language understanding benchmarks, including GLUE, RACE, and SQuAD. The findings show that design choices and training duration matter enormously, and raise the question of whether the performance gains reported by post-BERT models owe more to these factors than to changes in architecture or training objective. Source: https://arxiv.org/pdf/1907.11692
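
The dynamic-masking change is easy to picture in code. The minimal Python sketch below (an illustration, not the authors' implementation) resamples which tokens are masked every time a sequence is drawn, so the model sees a different mask pattern in each epoch; original BERT instead fixed the masks once during data preprocessing.

import random

MASK_TOKEN = "[MASK]"
MASK_PROB = 0.15  # RoBERTa, like BERT, masks roughly 15% of input tokens

def dynamic_mask(tokens):
    # Resample masked positions on every call, so each epoch sees a
    # different mask for the same sequence. Original BERT performed
    # masking once during preprocessing ("static" masking), duplicating
    # the data to obtain a small fixed set of mask patterns.
    # (Simplified: the full MLM recipe also replaces 10% of the
    # selected tokens with random tokens and leaves 10% unchanged.)
    return [MASK_TOKEN if random.random() < MASK_PROB else t for t in tokens]

tokens = "the quick brown fox jumps over the lazy dog".split()
for epoch in range(3):
    print(epoch, dynamic_mask(tokens))  # a fresh mask pattern each epoch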

AI Post Transformers, by mcgrof