


The July 2019 paper introduces **RoBERTa**, a **robustly optimized BERT pretraining approach**, which is a refined version of the original BERT model. The authors conduct a **replication study of BERT** pretraining to assess the impact of various hyperparameters, finding that BERT was significantly **undertrained** and could be improved by simple modifications like **training longer with bigger batches, removing the Next Sentence Prediction objective**, and using **dynamic masking**. RoBERTa, built upon these changes and **trained on a larger dataset** including the novel **CC-NEWS** corpus, achieves **state-of-the-art results** on major natural language understanding benchmarks like **GLUE, RACE, and SQuAD**. The findings emphasize that **design choices and training duration** are highly significant and question whether recent performance gains in post-BERT models are due more to these factors than to architectural or objective changes.
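To make the dynamic masking change concrete, here is a minimal sketch of the idea: instead of masking each sequence once during preprocessing (BERT's static masking), a fresh mask pattern is sampled every time the example is fed to the model. The 15% masking rate and the 80/10/10 replacement split follow standard BERT conventions; the specific token IDs and vocabulary size below are illustrative assumptions, not RoBERTa's actual values.

```python
import random

MASK_ID = 103        # assumed [MASK] token id (illustrative)
VOCAB_SIZE = 30522   # assumed vocabulary size (illustrative)
MASK_PROB = 0.15     # BERT/RoBERTa mask 15% of input tokens

def dynamic_mask(token_ids, special_ids=frozenset({101, 102})):
    """Return (masked_inputs, labels) with a freshly sampled mask pattern."""
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)  # -100 = position ignored by the loss
    for i, tok in enumerate(token_ids):
        if tok in special_ids or random.random() >= MASK_PROB:
            continue
        labels[i] = tok                               # predict the original token
        r = random.random()
        if r < 0.8:
            inputs[i] = MASK_ID                       # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = random.randrange(VOCAB_SIZE)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels

# Each epoch draws a new mask for the same sequence, unlike static masking,
# which would reuse one fixed pattern produced at preprocessing time.
seq = [101, 2023, 2003, 1037, 7099, 6251, 102]
for epoch in range(3):
    print(epoch, dynamic_mask(seq))
```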
Source:
https://arxiv.org/pdf/1907.11692
By mcgrof