Next-token prediction trains a language model on all tokens in a sequence. VP Weizhu Chen discusses his team’s 2024 NeurIPS paper on how distinguishing between useful and “noisy” tokens in pretraining can improve token efficiency and model performance.
Read the paper
Get the code
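The description above only sketches the idea, so here is a minimal PyTorch sketch of one way selective token training could work, assuming "useful" tokens are scored by excess loss (how much worse the training model does on a token than a frozen reference model) and only the highest-scoring fraction contributes to the gradient. The function name `selective_lm_loss`, the `keep_ratio` parameter, and the scoring rule are illustrative assumptions, not necessarily the paper's exact recipe; see the linked paper and code for the authors' method.

```python
import torch
import torch.nn.functional as F

def selective_lm_loss(logits, ref_logits, labels, keep_ratio=0.6):
    """Cross-entropy averaged only over the tokens judged most useful.

    Tokens are scored by excess loss (training-model loss minus
    reference-model loss); only the top keep_ratio fraction contributes
    to the loss, so gradients ignore the remaining "noisy" tokens.
    Names and the 0.6 default are illustrative, not from the paper.
    """
    vocab = logits.size(-1)
    # Per-token next-token loss under the model being trained and under a
    # frozen reference model, both evaluated on the same labels.
    tok_loss = F.cross_entropy(logits.view(-1, vocab), labels.view(-1),
                               reduction="none")
    ref_loss = F.cross_entropy(ref_logits.view(-1, vocab), labels.view(-1),
                               reduction="none")
    excess = tok_loss - ref_loss  # high excess = still worth learning from
    k = max(1, int(keep_ratio * excess.numel()))
    keep = torch.zeros_like(excess, dtype=torch.bool)
    keep[excess.topk(k).indices] = True
    return tok_loss[keep].mean()
```

In a training loop this would stand in for the usual mean cross-entropy: `logits` come from the model being trained, `ref_logits` from a frozen reference model run on the same batch, and `labels` are the shifted next-token targets.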
By researchers across the Microsoft research community
