


Next-token prediction trains a language model on all tokens in a sequence. VP Weizhu Chen discusses his team’s 2024 NeurIPS paper on how distinguishing between useful and “noisy” tokens in pretraining can improve token efficiency and model performance.
Read the paper
Get the code
By: Researchers across the Microsoft research community
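The idea the episode describes, computing the pretraining loss only on tokens judged useful rather than on every token in the sequence, can be sketched roughly as follows. This is a minimal illustration assuming a PyTorch setup in which tokens are scored by their excess loss against a frozen reference model; the function name, the `keep_ratio` parameter, and the scoring rule are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def selective_token_loss(model_logits, ref_logits, labels, keep_ratio=0.6):
    """Cross-entropy restricted to the tokens with the highest excess loss
    (training-model loss minus reference-model loss).

    model_logits, ref_logits: (batch, seq_len, vocab) tensors.
    labels: (batch, seq_len) token ids; -100 marks positions to ignore.
    keep_ratio: fraction of valid tokens that contribute to the loss.
    Names and the scoring rule here are illustrative assumptions.
    """
    vocab = model_logits.size(-1)
    flat_labels = labels.reshape(-1)

    # Per-token losses for the training model and the frozen reference model.
    model_loss = F.cross_entropy(
        model_logits.reshape(-1, vocab), flat_labels,
        ignore_index=-100, reduction="none")
    ref_loss = F.cross_entropy(
        ref_logits.reshape(-1, vocab), flat_labels,
        ignore_index=-100, reduction="none")

    valid = flat_labels != -100
    # Score each token by how much worse the training model is than the
    # reference; tokens with low excess loss are treated as "noisy" and
    # dropped from the objective.
    excess = (model_loss - ref_loss).masked_fill(~valid, float("-inf"))

    # Keep only the top keep_ratio fraction of valid tokens.
    k = max(1, int(keep_ratio * valid.sum().item()))
    topk_idx = excess.topk(k).indices
    return model_loss[topk_idx].mean()
```

In this sketch the selection is recomputed every step from the current per-token losses, so the set of tokens that drive the gradient can change as training progresses; only the selected tokens' losses are averaged and backpropagated.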
