
Next-token prediction trains a language model on all tokens in a sequence. VP Weizhu Chen discusses his team’s 2024 NeurIPS paper on how distinguishing between useful and “noisy” tokens in pretraining can improve token efficiency and model performance.
Read the paper
Get the code
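The episode blurb only gestures at the idea of separating useful from "noisy" tokens, so here is a minimal sketch of what a selective next-token loss could look like. Everything in it is an illustrative assumption rather than the paper's stated recipe: the function name selective_lm_loss, the keep_ratio parameter, and the use of a frozen reference model's per-token loss as the usefulness score are all placeholders; see the linked paper and code for the actual method.

```python
# Minimal sketch (not the paper's exact method) of selective next-token
# prediction: instead of averaging cross-entropy over every token, keep only
# the fraction of tokens a scoring rule marks as "useful" and drop the rest.
import torch
import torch.nn.functional as F

def selective_lm_loss(model_logits, ref_logits, targets, keep_ratio=0.6):
    """Cross-entropy averaged over only the highest-scoring tokens.

    model_logits, ref_logits: [batch, seq_len, vocab] logits from the model
                              being trained and a frozen reference model
    targets:                  [batch, seq_len] next-token ids (long dtype)
    keep_ratio:               fraction of tokens that contribute to the loss
    """
    vocab = model_logits.size(-1)
    # Per-token loss for the model being trained (keeps gradients).
    model_loss = F.cross_entropy(
        model_logits.reshape(-1, vocab), targets.reshape(-1), reduction="none")
    with torch.no_grad():
        # Per-token loss for the frozen reference model (no gradients).
        ref_loss = F.cross_entropy(
            ref_logits.reshape(-1, vocab), targets.reshape(-1), reduction="none")
        # Assumed scoring rule: tokens the model finds much harder than the
        # reference are treated as "useful"; the rest as "noisy".
        score = model_loss.detach() - ref_loss
        k = max(1, int(keep_ratio * score.numel()))
        threshold = torch.topk(score, k).values.min()
        mask = (score >= threshold).float()
    # Average the trainable loss over the selected tokens only.
    return (model_loss * mask).sum() / mask.sum()
```

In a training loop this would stand in for the usual mean cross-entropy, with ref_logits produced by a separately trained reference model held fixed during pretraining; the 0.6 keep_ratio is an arbitrary placeholder, not a value taken from the paper.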