
This research paper investigates how the precision used during training and inference affects language model performance. The authors show that training in lower precision reduces a model's effective parameter count, creating a trade-off between model size and precision. They also find that post-training quantization, a common technique for reducing inference costs, becomes increasingly harmful as models are trained on more data. Building on these observations, they develop a unified scaling law that predicts the degradation caused by post-training quantization and suggests that training larger models in lower precision can be more compute-optimal. The study draws on over 465 pretraining runs and validates its predictions on models with up to 1.7 billion parameters trained on up to 26 billion tokens, highlighting the role of precision in language model scaling.
https://arxiv.org/pdf/2411.04330
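
For readers who want the shape of the result rather than the full paper: the scaling law described above builds on a Chinchilla-style loss formula, with the parameter count replaced by an "effective" parameter count that shrinks at lower training precision. The sketch below is only an illustration of that structure; the symbols A, B, E, alpha, beta, and gamma stand for fitted constants from the paper, and the exact functional details and values should be taken from the paper itself.

L(N, D, P_train) \approx A \, N_{\text{eff}}^{-\alpha} + B \, D^{-\beta} + E,
\qquad N_{\text{eff}} = N \left(1 - e^{-P_{\text{train}}/\gamma}\right)

Here N is the parameter count, D the number of training tokens, and P_train the training precision in bits; as P_train grows, N_eff approaches N, and as it shrinks, the model behaves like a smaller one. The paper additionally models post-training quantization as an extra degradation term that grows with the amount of training data relative to model size and decays as the post-training precision increases, which is what makes heavily overtrained models more fragile to quantization.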