
This research paper investigates the computational efficiency of training small-scale large language models (SLMs), focusing on models with up to 2 billion parameters. The authors explore how hyperparameters and hardware configurations, including GPU type, batch size, and communication protocols, affect training cost and speed. They use metrics such as "loss per dollar" and "tokens per second" to assess training efficiency on cloud services. Their findings offer practical recommendations for choosing cost-effective hardware and training strategies for SLMs, highlighting the benefits of FlashAttention for smaller models and of Distributed Data Parallel (DDP) for improved efficiency. The study ultimately aims to make SLM training more accessible in resource-constrained environments.
https://arxiv.org/pdf/2410.19456
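To make the cost metrics concrete, here is a minimal Python sketch of the kind of accounting behind "tokens per second" and "loss per dollar" style comparisons. It is not code from the paper: the GPU count, hourly price, token count, and loss values are hypothetical placeholders, and "loss reduction per dollar" is one plausible reading of a loss-per-dollar metric, not the authors' exact definition.

```python
# Minimal sketch (not from the paper): cost/throughput accounting of the kind
# behind "tokens per second" and "loss per dollar" comparisons.
# All numbers below are hypothetical placeholders, not results from the study.

def tokens_per_second(tokens_processed: int, wall_clock_seconds: float) -> float:
    """Raw training throughput."""
    return tokens_processed / wall_clock_seconds

def cost_in_dollars(wall_clock_seconds: float, gpu_count: int,
                    dollars_per_gpu_hour: float) -> float:
    """Cloud cost for a run billed per GPU-hour."""
    return (wall_clock_seconds / 3600.0) * gpu_count * dollars_per_gpu_hour

def loss_reduction_per_dollar(loss_start: float, loss_end: float,
                              dollars: float) -> float:
    """How much the training loss dropped per dollar spent."""
    return (loss_start - loss_end) / dollars

if __name__ == "__main__":
    # Hypothetical run: 8 GPUs at $2.50/GPU-hour, 1B tokens in 6 hours,
    # loss falling from 3.2 to 2.6.
    secs = 6 * 3600
    tps = tokens_per_second(1_000_000_000, secs)
    cost = cost_in_dollars(secs, gpu_count=8, dollars_per_gpu_hour=2.50)
    lpd = loss_reduction_per_dollar(3.2, 2.6, cost)
    print(f"{tps:,.0f} tokens/s, ${cost:.2f} total, "
          f"{lpd:.4f} loss reduction per dollar")
```

Accounting like this is what lets the authors compare GPU types and batch sizes on a cost basis rather than on raw speed alone.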