AI Papers Podcast Daily

Computational Bottlenecks of Training Small-scale Large Language Models



This research paper investigates the computational efficiency of training small-scale large language models (SLMs) with up to 2 billion parameters. The authors examine how hyperparameters and hardware configurations, such as GPU type, batch size, and communication protocols, affect training cost and speed, using metrics like "loss per dollar" and "tokens per second" to optimize training on cloud services. Their findings offer practical recommendations for choosing cost-effective hardware and training strategies for SLMs, emphasizing the value of FlashAttention for smaller models and of Distributed Data Parallel (DDP) for improved efficiency. The study ultimately aims to make SLM training more accessible in resource-constrained environments. A rough sketch of these two techniques follows the source link below.

https://arxiv.org/pdf/2410.19456
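
The sketch below is an illustration only, not code from the paper: it shows how the two pieces highlighted in the episode typically fit together in recent PyTorch (2.3+), requesting the FlashAttention kernel via torch.nn.attention.sdpa_kernel and wrapping a small attention block in DistributedDataParallel (DDP). The model dimensions, batch size, placeholder loss, and torchrun launch setup are all illustrative assumptions.

# Hypothetical minimal sketch (not the authors' code): FlashAttention via PyTorch's
# SDPA backend selection, plus DDP for multi-GPU data parallelism.

import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.distributed as dist
from torch.nn.attention import SDPBackend, sdpa_kernel
from torch.nn.parallel import DistributedDataParallel as DDP


class TinyAttentionBlock(nn.Module):
    """Single self-attention block standing in for one SLM transformer layer."""

    def __init__(self, dim: int = 1024, heads: int = 16):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq, head_dim) as scaled_dot_product_attention expects.
        q, k, v = (z.view(b, t, self.heads, d // self.heads).transpose(1, 2) for z in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(out.transpose(1, 2).reshape(b, t, d))


def main() -> None:
    # One process per GPU; `torchrun --nproc_per_node=<gpus>` sets RANK/LOCAL_RANK/WORLD_SIZE.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # FlashAttention kernels require fp16/bf16 inputs, so keep the block in bf16.
    model = TinyAttentionBlock().to(device="cuda", dtype=torch.bfloat16)
    model = DDP(model, device_ids=[local_rank])  # DDP overlaps gradient all-reduce with backward
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

    # Dummy activations standing in for embedded tokens: (batch, seq_len, dim).
    x = torch.randn(8, 2048, 1024, device="cuda", dtype=torch.bfloat16)

    # Restrict SDPA to the FlashAttention backend for this forward pass.
    with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
        loss = model(x).float().pow(2).mean()  # placeholder loss for the sketch
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()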


AI Papers Podcast Daily, by AIPPD