AI-generated podcast where hosts Hal Turing and Dr. Ada Shannon discuss the latest research papers and reports in machine learning, AI systems, and optimization. Featuring honest critical analysis, pr...
FAQs about AI Post Transformers: How many episodes does AI Post Transformers have? The podcast currently has 572 episodes available.
August 07, 2025 · Scaling Laws (19 min)
This 2020 paper, titled "Scaling Laws for Neural Language Models," explores the empirical relationships between the performance of neural language models (specifically Transformers) and three scaling factors: model size (parameters), dataset size (tokens), and training compute. The authors demonstrate that performance follows predictable power-law scalings across a wide range, often spanning multiple orders of magnitude. A key finding is that larger models are more sample-efficient: they reach a given performance with less data and fewer training steps, which suggests that compute-efficient training favors very large models stopped well before full convergence. The research also notes that architectural details beyond these core scaling factors have comparatively little impact on performance.
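As a rough illustration of the power-law relationships this episode discusses, the sketch below evaluates loss curves of the form L(N) = (N_c / N)^a_N and L(D) = (D_c / D)^a_D. The constants are approximate values reported in the paper and are used here purely for illustration; they are not fitted or exact.

# Illustrative sketch of the power-law scaling relations described above.
# Exponents and constants are approximate figures from the 2020 paper;
# treat them as illustrative, not authoritative.

def loss_vs_params(n_params, n_c=8.8e13, alpha_n=0.076):
    """Predicted test loss as a function of non-embedding parameter count N."""
    return (n_c / n_params) ** alpha_n

def loss_vs_tokens(n_tokens, d_c=5.4e13, alpha_d=0.095):
    """Predicted test loss as a function of dataset size D (in tokens)."""
    return (d_c / n_tokens) ** alpha_d

if __name__ == "__main__":
    # Each 10x increase in model size lowers the predicted loss by a fixed
    # multiplicative factor, i.e. a straight line on a log-log plot.
    for n in (1e8, 1e9, 1e10):
        print(f"N={n:.0e}  predicted loss ~ {loss_vs_params(n):.3f}")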
August 07, 2025 · Transformer Scaling (12 min)
This research paper examines the scaling behavior of Transformer architectures, offering insights into pre-training and fine-tuning efficiency. It challenges previous findings by showing that model shape, not just size, significantly affects downstream task performance, even though shape has a smaller effect on upstream pre-training loss. The study also finds that scaling protocols behave differently across compute regions, so strategies tuned on smaller models may not transfer to larger ones. The authors propose a "DeepNarrow" scaling strategy that preferentially increases model depth, yielding models with fewer parameters and faster training while matching or improving on the performance of conventional configurations. These findings, along with over 100 pre-trained checkpoints, are openly released to facilitate further research into efficient Transformer scaling.
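To make the "deeper but narrower" idea concrete, here is a small sketch comparing the approximate parameter counts of a conventional wide stack and a hypothetical DeepNarrow-style stack. It uses the standard 12 * layers * d_model^2 approximation (attention plus an FFN with d_ff = 4 * d_model); the two example configurations are invented for illustration and are not the paper's released checkpoints.

# Rough comparison of a wide vs. a deeper, narrower Transformer stack.
# Configs below are hypothetical; the point is that extra depth can come
# with fewer total parameters than extra width.

def approx_transformer_params(num_layers, d_model):
    """Approximate non-embedding parameter count of a standard Transformer stack."""
    attn = 4 * d_model * d_model        # Q, K, V and output projections
    ffn = 2 * d_model * (4 * d_model)   # two linear layers with d_ff = 4 * d_model
    return num_layers * (attn + ffn)

wide = approx_transformer_params(num_layers=12, d_model=1024)
deep_narrow = approx_transformer_params(num_layers=24, d_model=640)
print(f"conventional 12 x 1024:      ~{wide / 1e6:.0f}M params")
print(f"DeepNarrow-style 24 x 640:   ~{deep_narrow / 1e6:.0f}M params")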