FAQs about AI Podcast: How many episodes does AI Podcast have? The podcast currently has 427 episodes available.

January 05, 2025 · ConvNeXt: A Modern ConvNet for the 2020s (7 min)
A podcast discussing the architecture and performance of ConvNeXt, a modern ConvNet model that challenges the dominance of Vision Transformers.

January 05, 2025 · AI Vision Podcast: Masked Autoencoders for Scalable Vision Learning (6 min)
A deep dive into Masked Autoencoders (MAE) and their impact on computer vision, discussing their architecture, training efficiency, and performance on ImageNet and downstream tasks.

January 04, 2025 · AI Radio FM - Technology Channel, Your Personal Generative AI Podcast (6 min)
A podcast discussing the auxiliary-loss-free load-balancing strategy for Mixture-of-Experts (MoE) models.
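
January 04, 2025 · A Survey of Mixture-of-Experts (MoE) Models (6 min)
This podcast takes an in-depth look at recent advances in Mixture-of-Experts (MoE) models, covering algorithm design, system implementation, and practical applications. Starting from background on sparse and dense MoE, we present a new taxonomy of MoE and explore the intricacies of gating functions, expert networks, training schemes, and system design, building a comprehensive understanding of MoE.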
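
January 04, 2025 · Zero Bubble Pipeline Parallelism (7 min)
This episode explores zero bubble pipeline parallelism, an approach designed to improve the efficiency of large-scale distributed training. We analyze the bubble problem in conventional pipeline parallelism and explain how fine-grained scheduling and a technique for bypassing optimizer synchronization eliminate the bubbles. We also discuss the automatic scheduling algorithm, memory optimization strategies, and experimental results, aiming to give listeners a thorough technical walkthrough.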

January 04, 2025 · GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (7 min)
A podcast discussion about GShard, a module for scaling neural networks using conditional computation and automatic sharding, focusing on its application to multilingual machine translation.

January 04, 2025 · AI Radio FM - Technology Channel: GShard and Giant Models (9 min)
A deep dive into GShard, a module for scaling giant neural networks, focusing on its application to multilingual machine translation and its impact on training efficiency and model quality.
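
January 04, 2025 · Optimizing Mixture-of-Experts Training with Hybrid Tensor-Expert-Data Parallelism (6 min)
An in-depth look at DeepSpeed-TED, a novel three-dimensional hybrid parallelism framework for training Mixture-of-Experts models with large base models. We discuss memory optimizations, communication optimizations, and performance comparisons with existing approaches.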
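
January 04, 2025 · Unified Sequence Parallelism: Empowering Long-Context Generative AI (8 min)
This podcast takes an in-depth look at Unified Sequence Parallelism (USP), an advanced technique for training generative AI models with extremely long contexts. We analyze existing sequence parallelism methods such as DeepSpeed-Ulysses and Ring-Attention, and present a unified framework that combines the strengths of both while overcoming their limitations. We then examine how USP combines with existing parallelism techniques such as data parallelism, tensor parallelism, ZeRO, and pipeline parallelism, offering best practices for 4D hybrid parallel systems. Finally, we share experimental results that highlight USP's performance across various hardware configurations and show its potential for extending model context length and improving training efficiency.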
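
January 04, 2025 · LoongTrain: Efficient Training of Long-Sequence Large Language Models (8 min)
This episode explores LoongTrain, an efficient training framework designed for long-sequence large language models. We discuss its core 2D-Attention mechanism and how it combines head parallelism and context parallelism to overcome scalability limits while maintaining efficiency. We also analyze the Double-Ring-Attention mechanism and the impact of device placement strategies on training speed.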